From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
David Rientjes <rientjes@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
azurIt <azurit@pobox.sk>,
linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
kosaki.motohiro@gmail.com
Subject: Re: [patch 5/6] mm: memcg: enable memcg OOM killer only for user faults
Date: Mon, 29 Jul 2013 15:18:36 -0400 [thread overview]
Message-ID: <51F6C00C.5050702@gmail.com> (raw)
In-Reply-To: <1374791138-15665-6-git-send-email-hannes@cmpxchg.org>
(7/25/13 6:25 PM), Johannes Weiner wrote:
> System calls and kernel faults (uaccess, gup) can handle an out of
> memory situation gracefully and just return -ENOMEM.
>
> Enable the memcg OOM killer only for user faults, where it's really
> the only option available.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/memcontrol.h | 23 +++++++++++++++++++++++
> include/linux/sched.h | 3 +++
> mm/filemap.c | 11 ++++++++++-
> mm/memcontrol.c | 2 +-
> mm/memory.c | 40 ++++++++++++++++++++++++++++++----------
> 5 files changed, 67 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7b4d9d7..9bb5eeb 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -125,6 +125,24 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> extern void mem_cgroup_replace_page_cache(struct page *oldpage,
> struct page *newpage);
>
> +/**
> + * mem_cgroup_xchg_may_oom - toggle the memcg OOM killer for a task
> + * @p: task
> + * @new: true to enable, false to disable
> + *
> + * Toggle whether a failed memcg charge should invoke the OOM killer
> + * or just return -ENOMEM. Returns the previous toggle state.
> + */
> +static inline bool mem_cgroup_xchg_may_oom(struct task_struct *p, bool new)
> +{
> + bool old;
> +
> + old = p->memcg_oom.may_oom;
> + p->memcg_oom.may_oom = new;
> +
> + return old;
> +}
The name of xchg strongly suggest the function use compare-swap op. So, it seems
misleading name. I suggest just use "set_*" or something else. In linux kernel,
many setter functions already return old value. Don't mind.
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index fc09d21..4b3effc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1398,6 +1398,9 @@ struct task_struct {
> unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
> } memcg_batch;
> unsigned int memcg_kmem_skip_account;
> + struct memcg_oom_info {
> + unsigned int may_oom:1;
> + } memcg_oom;
This ":1" makes slower but doesn't diet any memory space, right? I suggest
to use bool. If anybody need to diet in future, he may change it to bit field.
That's ok, let's stop too early and questionable micro optimization.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a6981fe..2932810 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> struct file_ra_state *ra = &file->f_ra;
> struct inode *inode = mapping->host;
> pgoff_t offset = vmf->pgoff;
> + unsigned int may_oom;
Why don't you use bool? your mem_cgroup_xchg_may_oom() uses bool and it seems cleaner more.
> @@ -1626,7 +1627,11 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> return VM_FAULT_SIGBUS;
>
> /*
> - * Do we have something in the page cache already?
> + * Do we have something in the page cache already? Either
> + * way, try readahead, but disable the memcg OOM killer for it
> + * as readahead is optional and no errors are propagated up
> + * the fault stack. The OOM killer is enabled while trying to
> + * instantiate the faulting page individually below.
> */
> page = find_get_page(mapping, offset);
> if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
> @@ -1634,10 +1639,14 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> * We found the page, so try async readahead before
> * waiting for the lock.
> */
> + may_oom = mem_cgroup_xchg_may_oom(current, 0);
> do_async_mmap_readahead(vma, ra, file, page, offset);
> + mem_cgroup_xchg_may_oom(current, may_oom);
> } else if (!page) {
> /* No page in the page cache at all */
> + may_oom = mem_cgroup_xchg_may_oom(current, 0);
> do_sync_mmap_readahead(vma, ra, file, offset);
> + mem_cgroup_xchg_may_oom(current, may_oom);
> count_vm_event(PGMAJFAULT);
> mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
> ret = VM_FAULT_MAJOR;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 00a7a66..30ae46a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2614,7 +2614,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> return CHARGE_RETRY;
>
> /* If we don't need to call oom-killer at el, return immediately */
> - if (!oom_check)
> + if (!oom_check || !current->memcg_oom.may_oom)
> return CHARGE_NOMEM;
> /* check OOM */
> if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask, get_order(csize)))
> diff --git a/mm/memory.c b/mm/memory.c
> index f2ab2a8..5ea7b47 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3752,22 +3752,14 @@ unlock:
> /*
> * By the time we get here, we already hold the mm semaphore
> */
> -int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long address, unsigned int flags)
> +static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, unsigned int flags)
> {
> pgd_t *pgd;
> pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
>
> - __set_current_state(TASK_RUNNING);
> -
> - count_vm_event(PGFAULT);
> - mem_cgroup_count_vm_event(mm, PGFAULT);
> -
> - /* do counter updates before entering really critical section. */
> - check_sync_rss_stat(current);
> -
> if (unlikely(is_vm_hugetlb_page(vma)))
> return hugetlb_fault(mm, vma, address, flags);
>
> @@ -3851,6 +3843,34 @@ retry:
> return handle_pte_fault(mm, vma, address, pte, pmd, flags);
> }
>
> +int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, unsigned int flags)
> +{
> + int ret;
> +
> + __set_current_state(TASK_RUNNING);
> +
> + count_vm_event(PGFAULT);
> + mem_cgroup_count_vm_event(mm, PGFAULT);
> +
> + /* do counter updates before entering really critical section. */
> + check_sync_rss_stat(current);
> +
> + /*
> + * Enable the memcg OOM handling for faults triggered in user
> + * space. Kernel faults are handled more gracefully.
> + */
> + if (flags & FAULT_FLAG_USER)
> + WARN_ON(mem_cgroup_xchg_may_oom(current, true) == true);
Please don't assume WARN_ON never erase any code. I'm not surprised if embedded
guys replace WARN_ON with nop in future.
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
David Rientjes <rientjes@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
azurIt <azurit@pobox.sk>,
linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
kosaki.motohiro@gmail.com
Subject: Re: [patch 5/6] mm: memcg: enable memcg OOM killer only for user faults
Date: Mon, 29 Jul 2013 15:18:36 -0400 [thread overview]
Message-ID: <51F6C00C.5050702@gmail.com> (raw)
Message-ID: <20130729191836.2ArOdMN7LKW3zStHQuLwsDjy8yHRPZ9xTlghEW4ZOCQ@z> (raw)
In-Reply-To: <1374791138-15665-6-git-send-email-hannes@cmpxchg.org>
(7/25/13 6:25 PM), Johannes Weiner wrote:
> System calls and kernel faults (uaccess, gup) can handle an out of
> memory situation gracefully and just return -ENOMEM.
>
> Enable the memcg OOM killer only for user faults, where it's really
> the only option available.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/memcontrol.h | 23 +++++++++++++++++++++++
> include/linux/sched.h | 3 +++
> mm/filemap.c | 11 ++++++++++-
> mm/memcontrol.c | 2 +-
> mm/memory.c | 40 ++++++++++++++++++++++++++++++----------
> 5 files changed, 67 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7b4d9d7..9bb5eeb 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -125,6 +125,24 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> extern void mem_cgroup_replace_page_cache(struct page *oldpage,
> struct page *newpage);
>
> +/**
> + * mem_cgroup_xchg_may_oom - toggle the memcg OOM killer for a task
> + * @p: task
> + * @new: true to enable, false to disable
> + *
> + * Toggle whether a failed memcg charge should invoke the OOM killer
> + * or just return -ENOMEM. Returns the previous toggle state.
> + */
> +static inline bool mem_cgroup_xchg_may_oom(struct task_struct *p, bool new)
> +{
> + bool old;
> +
> + old = p->memcg_oom.may_oom;
> + p->memcg_oom.may_oom = new;
> +
> + return old;
> +}
The name of xchg strongly suggest the function use compare-swap op. So, it seems
misleading name. I suggest just use "set_*" or something else. In linux kernel,
many setter functions already return old value. Don't mind.
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index fc09d21..4b3effc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1398,6 +1398,9 @@ struct task_struct {
> unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
> } memcg_batch;
> unsigned int memcg_kmem_skip_account;
> + struct memcg_oom_info {
> + unsigned int may_oom:1;
> + } memcg_oom;
This ":1" makes slower but doesn't diet any memory space, right? I suggest
to use bool. If anybody need to diet in future, he may change it to bit field.
That's ok, let's stop too early and questionable micro optimization.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a6981fe..2932810 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1617,6 +1617,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> struct file_ra_state *ra = &file->f_ra;
> struct inode *inode = mapping->host;
> pgoff_t offset = vmf->pgoff;
> + unsigned int may_oom;
Why don't you use bool? your mem_cgroup_xchg_may_oom() uses bool and it seems cleaner more.
> @@ -1626,7 +1627,11 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> return VM_FAULT_SIGBUS;
>
> /*
> - * Do we have something in the page cache already?
> + * Do we have something in the page cache already? Either
> + * way, try readahead, but disable the memcg OOM killer for it
> + * as readahead is optional and no errors are propagated up
> + * the fault stack. The OOM killer is enabled while trying to
> + * instantiate the faulting page individually below.
> */
> page = find_get_page(mapping, offset);
> if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
> @@ -1634,10 +1639,14 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> * We found the page, so try async readahead before
> * waiting for the lock.
> */
> + may_oom = mem_cgroup_xchg_may_oom(current, 0);
> do_async_mmap_readahead(vma, ra, file, page, offset);
> + mem_cgroup_xchg_may_oom(current, may_oom);
> } else if (!page) {
> /* No page in the page cache at all */
> + may_oom = mem_cgroup_xchg_may_oom(current, 0);
> do_sync_mmap_readahead(vma, ra, file, offset);
> + mem_cgroup_xchg_may_oom(current, may_oom);
> count_vm_event(PGMAJFAULT);
> mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
> ret = VM_FAULT_MAJOR;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 00a7a66..30ae46a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2614,7 +2614,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> return CHARGE_RETRY;
>
> /* If we don't need to call oom-killer at el, return immediately */
> - if (!oom_check)
> + if (!oom_check || !current->memcg_oom.may_oom)
> return CHARGE_NOMEM;
> /* check OOM */
> if (!mem_cgroup_handle_oom(mem_over_limit, gfp_mask, get_order(csize)))
> diff --git a/mm/memory.c b/mm/memory.c
> index f2ab2a8..5ea7b47 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3752,22 +3752,14 @@ unlock:
> /*
> * By the time we get here, we already hold the mm semaphore
> */
> -int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long address, unsigned int flags)
> +static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, unsigned int flags)
> {
> pgd_t *pgd;
> pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
>
> - __set_current_state(TASK_RUNNING);
> -
> - count_vm_event(PGFAULT);
> - mem_cgroup_count_vm_event(mm, PGFAULT);
> -
> - /* do counter updates before entering really critical section. */
> - check_sync_rss_stat(current);
> -
> if (unlikely(is_vm_hugetlb_page(vma)))
> return hugetlb_fault(mm, vma, address, flags);
>
> @@ -3851,6 +3843,34 @@ retry:
> return handle_pte_fault(mm, vma, address, pte, pmd, flags);
> }
>
> +int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> + unsigned long address, unsigned int flags)
> +{
> + int ret;
> +
> + __set_current_state(TASK_RUNNING);
> +
> + count_vm_event(PGFAULT);
> + mem_cgroup_count_vm_event(mm, PGFAULT);
> +
> + /* do counter updates before entering really critical section. */
> + check_sync_rss_stat(current);
> +
> + /*
> + * Enable the memcg OOM handling for faults triggered in user
> + * space. Kernel faults are handled more gracefully.
> + */
> + if (flags & FAULT_FLAG_USER)
> + WARN_ON(mem_cgroup_xchg_may_oom(current, true) == true);
Please don't assume WARN_ON never erase any code. I'm not surprised if embedded
guys replace WARN_ON with nop in future.
Thanks.
next prev parent reply other threads:[~2013-07-29 19:18 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-25 22:25 [patch 0/6] improve memcg oom killer robustness Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
2013-07-25 22:25 ` [patch 1/6] arch: mm: remove obsolete init OOM protection Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
2013-07-26 13:00 ` Michal Hocko
2013-07-26 13:00 ` Michal Hocko
2013-07-29 18:55 ` KOSAKI Motohiro
2013-07-29 18:55 ` KOSAKI Motohiro
2013-07-25 22:25 ` [patch 2/6] arch: mm: do not invoke OOM killer on kernel fault OOM Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
2013-07-26 13:07 ` Michal Hocko
2013-07-26 13:07 ` Michal Hocko
[not found] ` <1374791138-15665-3-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-29 18:58 ` KOSAKI Motohiro
2013-07-29 18:58 ` KOSAKI Motohiro
2013-07-29 18:58 ` KOSAKI Motohiro
2013-08-01 21:59 ` Johannes Weiner
2013-08-01 21:59 ` Johannes Weiner
2013-07-25 22:25 ` [patch 3/6] arch: mm: pass userspace fault flag to generic fault handler Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
[not found] ` <1374791138-15665-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-26 13:19 ` Michal Hocko
2013-07-26 13:19 ` Michal Hocko
2013-07-26 13:19 ` Michal Hocko
2013-07-26 18:45 ` Johannes Weiner
2013-07-26 18:45 ` Johannes Weiner
2013-07-25 22:25 ` [patch 4/6] x86: finish user fault error path with fatal signal Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
2013-07-26 13:52 ` Michal Hocko
2013-07-26 13:52 ` Michal Hocko
2013-07-26 18:46 ` Johannes Weiner
2013-07-26 18:46 ` Johannes Weiner
2013-07-29 12:45 ` Michal Hocko
2013-07-29 12:45 ` Michal Hocko
2013-07-29 19:01 ` KOSAKI Motohiro
2013-07-29 19:01 ` KOSAKI Motohiro
2013-07-25 22:25 ` [patch 5/6] mm: memcg: enable memcg OOM killer only for user faults Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
[not found] ` <1374791138-15665-6-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-26 14:16 ` Michal Hocko
2013-07-26 14:16 ` Michal Hocko
2013-07-26 14:16 ` Michal Hocko
2013-07-26 18:54 ` Johannes Weiner
2013-07-26 18:54 ` Johannes Weiner
2013-07-29 19:18 ` KOSAKI Motohiro [this message]
2013-07-29 19:18 ` KOSAKI Motohiro
[not found] ` <51F6C00C.5050702-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-07-29 19:44 ` Johannes Weiner
2013-07-29 19:44 ` Johannes Weiner
2013-07-29 19:44 ` Johannes Weiner
2013-07-29 19:47 ` KOSAKI Motohiro
2013-07-29 19:47 ` KOSAKI Motohiro
2013-07-25 22:25 ` [patch 6/6] mm: memcg: do not trap chargers with full callstack on OOM Johannes Weiner
2013-07-25 22:25 ` Johannes Weiner
[not found] ` <1374791138-15665-7-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-26 14:43 ` Michal Hocko
2013-07-26 14:43 ` Michal Hocko
2013-07-26 14:43 ` Michal Hocko
2013-07-26 21:28 ` Johannes Weiner
2013-07-26 21:28 ` Johannes Weiner
2013-07-29 14:12 ` Michal Hocko
2013-07-29 14:12 ` Michal Hocko
[not found] ` <20130729141250.GF4678-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-07-29 14:55 ` Johannes Weiner
2013-07-29 14:55 ` Johannes Weiner
2013-07-29 14:55 ` Johannes Weiner
[not found] ` <20130729145529.GW715-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-29 15:52 ` Michal Hocko
2013-07-29 15:52 ` Michal Hocko
2013-07-29 15:52 ` Michal Hocko
[not found] ` <20130726212808.GD17975-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-30 14:09 ` Michal Hocko
2013-07-30 14:09 ` Michal Hocko
2013-07-30 14:09 ` Michal Hocko
[not found] ` <20130730140913.GC15847-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2013-07-30 14:32 ` Johannes Weiner
2013-07-30 14:32 ` Johannes Weiner
2013-07-30 14:32 ` Johannes Weiner
[not found] ` <20130730143228.GD715-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-07-30 14:56 ` Michal Hocko
2013-07-30 14:56 ` Michal Hocko
2013-07-30 14:56 ` Michal Hocko
2013-07-25 22:31 ` [patch 3.2] memcg OOM robustness (x86 only) Johannes Weiner
2013-07-25 22:31 ` Johannes Weiner
2013-08-03 8:38 ` azurIt
2013-08-03 8:38 ` azurIt
2013-08-03 16:30 ` Johannes Weiner
2013-08-03 16:30 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51F6C00C.5050702@gmail.com \
--to=kosaki.motohiro@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=azurit@pobox.sk \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=rientjes@google.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.