From: PINTU KUMAR <pintu.k@samsung.com>
To: 'Michal Hocko' <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, minchan@kernel.org, dave@stgolabs.net,
koct9i@gmail.com, rientjes@google.com, hannes@cmpxchg.org,
penguin-kernel@i-love.sakura.ne.jp, bywxiaobai@163.com,
mgorman@suse.de, vbabka@suse.cz, js1304@gmail.com,
kirill.shutemov@linux.intel.com, alexander.h.duyck@redhat.com,
sasha.levin@oracle.com, cl@linux.com, fengguang.wu@intel.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cpgs@samsung.com, pintu_agarwal@yahoo.com, pintu.ping@gmail.com,
vishnu.ps@samsung.com, rohit.kr@samsung.com,
c.rajkumar@samsung.com, sreenathd@samsung.com
Subject: RE: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter
Date: Mon, 05 Oct 2015 11:42:49 +0530
Message-ID: <010401d0ff34$f48e8eb0$ddabac10$@samsung.com>
In-Reply-To: <20151001133843.GG24077@dhcp22.suse.cz>
Hi,
> -----Original Message-----
> From: Michal Hocko [mailto:mhocko@kernel.org]
> Sent: Thursday, October 01, 2015 7:09 PM
> To: Pintu Kumar
> Cc: akpm@linux-foundation.org; minchan@kernel.org; dave@stgolabs.net;
> koct9i@gmail.com; rientjes@google.com; hannes@cmpxchg.org;
> penguin-kernel@i-love.sakura.ne.jp; bywxiaobai@163.com; mgorman@suse.de;
> vbabka@suse.cz; js1304@gmail.com; kirill.shutemov@linux.intel.com;
> alexander.h.duyck@redhat.com; sasha.levin@oracle.com; cl@linux.com;
> fengguang.wu@intel.com; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> cpgs@samsung.com; pintu_agarwal@yahoo.com; pintu.ping@gmail.com;
> vishnu.ps@samsung.com; rohit.kr@samsung.com; c.rajkumar@samsung.com;
> sreenathd@samsung.com
> Subject: Re: [PATCH 1/1] mm: vmstat: Add OOM kill count in vmstat counter
>
> On Thu 01-10-15 16:18:43, Pintu Kumar wrote:
> > This patch maintains number of oom calls and number of oom kill count
> > in /proc/vmstat.
> > It is helpful during sluggish, aging or long duration tests.
> > Currently if the OOM happens, it can be only seen in kernel ring buffer.
> > But during long duration tests, all the dmesg and /var/log/messages*
> > could be overwritten.
> > So, just like other counters, the oom can also be maintained in
> > /proc/vmstat.
> > It can be also seen if all logs are disabled in kernel.
> >
> > A snapshot of the result of over night test is shown below:
> > $ cat /proc/vmstat
> > oom_stall 610
> > oom_kill_count 1763
> >
> > Here, oom_stall indicates that there are 610 times, kernel entered
> > into OOM cases. However, there were around 1763 oom killing happens.
>
> This alone looks quite suspicious. Unless you have tasks which share the
> address space without being in the same thread group this shouldn't happen
> in such a large scale.
Yes, this count also includes out_of_memory invocations from memory cgroups.
Please check a few snapshots of the dmesg output captured during the overnight
tests.
........
[49479.078033] [2: xxxxxxxx:20874] Memory cgroup out of memory: Kill process 20880 (xxxxxxx) score 112 or sacrifice child
[49480.910430] [2: xxxxxxxx:20882] Memory cgroup out of memory: Kill process 20888 (xxxxxxxx) score 112 or sacrifice child
[49567.046203] [0: yyyyyyy: 548] Out of memory: Kill process 20458 (zzzzzzzzzz) score 102 or sacrifice child
[49567.346588] [0: yyyyyyy: 548] Out of memory: Kill process 21102 (zzzzzzzzzz) score 104 or sacrifice child
.........
The number of _out of memory_ messages in the dmesg dump exactly matches the
oom_kill_count value in /proc/vmstat.
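For reference, the comparison against dmesg can be sketched in user space as
below. This is only a rough illustration, not part of the patch: the function
name count_oom_kill_reports is hypothetical, and it assumes the dump uses the
two message forms shown in the snapshot above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count OOM kill reports in a dmesg dump held in memory.
 *
 * Matching on "ut of memory: Kill" covers both message forms:
 *   "Out of memory: Kill process ..."               (global OOM)
 *   "Memory cgroup out of memory: Kill process ..." (memcg OOM)
 */
size_t count_oom_kill_reports(const char *log)
{
	static const char needle[] = "ut of memory: Kill";
	size_t count = 0;
	const char *p;

	assert(log != NULL);

	/* Step past each hit by one byte so the same hit is not found twice. */
	for (p = strstr(log, needle); p != NULL; p = strstr(p + 1, needle))
		count++;

	return count;
}
```

The returned value is what I compared against oom_kill_count. Note that it only
stays comparable as long as the ring buffer has not wrapped, which is exactly
the limitation the counter is meant to avoid.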
> </me looks into the patch>
> And indeed the patch is incorrect. You are only counting OOMs from the page
> allocator slow path. You are missing all the OOM invocations from the page
> fault path.
Sorry, I am not sure what exactly you mean. Please point out any places I am
missing.
Actually, I tried to add it at a generic place, that is, oom_kill_process,
which is called by out_of_memory(...).
Are you talking about pagefault_out_of_memory(...)?
But that already calls out_of_memory, no?
> The placement inside __alloc_pages_may_oom looks quite arbitrary as well. You
> are not counting events where we are OOM but somebody is holding the
> oom_mutex but you do count last attempt before going really OOM. Then we
> have cases which do not invoke OOM killer which are counted into oom_stall as
> well. I am not sure whether they should because I am not quite sure about the
> semantic of the counter in the first place.
OK. Yes, it can be added right at the entry of __alloc_pages_may_oom.
I will make the changes.
Actually, I knowingly skipped the oom_lock case, because in our 3.10 kernel we
had note_oom_kill(..) added right after this check.
So I added the counter at exactly the same place.
OK, I can make the necessary changes if the oom_lock case also matters.
> What is it supposed to tell us? How many times the system had to go into
> emergency OOM steps? How many times the direct reclaim didn't make any
> progress so we can consider the system OOM?
>
Yes, exactly: oom_stall tells how many times OOM is invoked in the system.
It can also tell how many times direct reclaim fails completely.
Currently, we don't have any counter for direct reclaim success/failure.
Also, oom_kill_process will not be invoked for higher orders
(PAGE_ALLOC_COSTLY_ORDER).
In that case the allocator still enters the OOM path, but the result is a plain
page allocation failure.
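To compare the two counters from user space, they could be pulled out of
/proc/vmstat roughly as sketched below. This is an illustration only: the
helper name vmstat_lookup is hypothetical, and oom_stall and oom_kill_count are
the names proposed in this patch, so they are present only on a kernel with the
patch applied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat,
 * i.e. one "name value" pair per line.
 *
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *value)
{
	size_t len;
	const char *line = text;

	assert(text != NULL && name != NULL && value != NULL);
	len = strlen(name);

	while (line != NULL && *line != '\0') {
		/* Require a space after the name so "oom" does not match "oom_stall". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*value = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

With the snapshot from the patch description, looking up oom_stall would give
610 and oom_kill_count would give 1763. The caller is expected to read
/proc/vmstat into a NUL-terminated buffer first.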
> oom_kill_count has a slightly misleading names because it suggests how many
> times oom_kill was called but in fact it counts the oom victims.
> Not sure whether this information is so much useful but the semantic is clear
> at least.
>
OK, I agree about the semantics of the name oom_kill_count.
If possible, please suggest a better name.
How about one of the following?
oom_victim_count ?
oom_nr_killed ?
oom_nr_victim ?
> > The OOM is bad for the any system. So, this counter can help the
> > developer in tuning the memory requirement at least during initial bringup.
> >
> > Signed-off-by: Pintu Kumar <pintu.k@samsung.com>
> > ---
> > include/linux/vm_event_item.h | 2 ++
> > mm/oom_kill.c | 2 ++
> > mm/page_alloc.c | 2 +-
> > mm/vmstat.c | 2 ++
> > 4 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index 2b1cef8..ade0851 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -57,6 +57,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> > #ifdef CONFIG_HUGETLB_PAGE
> > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> > #endif
> > + OOM_STALL,
> > + OOM_KILL_COUNT,
> > UNEVICTABLE_PGCULLED, /* culled to noreclaim list */
> > UNEVICTABLE_PGSCANNED, /* scanned for reclaimability */
> > UNEVICTABLE_PGRESCUED, /* rescued from noreclaim list */
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 03b612b..e79caed 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -570,6 +570,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > * space under its control.
> > */
> > do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> > + count_vm_event(OOM_KILL_COUNT);
> > mark_oom_victim(victim);
> > pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> > task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
> > @@ -600,6 +601,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> > task_pid_nr(p), p->comm);
> > task_unlock(p);
> > do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
> > + count_vm_event(OOM_KILL_COUNT);
> > }
> > rcu_read_unlock();
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9bcfd70..1d82210 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2761,7 +2761,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> > schedule_timeout_uninterruptible(1);
> > return NULL;
> > }
> > -
> > + count_vm_event(OOM_STALL);
> > /*
> > * Go through the zonelist yet one more time, keep very high watermark
> > * here, this is only to catch a parallel oom killing, we must fail if
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 1fd0886..f054265 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -808,6 +808,8 @@ const char * const vmstat_text[] = {
> > "htlb_buddy_alloc_success",
> > "htlb_buddy_alloc_fail",
> > #endif
> > + "oom_stall",
> > + "oom_kill_count",
> > "unevictable_pgs_culled",
> > "unevictable_pgs_scanned",
> > "unevictable_pgs_rescued",
> > --
> > 1.7.9.5
>
> --
> Michal Hocko
> SUSE Labs