* [PATCH] badness() dramatically overcounts memory
@ 2008-02-05 3:34 Jeff Davis
2008-02-05 4:13 ` Balbir Singh
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Davis @ 2008-02-05 3:34 UTC (permalink / raw)
To: linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 821 bytes --]
In oom_kill.c, one of the badness calculations is wildly inaccurate. If
memory is shared among child processes, that same memory will be counted
for each child, effectively multiplying the memory penalty by N, where N
is the number of children.
This makes it almost certain that the parent will always be chosen as
the victim of the OOM killer (assuming any substantial amount memory
shared among the children), even if the parent and children are well
behaved and have a reasonable and unchanging VM size.
Usually this does not actually alleviate the memory pressure because the
truly bad process is completely unrelated; and the OOM killer must later
kill the truly bad process.
This trivial patch corrects the calculation so that it does not count a
child's shared memory against the parent.
Regards,
Jeff Davis
[-- Attachment #2: linux-badness.diff --]
[-- Type: text/x-patch, Size: 776 bytes --]
--- mm/oom_kill.c.orig 2007-08-01 10:41:53.000000000 -0700
+++ mm/oom_kill.c 2008-02-04 14:47:10.000000000 -0800
@@ -83,11 +83,14 @@
* machine with an endless amount of children. In case a single
* child is eating the vast majority of memory, adding only half
* to the parents will make the child our kill candidate of choice.
+ * When counting the children's vmsize against the parent, we
+ * subtract shared_vm first, to avoid overcounting memory that is
+ * shared among the child processes and the parent.
*/
list_for_each_entry(child, &p->children, sibling) {
task_lock(child);
if (child->mm != mm && child->mm)
- points += child->mm->total_vm/2 + 1;
+ points += (child->mm->total_vm - child->mm->shared_vm)/2 + 1;
task_unlock(child);
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-05 3:34 [PATCH] badness() dramatically overcounts memory Jeff Davis
@ 2008-02-05 4:13 ` Balbir Singh
2008-02-05 23:02 ` Jeff Davis
0 siblings, 1 reply; 7+ messages in thread
From: Balbir Singh @ 2008-02-05 4:13 UTC (permalink / raw)
To: Jeff Davis; +Cc: linux-kernel, linux-mm
Jeff Davis wrote:
> In oom_kill.c, one of the badness calculations is wildly inaccurate. If
> memory is shared among child processes, that same memory will be counted
> for each child, effectively multiplying the memory penalty by N, where N
> is the number of children.
>
> This makes it almost certain that the parent will always be chosen as
> the victim of the OOM killer (assuming any substantial amount memory
> shared among the children), even if the parent and children are well
> behaved and have a reasonable and unchanging VM size.
>
> Usually this does not actually alleviate the memory pressure because the
> truly bad process is completely unrelated; and the OOM killer must later
> kill the truly bad process.
>
> This trivial patch corrects the calculation so that it does not count a
> child's shared memory against the parent.
>
Hi, Jeff,
1. grep on the kernel source tells me that shared_vm is incremented only in
vm_stat_account(), which is a NO-OP if CONFIG_PROC_FS is not defined.
2. How have you tested these patches? One way to do it would be to use the
memory controller and set a small limit on the control group. A memory
intensive application will soon see an OOM.
I do need to look at OOM kill sanity, my colleagues using the memory controller
have reported wrong actions taken by the OOM killer, but I am yet to analyze them.
The interesting thing is the use of total_vm and not the RSS which is used as
the basis by the OOM killer. I need to read/understand the code a bit more.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-05 4:13 ` Balbir Singh
@ 2008-02-05 23:02 ` Jeff Davis
2008-02-05 23:09 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Jeff Davis @ 2008-02-05 23:02 UTC (permalink / raw)
To: balbir; +Cc: linux-kernel, linux-mm
On Tue, 2008-02-05 at 09:43 +0530, Balbir Singh wrote:
> 1. grep on the kernel source tells me that shared_vm is incremented only in
> vm_stat_account(), which is a NO-OP if CONFIG_PROC_FS is not defined.
I see, thanks for pointing that out. Is there another way do you think?
Would the penalty be to high to enable vm_stat_account when
CONFIG_PROC_FS is not defined?
Or perhaps my patch would only have an effect when CONFIG_PROC_FS is set
(which is default)?
> 2. How have you tested these patches? One way to do it would be to use the
> memory controller and set a small limit on the control group. A memory
> intensive application will soon see an OOM.
I have done a quick test a while back when I first wrote the patch. I
will test more thoroughly now.
> The interesting thing is the use of total_vm and not the RSS which is used as
> the basis by the OOM killer. I need to read/understand the code a bit more.
RSS makes more sense to me as well.
To me, it makes no sense to count shared memory, because killing a
process doesn't free the shared memory.
Regards,
Jeff Davis
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-05 23:02 ` Jeff Davis
@ 2008-02-05 23:09 ` David Rientjes
2008-02-06 1:54 ` KOSAKI Motohiro
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2008-02-05 23:09 UTC (permalink / raw)
To: Jeff Davis; +Cc: balbir, linux-kernel, linux-mm, Andrea Arcangeli
On Tue, 5 Feb 2008, Jeff Davis wrote:
> > The interesting thing is the use of total_vm and not the RSS which is used as
> > the basis by the OOM killer. I need to read/understand the code a bit more.
>
> RSS makes more sense to me as well.
>
> To me, it makes no sense to count shared memory, because killing a
> process doesn't free the shared memory.
>
Andrea Arcangeli has patches pending which change this to the RSS.
Specifically:
http://marc.info/?l=linux-mm&m=119977937126925
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-05 23:09 ` David Rientjes
@ 2008-02-06 1:54 ` KOSAKI Motohiro
2008-02-06 2:05 ` David Rientjes
2008-02-06 4:00 ` Balbir Singh
0 siblings, 2 replies; 7+ messages in thread
From: KOSAKI Motohiro @ 2008-02-06 1:54 UTC (permalink / raw)
To: David Rientjes
Cc: kosaki.motohiro, Jeff Davis, balbir, linux-kernel, linux-mm,
Andrea Arcangeli
Hi
> > > The interesting thing is the use of total_vm and not the RSS which is used as
> > > the basis by the OOM killer. I need to read/understand the code a bit more.
> >
> > RSS makes more sense to me as well.
>
> Andrea Arcangeli has patches pending which change this to the RSS.
> Specifically:
>
> http://marc.info/?l=linux-mm&m=119977937126925
I agreed with you that RSS is better :)
but..
on many node numa, per zone rss is more better..
- kosaki
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-06 1:54 ` KOSAKI Motohiro
@ 2008-02-06 2:05 ` David Rientjes
2008-02-06 4:00 ` Balbir Singh
1 sibling, 0 replies; 7+ messages in thread
From: David Rientjes @ 2008-02-06 2:05 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Jeff Davis, balbir, linux-kernel, linux-mm, Andrea Arcangeli
On Wed, 6 Feb 2008, KOSAKI Motohiro wrote:
> > Andrea Arcangeli has patches pending which change this to the RSS.
> > Specifically:
> >
> > http://marc.info/?l=linux-mm&m=119977937126925
>
> I agreed with you that RSS is better :)
>
>
>
> but..
> on many node numa, per zone rss is more better..
>
It depends on how your applications are taking advantage of NUMA
optimizations. If they're constrained by mempolicies to a subset of nodes
then the badness scoring isn't even used: the task that triggered the OOM
condition is the one that is automatically killed.
At this point, I think you're going to need to present an actual case
study where Andrea's patch isn't sufficient for selecting the appropriate
task on large NUMA machines.
David
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] badness() dramatically overcounts memory
2008-02-06 1:54 ` KOSAKI Motohiro
2008-02-06 2:05 ` David Rientjes
@ 2008-02-06 4:00 ` Balbir Singh
1 sibling, 0 replies; 7+ messages in thread
From: Balbir Singh @ 2008-02-06 4:00 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: David Rientjes, Jeff Davis, linux-kernel, linux-mm,
Andrea Arcangeli
KOSAKI Motohiro wrote:
> Hi
>
>>>> The interesting thing is the use of total_vm and not the RSS which is used as
>>>> the basis by the OOM killer. I need to read/understand the code a bit more.
>>> RSS makes more sense to me as well.
>> Andrea Arcangeli has patches pending which change this to the RSS.
>> Specifically:
>>
>> http://marc.info/?l=linux-mm&m=119977937126925
>
> I agreed with you that RSS is better :)
>
>
>
> but..
> on many node numa, per zone rss is more better..
Do we have a per zone RSS per task? I don't remember seeing it.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-02-06 4:02 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-05 3:34 [PATCH] badness() dramatically overcounts memory Jeff Davis
2008-02-05 4:13 ` Balbir Singh
2008-02-05 23:02 ` Jeff Davis
2008-02-05 23:09 ` David Rientjes
2008-02-06 1:54 ` KOSAKI Motohiro
2008-02-06 2:05 ` David Rientjes
2008-02-06 4:00 ` Balbir Singh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).