* [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks
@ 2014-01-15 23:43 Johannes Weiner
2014-01-16 0:18 ` David Rientjes
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2014-01-15 23:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Rientjes, Michal Hocko, linux-mm, linux-kernel
With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer
tries to avoid killing privileged tasks by subtracting 3% of overall
memory (system or cgroup) from their per-task consumption. But as a
result, all root tasks that consume less than 3% of overall memory are
considered equal, and so it only takes 33+ privileged tasks pushing
the system out of memory for the OOM killer to do something stupid and
kill sshd or dhclient. For example, on a 32G machine it can't tell
the difference between the 1M agetty and the 10G fork bomb member.

The changelog describes this 3% boost as the equivalent to the global
overcommit limit being 3% higher for privileged tasks, but this is not
the same as discounting 3% of overall memory from _every privileged
task individually_ during OOM selection.

Revert back to the old priority boost of pretending root tasks are
only a quarter of their actual size.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/oom_kill.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600a6163..1b0011c3d9e2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -166,11 +166,11 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	task_unlock(p);

 	/*
-	 * Root processes get 3% bonus, just like the __vm_enough_memory()
-	 * implementation used by LSMs.
+	 * Memory consumption being equal, prefer killing an
+	 * unprivileged task over a root task.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		adj -= 30;
+		points /= 4;

 	/* Normalize to oom_score_adj units */
 	adj *= totalpages / 1000;
--
1.8.4.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks
2014-01-15 23:43 [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks Johannes Weiner
@ 2014-01-16 0:18 ` David Rientjes
2014-01-16 7:07 ` Johannes Weiner
0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2014-01-16 0:18 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel
On Wed, 15 Jan 2014, Johannes Weiner wrote:
> With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer
> tries to avoid killing privileged tasks by subtracting 3% of overall
> memory (system or cgroup) from their per-task consumption. But as a
> result, all root tasks that consume less than 3% of overall memory are
> considered equal, and so it only takes 33+ privileged tasks pushing
> the system out of memory for the OOM killer to do something stupid and
> kill sshd or dhclient. For example, on a 32G machine it can't tell
> the difference between the 1M agetty and the 10G fork bomb member.
>
> The changelog describes this 3% boost as the equivalent to the global
> overcommit limit being 3% higher for privileged tasks, but this is not
> the same as discounting 3% of overall memory from _every privileged
> task individually_ during OOM selection.
>
> Revert back to the old priority boost of pretending root tasks are
> only a quarter of their actual size.
>
Unfortunately, I think this could potentially be too much of a bonus. On
your same 32GB machine, if a root process is using 18GB and a user process
is using 14GB, the user process ends up getting selected while the current
discount of 3% still selects the root process.

I do like the idea of scaling this bonus depending on points, however. I
think it would be better if we could scale the discount but also limit it
to some sane value.
* Re: [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks
2014-01-16 0:18 ` David Rientjes
@ 2014-01-16 7:07 ` Johannes Weiner
2014-01-22 4:53 ` David Rientjes
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2014-01-16 7:07 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel
On Wed, Jan 15, 2014 at 04:18:47PM -0800, David Rientjes wrote:
> On Wed, 15 Jan 2014, Johannes Weiner wrote:
>
> > With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer
> > tries to avoid killing privileged tasks by subtracting 3% of overall
> > memory (system or cgroup) from their per-task consumption. But as a
> > result, all root tasks that consume less than 3% of overall memory are
> > considered equal, and so it only takes 33+ privileged tasks pushing
> > the system out of memory for the OOM killer to do something stupid and
> > kill sshd or dhclient. For example, on a 32G machine it can't tell
> > the difference between the 1M agetty and the 10G fork bomb member.
> >
> > The changelog describes this 3% boost as the equivalent to the global
> > overcommit limit being 3% higher for privileged tasks, but this is not
> > the same as discounting 3% of overall memory from _every privileged
> > task individually_ during OOM selection.
> >
> > Revert back to the old priority boost of pretending root tasks are
> > only a quarter of their actual size.
> >
>
> Unfortunately, I think this could potentially be too much of a bonus. On
> your same 32GB machine, if a root process is using 18GB and a user process
> is using 14GB, the user process ends up getting selected while the current
> discount of 3% still selects the root process.
>
> I do like the idea of scaling this bonus depending on points, however. I
> think it would be better if we could scale the discount but also limit it
> to some sane value.
I just reverted to the /= 4 because we had that for a long time and it
seemed to work. I don't really mind either way as long as we get rid
of that -3%. Do you have a suggestion?
* Re: [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks
2014-01-16 7:07 ` Johannes Weiner
@ 2014-01-22 4:53 ` David Rientjes
2014-01-24 4:05 ` Johannes Weiner
0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2014-01-22 4:53 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel
On Thu, 16 Jan 2014, Johannes Weiner wrote:
> > Unfortunately, I think this could potentially be too much of a bonus. On
> > your same 32GB machine, if a root process is using 18GB and a user process
> > is using 14GB, the user process ends up getting selected while the current
> > discount of 3% still selects the root process.
> >
> > I do like the idea of scaling this bonus depending on points, however. I
> > think it would be better if we could scale the discount but also limit it
> > to some sane value.
>
> I just reverted to the /= 4 because we had that for a long time and it
> seemed to work. I don't really mind either way as long as we get rid
> of that -3%. Do you have a suggestion?
>
How about simply using 3% of the root process's points so that root
processes get some bonus compared to non-root processes with the same
memory usage and it's scaled to the usage rather than amount of available
memory?

So rather than points /= 4, we do

	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
		points -= (points * 3) / 100;

instead. Sound good?
* Re: [patch] mm: oom_kill: revert 3% system memory bonus for privileged tasks
2014-01-22 4:53 ` David Rientjes
@ 2014-01-24 4:05 ` Johannes Weiner
2014-01-26 3:48 ` [patch] mm, oom: base root bonus on current usage David Rientjes
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2014-01-24 4:05 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel
On Tue, Jan 21, 2014 at 08:53:07PM -0800, David Rientjes wrote:
> On Thu, 16 Jan 2014, Johannes Weiner wrote:
>
> > > Unfortunately, I think this could potentially be too much of a bonus. On
> > > your same 32GB machine, if a root process is using 18GB and a user process
> > > is using 14GB, the user process ends up getting selected while the current
> > > discount of 3% still selects the root process.
> > >
> > > I do like the idea of scaling this bonus depending on points, however. I
> > > think it would be better if we could scale the discount but also limit it
> > > to some sane value.
> >
> > I just reverted to the /= 4 because we had that for a long time and it
> > seemed to work. I don't really mind either way as long as we get rid
> > of that -3%. Do you have a suggestion?
> >
>
> How about simply using 3% of the root process's points so that root
> processes get some bonus compared to non-root processes with the same
> memory usage and it's scaled to the usage rather than amount of available
> memory?
>
> So rather than points /= 4, we do
>
> if (has_capability_noaudit(p, CAP_SYS_ADMIN))
> points -= (points * 3) / 100;
>
> instead. Sound good?
Yes, should be okay.
Do you want to send a patch? Want me to update mine?
* [patch] mm, oom: base root bonus on current usage
2014-01-24 4:05 ` Johannes Weiner
@ 2014-01-26 3:48 ` David Rientjes
2014-01-26 15:27 ` Johannes Weiner
2014-01-29 20:28 ` Andrew Morton
0 siblings, 2 replies; 10+ messages in thread
From: David Rientjes @ 2014-01-26 3:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel
A 3% of system memory bonus is sometimes too excessive in comparison to
other processes and can yield poor results when all processes on the
system are root and none of them use over 3% of memory.

Replace the 3% of system memory bonus with a 3% of current memory usage
bonus.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
Documentation/filesystems/proc.txt | 4 ++--
mm/oom_kill.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1386,8 +1386,8 @@ may allocate from based on an estimation of its current memory and swap use.
 For example, if a task is using all allowed memory, its badness score will be
 1000. If it is using half of its allowed memory, its score will be 500.

-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.

 The amount of "allowed" memory depends on the context in which the oom killer
 was called. If it is due to the memory assigned to the allocating task's cpuset
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -178,7 +178,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * implementation used by LSMs.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		adj -= 30;
+		points -= (points * 3) / 100;

 	/* Normalize to oom_score_adj units */
 	adj *= totalpages / 1000;
* Re: [patch] mm, oom: base root bonus on current usage
2014-01-26 3:48 ` [patch] mm, oom: base root bonus on current usage David Rientjes
@ 2014-01-26 15:27 ` Johannes Weiner
2014-01-29 20:28 ` Andrew Morton
1 sibling, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-01-26 15:27 UTC (permalink / raw)
To: David Rientjes; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel
On Sat, Jan 25, 2014 at 07:48:32PM -0800, David Rientjes wrote:
> A 3% of system memory bonus is sometimes too excessive in comparison to
> other processes and can yield poor results when all processes on the
> system are root and none of them use over 3% of memory.
>
> Replace the 3% of system memory bonus with a 3% of current memory usage
> bonus.
>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: David Rientjes <rientjes@google.com>
Looks good, thanks a lot!
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [patch] mm, oom: base root bonus on current usage
2014-01-26 3:48 ` [patch] mm, oom: base root bonus on current usage David Rientjes
2014-01-26 15:27 ` Johannes Weiner
@ 2014-01-29 20:28 ` Andrew Morton
2014-01-30 0:35 ` David Rientjes
2014-01-30 2:12 ` Johannes Weiner
1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2014-01-29 20:28 UTC (permalink / raw)
To: David Rientjes; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel
On Sat, 25 Jan 2014 19:48:32 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
> A 3% of system memory bonus is sometimes too excessive in comparison to
> other processes and can yield poor results when all processes on the
> system are root and none of them use over 3% of memory.
>
> Replace the 3% of system memory bonus with a 3% of current memory usage
> bonus.
This changelog has deteriorated :( We should provide sufficient info so
that people will be able to determine whether this patch will fix a
problem they or their customers are observing. And so that people who
maintain -stable and its derivatives can decide whether to backport it.

I went back and stole some text from the v1 patch. Please review the
result. The changelog would be even better if it were to describe the
new behaviour under the problematic workloads.

We don't think -stable needs this?
From: David Rientjes <rientjes@google.com>
Subject: mm, oom: base root bonus on current usage
A 3% of system memory bonus is sometimes too excessive in comparison to
other processes.

With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries
to avoid killing privileged tasks by subtracting 3% of overall memory
(system or cgroup) from their per-task consumption. But as a result, all
root tasks that consume less than 3% of overall memory are considered
equal, and so it only takes 33+ privileged tasks pushing the system out of
memory for the OOM killer to do something stupid and kill sshd or
dhclient. For example, on a 32G machine it can't tell the difference
between the 1M agetty and the 10G fork bomb member.

The changelog describes this 3% boost as the equivalent to the global
overcommit limit being 3% higher for privileged tasks, but this is not the
same as discounting 3% of overall memory from _every privileged task
individually_ during OOM selection.

Replace the 3% of system memory bonus with a 3% of current memory usage
bonus.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/proc.txt | 4 ++--
mm/oom_kill.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff -puN Documentation/filesystems/proc.txt~mm-oom-base-root-bonus-on-current-usage Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt~mm-oom-base-root-bonus-on-current-usage
+++ a/Documentation/filesystems/proc.txt
@@ -1386,8 +1386,8 @@ may allocate from based on an estimation
 For example, if a task is using all allowed memory, its badness score will be
 1000. If it is using half of its allowed memory, its score will be 500.

-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.

 The amount of "allowed" memory depends on the context in which the oom killer
 was called. If it is due to the memory assigned to the allocating task's cpuset
diff -puN mm/oom_kill.c~mm-oom-base-root-bonus-on-current-usage mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-base-root-bonus-on-current-usage
+++ a/mm/oom_kill.c
@@ -178,7 +178,7 @@ unsigned long oom_badness(struct task_st
 	 * implementation used by LSMs.
 	 */
 	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
-		adj -= 30;
+		points -= (points * 3) / 100;

 	/* Normalize to oom_score_adj units */
 	adj *= totalpages / 1000;
_
* Re: [patch] mm, oom: base root bonus on current usage
2014-01-29 20:28 ` Andrew Morton
@ 2014-01-30 0:35 ` David Rientjes
2014-01-30 2:12 ` Johannes Weiner
1 sibling, 0 replies; 10+ messages in thread
From: David Rientjes @ 2014-01-30 0:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel
On Wed, 29 Jan 2014, Andrew Morton wrote:
> This changelog has deteriorated :( We should provide sufficient info so
> that people will be able to determine whether this patch will fix a
> problem they or their customers are observing. And so that people who
> maintain -stable and its derivatives can decide whether to backport it.
>
> I went back and stole some text from the v1 patch. Please review the
> result. The changelog would be even better if it were to describe the
> new behaviour under the problematic workloads.
>
The new changelog looks fine with the exception of the mention of sshd
which typically sets itself to be disabled from oom killing altogether.

> We don't think -stable needs this?
>

Nobody has reported it in over three years as causing an issue, probably
because people typically have enough memory that oom kills don't come from
a ton of small processes allocating memory that can't be reclaimed,
there's usually at least one large process to kill.
* Re: [patch] mm, oom: base root bonus on current usage
2014-01-29 20:28 ` Andrew Morton
2014-01-30 0:35 ` David Rientjes
@ 2014-01-30 2:12 ` Johannes Weiner
1 sibling, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2014-01-30 2:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Rientjes, Michal Hocko, linux-mm, linux-kernel
On Wed, Jan 29, 2014 at 12:28:13PM -0800, Andrew Morton wrote:
> On Sat, 25 Jan 2014 19:48:32 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
>
> > A 3% of system memory bonus is sometimes too excessive in comparison to
> > other processes and can yield poor results when all processes on the
> > system are root and none of them use over 3% of memory.
> >
> > Replace the 3% of system memory bonus with a 3% of current memory usage
> > bonus.
>
> This changelog has deteriorated :( We should provide sufficient info so
> that people will be able to determine whether this patch will fix a
> problem they or their customers are observing. And so that people who
> maintain -stable and its derivatives can decide whether to backport it.
>
> I went back and stole some text from the v1 patch. Please review the
> result. The changelog would be even better if it were to describe the
> new behaviour under the problematic workloads.
Looks good to me, thanks. How about the below?
> We don't think -stable needs this?
That's actually a good idea, we're putting it into RHEL too.
> From: David Rientjes <rientjes@google.com>
> Subject: mm, oom: base root bonus on current usage
>
> A 3% of system memory bonus is sometimes too excessive in comparison to
> other processes.
>
> With a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries
> to avoid killing privileged tasks by subtracting 3% of overall memory
> (system or cgroup) from their per-task consumption. But as a result, all
> root tasks that consume less than 3% of overall memory are considered
> equal, and so it only takes 33+ privileged tasks pushing the system out of
> memory for the OOM killer to do something stupid and kill sshd or
> dhclient. For example, on a 32G machine it can't tell the difference
> between the 1M agetty and the 10G fork bomb member.
>
> The changelog describes this 3% boost as the equivalent to the global
> overcommit limit being 3% higher for privileged tasks, but this is not the
> same as discounting 3% of overall memory from _every privileged task
> individually_ during OOM selection.
>
> Replace the 3% of system memory bonus with a 3% of current memory usage
> bonus.
By giving root tasks a bonus that is proportional to their actual
size, they remain comparable even when relatively small. In the
example above, the OOM killer will discount the 1M agetty's 256
badness points down to 249, and the 10G fork bomb's 262144 points down
to 254280 points and make the right choice, instead of discounting
both to 0 and killing agetty because it's first in the task list.
> Signed-off-by: David Rientjes <rientjes@google.com>
> Reported-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>