From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Kyle Walker <kwalker@redhat.com>,
Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov@parallels.com>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Stanislav Kozina <skozina@redhat.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: can't oom-kill zap the victim's memory?
Date: Tue, 6 Oct 2015 20:45:02 +0200 [thread overview]
Message-ID: <20151006184502.GA15787@redhat.com> (raw)
In-Reply-To: <20150923205923.GB19054@dhcp22.suse.cz>
Damn. I can't believe this, but I still can't make the initial change.
And no, it is not that I hit some technical problems, just I can't
decide what exactly the first step should do to be a) really simple
and b) useful. I am starting to think I'll just update my draft patch
which uses queue_work() and send it tomorrow (yes, tomorrow again ;).
But let me at least answer this email,
On 09/23, Michal Hocko wrote:
>
> On Tue 22-09-15 18:06:08, Oleg Nesterov wrote:
> >
> > OK, let it be a kthread from the very beginning, I won't argue. This
> > is really minor compared to other problems.
>
> I am still not sure how you want to implement that kernel thread but I
> am quite skeptical it would be very much useful because all the current
> allocations which end up in the OOM killer path cannot simply back off
> and drop the locks with the current allocator semantic. So they will
> be sitting on top of unknown pile of locks whether you do an additional
> reclaim (unmap the anon memory) in the direct OOM context or looping
> in the allocator and waiting for kthread/workqueue to do its work. The
> only argument that I can see is the stack usage but I haven't seen stack
> overflows in the OOM path AFAIR.
Please see below,
> > And note that the caller can held other locks we do not even know about.
> > Most probably we should not deadlock, at least if we only unmap the anon
> > pages, but still this doesn't look safe.
>
> The unmapper cannot fall back to reclaim and/or trigger the OOM so
> we should be indeed very careful and mark the allocation context
> appropriately. I can remember mmu_gather but it is only doing
> opportunistic allocation AFAIR.
And I was going to make V1 which avoids queue_work/kthread and zaps the
memory in oom_kill_process() context.
But this can't work because we need to increment ->mm_users to avoid
the race with exit_mmap/etc. And this means that we need mmput() after
that, and as we recently discussed it can deadlock if mm_users goes
to zero, we can't do exit_mmap/etc in oom_kill_process().
> > Hmm. If we already have mmap_sem and started zap_page_range() then
> > I do not think it makes sense to stop until we free everything we can.
>
> Zapping a huge address space can take quite some time
Yes, and this is another reason we should do this asynchronously.
> and we really do
> not have to free it all on behalf of the killer when enough memory is
> freed to allow for further progress and the rest can be done by the
> victim. If one batch doesn't seem sufficient then another retry can
> continue.
>
> I do not think that a limited scan would make the implementation more
> complicated
But we can't even know much memory unmap_single_vma() actually frees.
Even if we could, how can we know we freed enough?
Anyway. Perhaps it makes sense to abort the for_each_vma() loop if
freed_enough_mem() == T. But it is absolutely not clear to me how we
should define this freed_enough_mem(), so I think we should do this
later.
> > But. Can't we just remove another ->oom_score_adj check when we try
> > to kill all mm users (the last for_each_process loop). If yes, this
> > all can be simplified.
> >
> > I guess we can't and its a pity. Because it looks simply pointless
> > to not kill all mm users. This just means the select_bad_process()
> > picked the wrong task.
>
> Yes I am not really sure why oom_score_adj is not per-mm and we are
> doing that per signal struct to be honest.
Heh ;) Yes, but I guess it is too late to move it back.
> Maybe we can revisit this...
I hope, but I am not going to try to remove this OOM_SCORE_ADJ_MIN
check now. Just we should not zap this mm if we find the OOM-unkillable
user.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Kyle Walker <kwalker@redhat.com>,
Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov@parallels.com>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Stanislav Kozina <skozina@redhat.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: can't oom-kill zap the victim's memory?
Date: Tue, 6 Oct 2015 20:45:02 +0200 [thread overview]
Message-ID: <20151006184502.GA15787@redhat.com> (raw)
In-Reply-To: <20150923205923.GB19054@dhcp22.suse.cz>
Damn. I can't believe this, but I still can't make the initial change.
And no, it is not that I hit some technical problems, just I can't
decide what exactly the first step should do to be a) really simple
and b) useful. I am starting to think I'll just update my draft patch
which uses queue_work() and send it tomorrow (yes, tomorrow again ;).
But let me at least answer this email,
On 09/23, Michal Hocko wrote:
>
> On Tue 22-09-15 18:06:08, Oleg Nesterov wrote:
> >
> > OK, let it be a kthread from the very beginning, I won't argue. This
> > is really minor compared to other problems.
>
> I am still not sure how you want to implement that kernel thread but I
> am quite skeptical it would be very much useful because all the current
> allocations which end up in the OOM killer path cannot simply back off
> and drop the locks with the current allocator semantic. So they will
> be sitting on top of unknown pile of locks whether you do an additional
> reclaim (unmap the anon memory) in the direct OOM context or looping
> in the allocator and waiting for kthread/workqueue to do its work. The
> only argument that I can see is the stack usage but I haven't seen stack
> overflows in the OOM path AFAIR.
Please see below,
> > And note that the caller can held other locks we do not even know about.
> > Most probably we should not deadlock, at least if we only unmap the anon
> > pages, but still this doesn't look safe.
>
> The unmapper cannot fall back to reclaim and/or trigger the OOM so
> we should be indeed very careful and mark the allocation context
> appropriately. I can remember mmu_gather but it is only doing
> opportunistic allocation AFAIR.
And I was going to make V1 which avoids queue_work/kthread and zaps the
memory in oom_kill_process() context.
But this can't work because we need to increment ->mm_users to avoid
the race with exit_mmap/etc. And this means that we need mmput() after
that, and as we recently discussed it can deadlock if mm_users goes
to zero, we can't do exit_mmap/etc in oom_kill_process().
> > Hmm. If we already have mmap_sem and started zap_page_range() then
> > I do not think it makes sense to stop until we free everything we can.
>
> Zapping a huge address space can take quite some time
Yes, and this is another reason we should do this asynchronously.
> and we really do
> not have to free it all on behalf of the killer when enough memory is
> freed to allow for further progress and the rest can be done by the
> victim. If one batch doesn't seem sufficient then another retry can
> continue.
>
> I do not think that a limited scan would make the implementation more
> complicated
But we can't even know much memory unmap_single_vma() actually frees.
Even if we could, how can we know we freed enough?
Anyway. Perhaps it makes sense to abort the for_each_vma() loop if
freed_enough_mem() == T. But it is absolutely not clear to me how we
should define this freed_enough_mem(), so I think we should do this
later.
> > But. Can't we just remove another ->oom_score_adj check when we try
> > to kill all mm users (the last for_each_process loop). If yes, this
> > all can be simplified.
> >
> > I guess we can't and its a pity. Because it looks simply pointless
> > to not kill all mm users. This just means the select_bad_process()
> > picked the wrong task.
>
> Yes I am not really sure why oom_score_adj is not per-mm and we are
> doing that per signal struct to be honest.
Heh ;) Yes, but I guess it is too late to move it back.
> Maybe we can revisit this...
I hope, but I am not going to try to remove this OOM_SCORE_ADJ_MIN
check now. Just we should not zap this mm if we find the OOM-unkillable
user.
Oleg.
next prev parent reply other threads:[~2015-10-06 18:48 UTC|newest]
Thread overview: 213+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-17 17:59 [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks Kyle Walker
2015-09-17 17:59 ` Kyle Walker
2015-09-17 19:22 ` Oleg Nesterov
2015-09-17 19:22 ` Oleg Nesterov
2015-09-18 15:41 ` Christoph Lameter
2015-09-18 15:41 ` Christoph Lameter
2015-09-18 16:24 ` Oleg Nesterov
2015-09-18 16:24 ` Oleg Nesterov
2015-09-18 16:39 ` Tetsuo Handa
2015-09-18 16:39 ` Tetsuo Handa
2015-09-18 16:54 ` Oleg Nesterov
2015-09-18 16:54 ` Oleg Nesterov
2015-09-18 17:00 ` Christoph Lameter
2015-09-18 17:00 ` Christoph Lameter
2015-09-18 19:07 ` Oleg Nesterov
2015-09-18 19:07 ` Oleg Nesterov
2015-09-18 19:19 ` Christoph Lameter
2015-09-18 19:19 ` Christoph Lameter
2015-09-18 21:28 ` Kyle Walker
2015-09-18 22:07 ` Christoph Lameter
2015-09-18 22:07 ` Christoph Lameter
2015-09-19 8:32 ` Michal Hocko
2015-09-19 8:32 ` Michal Hocko
2015-09-19 14:33 ` Tetsuo Handa
2015-09-19 14:33 ` Tetsuo Handa
2015-09-19 15:51 ` Michal Hocko
2015-09-19 15:51 ` Michal Hocko
2015-09-21 23:33 ` David Rientjes
2015-09-21 23:33 ` David Rientjes
2015-09-22 5:33 ` Tetsuo Handa
2015-09-22 5:33 ` Tetsuo Handa
2015-09-22 23:32 ` David Rientjes
2015-09-22 23:32 ` David Rientjes
2015-09-23 12:03 ` Kyle Walker
2015-09-23 12:03 ` Kyle Walker
2015-09-24 11:50 ` Tetsuo Handa
2015-09-24 11:50 ` Tetsuo Handa
2015-09-19 14:44 ` Oleg Nesterov
2015-09-19 14:44 ` Oleg Nesterov
2015-09-21 23:27 ` David Rientjes
2015-09-21 23:27 ` David Rientjes
2015-09-19 8:25 ` Michal Hocko
2015-09-19 8:25 ` Michal Hocko
2015-09-19 8:22 ` Michal Hocko
2015-09-19 8:22 ` Michal Hocko
2015-09-21 23:08 ` David Rientjes
2015-09-21 23:08 ` David Rientjes
2015-09-19 15:03 ` can't oom-kill zap the victim's memory? Oleg Nesterov
2015-09-19 15:03 ` Oleg Nesterov
2015-09-19 15:10 ` Oleg Nesterov
2015-09-19 15:10 ` Oleg Nesterov
2015-09-19 15:58 ` Michal Hocko
2015-09-19 15:58 ` Michal Hocko
2015-09-20 13:16 ` Oleg Nesterov
2015-09-20 13:16 ` Oleg Nesterov
2015-09-19 22:24 ` Linus Torvalds
2015-09-19 22:24 ` Linus Torvalds
2015-09-19 22:54 ` Raymond Jennings
2015-09-19 23:00 ` Raymond Jennings
2015-09-19 23:00 ` Raymond Jennings
2015-09-19 23:13 ` Linus Torvalds
2015-09-19 23:13 ` Linus Torvalds
2015-09-20 9:33 ` Michal Hocko
2015-09-20 9:33 ` Michal Hocko
2015-09-20 13:06 ` Oleg Nesterov
2015-09-20 13:06 ` Oleg Nesterov
2015-09-20 12:56 ` Oleg Nesterov
2015-09-20 12:56 ` Oleg Nesterov
2015-09-20 18:05 ` Linus Torvalds
2015-09-20 18:05 ` Linus Torvalds
2015-09-20 18:21 ` Raymond Jennings
2015-09-20 18:23 ` Raymond Jennings
2015-09-20 19:07 ` Raymond Jennings
2015-09-20 19:07 ` Raymond Jennings
2015-09-21 13:57 ` Oleg Nesterov
2015-09-21 13:57 ` Oleg Nesterov
2015-09-21 13:44 ` Oleg Nesterov
2015-09-21 13:44 ` Oleg Nesterov
2015-09-21 14:24 ` Michal Hocko
2015-09-21 14:24 ` Michal Hocko
2015-09-21 15:32 ` Oleg Nesterov
2015-09-21 15:32 ` Oleg Nesterov
2015-09-21 16:12 ` Michal Hocko
2015-09-21 16:12 ` Michal Hocko
2015-09-22 16:06 ` Oleg Nesterov
2015-09-22 16:06 ` Oleg Nesterov
2015-09-22 23:04 ` David Rientjes
2015-09-22 23:04 ` David Rientjes
2015-09-23 20:59 ` Michal Hocko
2015-09-23 20:59 ` Michal Hocko
2015-09-24 21:15 ` David Rientjes
2015-09-24 21:15 ` David Rientjes
2015-09-25 9:35 ` Michal Hocko
2015-09-25 9:35 ` Michal Hocko
2015-09-25 16:14 ` Tetsuo Handa
2015-09-25 16:14 ` Tetsuo Handa
2015-09-28 16:18 ` Tetsuo Handa
2015-09-28 16:18 ` Tetsuo Handa
2015-09-28 22:28 ` David Rientjes
2015-09-28 22:28 ` David Rientjes
2015-10-02 12:36 ` Michal Hocko
2015-10-02 12:36 ` Michal Hocko
2015-10-02 19:01 ` Linus Torvalds
2015-10-02 19:01 ` Linus Torvalds
2015-10-05 14:44 ` Michal Hocko
2015-10-05 14:44 ` Michal Hocko
2015-10-07 5:16 ` Vlastimil Babka
2015-10-07 5:16 ` Vlastimil Babka
2015-10-07 10:43 ` Tetsuo Handa
2015-10-07 10:43 ` Tetsuo Handa
2015-10-08 9:40 ` Vlastimil Babka
2015-10-08 9:40 ` Vlastimil Babka
2015-10-06 7:55 ` Eric W. Biederman
2015-10-06 7:55 ` Eric W. Biederman
2015-10-06 8:49 ` Linus Torvalds
2015-10-06 8:49 ` Linus Torvalds
2015-10-06 8:55 ` Linus Torvalds
2015-10-06 8:55 ` Linus Torvalds
2015-10-06 14:52 ` Eric W. Biederman
2015-10-06 14:52 ` Eric W. Biederman
2015-10-03 6:02 ` Can't we use timeout based OOM warning/killing? Tetsuo Handa
2015-10-03 6:02 ` Tetsuo Handa
2015-10-06 14:51 ` Tetsuo Handa
2015-10-06 14:51 ` Tetsuo Handa
2015-10-12 6:43 ` Tetsuo Handa
2015-10-12 6:43 ` Tetsuo Handa
2015-10-12 15:25 ` Silent hang up caused by pages being not scanned? Tetsuo Handa
2015-10-12 15:25 ` Tetsuo Handa
2015-10-12 21:23 ` Linus Torvalds
2015-10-12 21:23 ` Linus Torvalds
2015-10-13 12:21 ` Tetsuo Handa
2015-10-13 12:21 ` Tetsuo Handa
2015-10-13 16:37 ` Linus Torvalds
2015-10-13 16:37 ` Linus Torvalds
2015-10-14 12:21 ` Tetsuo Handa
2015-10-14 12:21 ` Tetsuo Handa
2015-10-15 13:14 ` Michal Hocko
2015-10-15 13:14 ` Michal Hocko
2015-10-16 15:57 ` Michal Hocko
2015-10-16 15:57 ` Michal Hocko
2015-10-16 18:34 ` Linus Torvalds
2015-10-16 18:34 ` Linus Torvalds
2015-10-16 18:49 ` Tetsuo Handa
2015-10-16 18:49 ` Tetsuo Handa
2015-10-19 12:57 ` Michal Hocko
2015-10-19 12:57 ` Michal Hocko
2015-10-19 12:53 ` Michal Hocko
2015-10-19 12:53 ` Michal Hocko
2015-10-13 13:32 ` Michal Hocko
2015-10-13 13:32 ` Michal Hocko
2015-10-13 16:19 ` Tetsuo Handa
2015-10-13 16:19 ` Tetsuo Handa
2015-10-14 13:22 ` Michal Hocko
2015-10-14 13:22 ` Michal Hocko
2015-10-14 14:38 ` Tetsuo Handa
2015-10-14 14:38 ` Tetsuo Handa
2015-10-14 14:59 ` Michal Hocko
2015-10-14 14:59 ` Michal Hocko
2015-10-14 15:06 ` Tetsuo Handa
2015-10-14 15:06 ` Tetsuo Handa
2015-10-26 11:44 ` Newbie's question: memory allocation when reclaiming memory Tetsuo Handa
2015-10-26 11:44 ` Tetsuo Handa
2015-11-05 8:46 ` Vlastimil Babka
2015-11-05 8:46 ` Vlastimil Babka
2015-10-06 15:25 ` Can't we use timeout based OOM warning/killing? Linus Torvalds
2015-10-08 15:33 ` Tetsuo Handa
2015-10-08 15:33 ` Tetsuo Handa
2015-10-10 12:50 ` Tetsuo Handa
2015-10-10 12:50 ` Tetsuo Handa
2015-09-28 22:24 ` can't oom-kill zap the victim's memory? David Rientjes
2015-09-28 22:24 ` David Rientjes
2015-09-29 7:57 ` Tetsuo Handa
2015-09-29 7:57 ` Tetsuo Handa
2015-09-29 22:56 ` David Rientjes
2015-09-29 22:56 ` David Rientjes
2015-09-30 4:25 ` Tetsuo Handa
2015-09-30 4:25 ` Tetsuo Handa
2015-09-30 10:21 ` Tetsuo Handa
2015-09-30 10:21 ` Tetsuo Handa
2015-09-30 21:11 ` David Rientjes
2015-09-30 21:11 ` David Rientjes
2015-10-01 12:13 ` Tetsuo Handa
2015-10-01 12:13 ` Tetsuo Handa
2015-10-01 14:48 ` Michal Hocko
2015-10-01 14:48 ` Michal Hocko
2015-10-02 13:06 ` Tetsuo Handa
2015-10-02 13:06 ` Tetsuo Handa
2015-10-06 18:45 ` Oleg Nesterov [this message]
2015-10-06 18:45 ` Oleg Nesterov
2015-10-07 11:03 ` Tetsuo Handa
2015-10-07 11:03 ` Tetsuo Handa
2015-10-07 12:00 ` Oleg Nesterov
2015-10-07 12:00 ` Oleg Nesterov
2015-10-08 14:04 ` Michal Hocko
2015-10-08 14:04 ` Michal Hocko
2015-10-08 14:01 ` Michal Hocko
2015-10-08 14:01 ` Michal Hocko
2015-09-21 16:51 ` Tetsuo Handa
2015-09-21 16:51 ` Tetsuo Handa
2015-09-22 12:43 ` Oleg Nesterov
2015-09-22 12:43 ` Oleg Nesterov
2015-09-22 14:30 ` Tetsuo Handa
2015-09-22 14:30 ` Tetsuo Handa
2015-09-22 14:45 ` Oleg Nesterov
2015-09-22 14:45 ` Oleg Nesterov
2015-09-21 23:42 ` David Rientjes
2015-09-21 23:42 ` David Rientjes
2015-09-21 16:55 ` Linus Torvalds
2015-09-21 16:55 ` Linus Torvalds
2015-09-20 14:50 ` Tetsuo Handa
2015-09-20 14:50 ` Tetsuo Handa
2015-09-20 14:55 ` Oleg Nesterov
2015-09-20 14:55 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151006184502.GA15787@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=hannes@cmpxchg.org \
--cc=kwalker@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=rientjes@google.com \
--cc=skozina@redhat.com \
--cc=torvalds@linux-foundation.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.