From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kyle Walker <kwalker@redhat.com>,
Christoph Lameter <cl@linux.com>,
Michal Hocko <mhocko@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vladimir Davydov <vdavydov@parallels.com>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Stanislav Kozina <skozina@redhat.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: can't oom-kill zap the victim's memory?
Date: Sun, 20 Sep 2015 14:56:42 +0200
Message-ID: <20150920125642.GA2104@redhat.com>
In-Reply-To: <CA+55aFwkvbMrGseOsZNaxgP3wzDoVjkGasBKFxpn07SaokvpXA@mail.gmail.com>

On 09/19, Linus Torvalds wrote:
>
> On Sat, Sep 19, 2015 at 8:03 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > +
> > +static void oom_unmap_func(struct work_struct *work)
> > +{
> > +	struct mm_struct *mm = xchg(&oom_unmap_mm, NULL);
> > +
> > +	if (!atomic_inc_not_zero(&mm->mm_users))
> > +		return;
> > +
> > +	// If this is not safe we can do use_mm() + unuse_mm()
> > +	down_read(&mm->mmap_sem);
>
> I don't think this is safe.
>
> What makes you sure that we might not deadlock on the mmap_sem here?
> For all we know, the process that is going out of memory is in the
> middle of a mmap(), and already holds the mmap_sem for writing. No?

In this case the workqueue thread will block. But it cannot block
forever: if it could, the killed process would never exit (exit_mm()
does down_read()) and release its memory, so we lose anyway.

But let me repeat, this patch is obviously not complete, etc.

> So at the very least that needs to be a trylock, I think.

And we want to avoid using workqueues when the caller can do this
directly. And in this case we certainly need trylock. But this needs
some refactoring: we do not want to do this under oom_lock; OTOH it
makes sense to do this from mark_oom_victim() if current && killed,
and there are a lot more details.

The workqueue thread has other reasons for trylock, but probably not
in the initial version of this patch. And perhaps we should use a
dedicated kthread and not use workqueues at all. And yes, a single
"mm_struct *oom_unmap_mm" is ugly; it should be a list of mm's to
unmap, but then at least we need MMF_MEMDIE.

> And I'm not
> sure zap_page_range() is ok with the mmap_sem only held for reading.
> Normally our rule is that you can *populate* the page tables
> concurrently, but you can't tear them down.

Well, according to madvise_need_mmap_write(), MADV_DONTNEED does this
under down_read().

But yes, yes, this is probably not right anyway. Say, VM_LOCKED...
That is why I mentioned that perhaps this should only unmap the
anonymous pages. We can probably add a zap_details->for_oom hint.

Another question is whether it is safe to abuse the foreign mm this way.
Well, zap_page_range_single() does this, so this is probably safe.
But we can do use_mm().

Oleg.