From: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-snps-arc@lists.infradead.org"
<linux-snps-arc@lists.infradead.org>
Subject: arc: mm->mmap_sem gets locked in do_page_fault() in case of OOM killer invocation
Date: Fri, 16 Feb 2018 12:40:30 +0000
Message-ID: <1518784830.3544.33.camel@synopsys.com>
Hi Vineet,
While playing with the OOM killer I bumped into a pure software deadlock on ARC,
which is observed even in simulation (i.e. it has nothing to do with HW peculiarities).
What's nice is that the kernel itself detects the lock-up if "Lock Debugging" is enabled.
Here's what I see:
-------------------------------------------->8-------------------------------------------
# /home/oom-test 450 & /home/oom-test 450
oom-test invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
CPU: 0 PID: 67 Comm: oom-test Not tainted 4.14.19 #2
Stack Trace:
arc_unwind_core.constprop.1+0xd4/0xf8
dump_header.isra.6+0x84/0x2f8
oom_kill_process+0x258/0x7c8
out_of_memory+0xb8/0x5e0
__alloc_pages_nodemask+0x922/0xd28
handle_mm_fault+0x284/0xd90
do_page_fault+0xf6/0x2a0
ret_from_exception+0x0/0x8
Mem-Info:
active_anon:62276 inactive_anon:341 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:26 slab_unreclaimable:196
mapped:105 shmem:578 pagetables:263 bounce:0
free:344 free_pcp:39 free_cma:0
Node 0 active_anon:498208kB inactive_anon:2728kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:840kB dirty:0kB writeback:0kB shmem:4624kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Normal free:2752kB min:2840kB low:3544kB high:4248kB active_anon:498208kB inactive_anon:2728kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:524288kB managed:508584kB mlocked:0kB kernel_stack:240kB pagetables:2104kB bounce:0kB free_pcp:312kB local_pcp:312kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*8kB 0*16kB 0*32kB 1*64kB (M) 1*128kB (M) 0*256kB 1*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB 0*8192kB = 2752kB
578 total pagecache pages
65536 pages RAM
0 pages HighMem/MovableOnly
1963 pages reserved
[ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 41] 0 41 157 103 3 0 0 0 syslogd
[ 43] 0 43 156 106 3 0 0 0 klogd
[ 63] 0 63 157 99 3 0 0 0 getty
[ 64] 0 64 159 118 3 0 0 0 sh
[ 66] 0 66 115291 31094 124 0 0 0 oom-test
[ 67] 0 67 115291 31004 124 0 0 0 oom-test
Out of memory: Kill process 66 (oom-test) score 476 or sacrifice child
Killed process 66 (oom-test) total-vm:922328kB, anon-rss:248328kB, file-rss:0kB, shmem-rss:424kB
============================================
WARNING: possible recursive locking detected
4.14.19 #2 Not tainted
--------------------------------------------
oom-test/66 is trying to acquire lock:
(&mm->mmap_sem){++++}, at: [<80217d50>] do_exit+0x444/0x7f8
but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<8021028a>] do_page_fault+0x9e/0x2a0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&mm->mmap_sem);
lock(&mm->mmap_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by oom-test/66:
#0: (&mm->mmap_sem){++++}, at: [<8021028a>] do_page_fault+0x9e/0x2a0
stack backtrace:
CPU: 0 PID: 66 Comm: oom-test Not tainted 4.14.19 #2
Stack Trace:
arc_unwind_core.constprop.1+0xd4/0xf8
__lock_acquire+0x582/0x1494
lock_acquire+0x3c/0x58
down_read+0x1a/0x28
do_exit+0x444/0x7f8
do_group_exit+0x26/0x8c
get_signal+0x1aa/0x7d4
do_signal+0x30/0x220
resume_user_mode_begin+0x90/0xd8
-------------------------------------------->8-------------------------------------------
Looking at our code in "arch/arc/mm/fault.c" I can see why "mm->mmap_sem" is not released:
1. fatal_signal_pending(current) returns a non-zero value.
2. ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY)) is false, thus up_read(&mm->mmap_sem)
is not executed.
3. The fault came from a user-space process, thus we simply return [with "mm->mmap_sem" still held].
See the code snippet below:
-------------------------------------------->8-------------------------------------------
	/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
	if (unlikely(fatal_signal_pending(current))) {
		if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
			up_read(&mm->mmap_sem);
		if (user_mode(regs))
			return;
	}
-------------------------------------------->8-------------------------------------------
Then we leave the page fault handler and, before returning to user-space, we
process the pending signal, which happens to be a death signal, so we end up executing the
following code path (see the stack trace above):
do_exit() -> exit_mm() -> down_read(&mm->mmap_sem) <-- And here we go locking ourselves for good.
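To make the sequence above easier to follow, here is a rough userspace sketch (not kernel code) that models it. Everything prefixed with "toy_"/"TOY_" is a made-up stand-in: the semaphore is just a counter of read-side holders, the fault bits have illustrative values, and both fatal_signal_pending(current) and user_mode(regs) are assumed to be true.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mm->mmap_sem: only counts read-side holders. */
struct toy_rwsem {
	int readers;
};

/* Stand-ins for the VM_FAULT_* bits; the values are illustrative only. */
enum {
	TOY_FAULT_ERROR = 1 << 0,	/* any bit covered by VM_FAULT_ERROR */
	TOY_FAULT_RETRY = 1 << 1,	/* VM_FAULT_RETRY */
};

static void toy_down_read(struct toy_rwsem *sem)
{
	sem->readers++;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);	/* releasing an unheld lock is a bug */
	sem->readers--;
}

/*
 * Model of do_page_fault() for a user-mode fault interrupted by a fatal
 * signal. It mirrors only the early-exit branch quoted above.
 */
static void toy_do_page_fault(struct toy_rwsem *sem, unsigned int fault)
{
	toy_down_read(sem);		/* down_read(&mm->mmap_sem) */

	/* fatal_signal_pending(current) is assumed true from here on */
	if ((fault & TOY_FAULT_ERROR) && !(fault & TOY_FAULT_RETRY))
		toy_up_read(sem);	/* up_read(&mm->mmap_sem) */

	/* user_mode(regs) is assumed true: plain return */
}

/*
 * Model of the do_exit() -> exit_mm() step: report whether the task would
 * take the lock for read while it already holds it, which is the
 * situation lockdep complains about in the trace above.
 */
static bool toy_exit_mm_is_recursive(const struct toy_rwsem *sem)
{
	return sem->readers > 0;
}
```

With fault == 0 the model leaves one read-side hold behind, so toy_exit_mm_is_recursive() reports true; with TOY_FAULT_ERROR set the hold is dropped and it reports false. The VM_FAULT_RETRY case is not exercised here, since who holds the lock at that point is not covered by this sketch.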
What's interesting is that most if not all architectures return from the page fault handler with
"mm->mmap_sem" held in case of fatal_signal_pending(). So I would expect the same failure I see on ARC
to happen on other arches too... though I was not able to trigger it on ARM (WandBoard Quad).
I think that's because on ARM and many others the check is a bit different:
-------------------------------------------->8-------------------------------------------
	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
		if (!user_mode(regs))
			goto no_context;
		return 0;
	}
-------------------------------------------->8-------------------------------------------
So to get into the problematic code path (i.e. exit with "mm->mmap_sem" still held) we need
__do_page_fault() to return VM_FAULT_RETRY, which makes reproduction even more complicated, but
I think it's still doable :)
The simplest solution here seems to be an unconditional up_read(&mm->mmap_sem) before return, but
it's strange that this has not been done already. Anyway, any thoughts are very welcome!
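As a rough illustration of that idea, the userspace sketch below (again not kernel code, and not a tested patch) compares the current early-exit branch with an unconditional release. The "toy_"/"TOY_" names are hypothetical, the lock is reduced to a holder count, and VM_FAULT_RETRY is deliberately left out: the existing branch already treats it specially, so an unconditional release may need the same care there.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mm->mmap_sem: only counts read-side holders. */
struct toy_rwsem {
	int readers;
};

/* Stand-in for "any bit of VM_FAULT_ERROR is set"; value is illustrative. */
enum {
	TOY_FAULT_ERROR = 1 << 0,
};

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);	/* releasing an unheld lock is a bug */
	sem->readers--;
}

/*
 * Early exit for a user-mode fault with a fatal signal pending and no
 * VM_FAULT_RETRY. Returns the number of read-side holds that are still
 * outstanding when the handler returns.
 */
static int toy_early_exit(unsigned int fault, bool unconditional)
{
	/* The lock was taken earlier in do_page_fault(). */
	struct toy_rwsem sem = { .readers = 1 };

	if (unconditional)
		toy_up_read(&sem);	/* proposed: always release */
	else if (fault & TOY_FAULT_ERROR)
		toy_up_read(&sem);	/* current: release on error only */

	return sem.readers;
}
```

In this model toy_early_exit(0, false) leaves one hold outstanding, while toy_early_exit(0, true) and toy_early_exit(TOY_FAULT_ERROR, true) both return with none.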
-Alexey