From: alan@signal11.us (Alan Ott)
To: linux-arm-kernel@lists.infradead.org
Subject: Deadlock in do_page_fault() on ARM (old kernel)
Date: Wed, 15 Jan 2014 20:13:04 -0500
Message-ID: <52D73220.3030108@signal11.us>
Hello,
I have a deadlock that I'm trying to understand. The symptom is multiple
tasks trying to acquire a read lock (down_read()) on mm->mmap_sem in
do_page_fault(). I'll be right up front and say that this is a fairly
old kernel (2.6.37 TI PSP kernel) on a fairly old processor (DaVinci 6446).
At the time of the deadlock, sysrq's show-all-tasks shows the following
for three of the tasks which are deadlocked (there are more, but I just
picked the interesting ones; the full output is at [1]):
ui D c0ea8208 0 1405 1293 0x00000000
[<c0ea8208>] (schedule+0x33c/0x3c4) from [<c0eaa3b4>] (__down_read+0xbc/0xd4)
[<c0eaa3b4>] (__down_read+0xbc/0xd4) from [<c0c0b378>] (do_page_fault+0x94/0x248)
[<c0c0b378>] (do_page_fault+0x94/0x248) from [<c0c052e0>] (do_DataAbort+0x34/0x94)
[<c0c052e0>] (do_DataAbort+0x34/0x94) from [<c0c05b0c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc048dce8 to 0xc048dd30)
dce0: 400e9a94 c048ddb0 ffffffec 00000000 c048c000 c048dda4
dd00: 400e9a94 00000000 ffffff92 c048c000 00000000 00000001 00000014 c048dd34
dd20: 00000000 c0d1f68c 00000013 ffffffff
[<c0c05b0c>] (__dabt_svc+0x4c/0x60) from [<c0d1f68c>] (__copy_to_user_std+0xcc/0x3a8)
ui D c0ea8208 0 1406 1293 0x00000000
[<c0ea8208>] (schedule+0x33c/0x3c4) from [<c0eaa3b4>] (__down_read+0xbc/0xd4)
[<c0eaa3b4>] (__down_read+0xbc/0xd4) from [<c0c0b378>] (do_page_fault+0x94/0x248)
[<c0c0b378>] (do_page_fault+0x94/0x248) from [<c0c052e0>] (do_DataAbort+0x34/0x94)
[<c0c052e0>] (do_DataAbort+0x34/0x94) from [<c0c05f0c>] (ret_from_exception+0x0/0x10)
Exception stack(0xc048ffb0 to 0xc048fff8)
ffa0: 00000060 0000000a 000000a8 0010d000
ffc0: 00c23d80 00c23de8 405af06c 00000000 405af03c 405af074 00000050 000001ff
ffe0: 405ae000 40185748 404f5c4c 404f393c 80000010 ffffffff
ui D c0ea8208 0 1411 1293 0x00000000
[<c0ea8208>] (schedule+0x33c/0x3c4) from [<c0eaa3b4>] (__down_read+0xbc/0xd4)
[<c0eaa3b4>] (__down_read+0xbc/0xd4) from [<c0c0b378>] (do_page_fault+0x94/0x248)
[<c0c0b378>] (do_page_fault+0x94/0x248) from [<c0c052e0>] (do_DataAbort+0x34/0x94)
[<c0c052e0>] (do_DataAbort+0x34/0x94) from [<c0c05f0c>] (ret_from_exception+0x0/0x10)
Exception stack(0xc053bfb0 to 0xc053bff8)
bfa0: 00000000 00000001 00ba3610 00000000
bfc0: 00000000 00ba3610 00bb6020 00ba3610 40074000 00b91024 415e4930 00000583
bfe0: 00b611a0 415e38e0 4005f3e4 ffff0fc0 60000010 ffffffff
---- [snip] ----
Showing all locks held in the system:
1 lock held by getty/1294:
#0: (&tty->atomic_read_lock){+.+...}, at: [<c0d45bf0>] n_tty_read+0x21c/0x670
1 lock held by ui/1405:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1406:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1408:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1409:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1411:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1416:
#0: (&mm->mmap_sem){++++++}, at: [<c0c6e604>] sys_mmap_pgoff+0x70/0xc0
1 lock held by ui/1418:
#0: (&mm->mmap_sem){++++++}, at: [<c0c0b378>] do_page_fault+0x94/0x248
1 lock held by ui/1420:
#0: (&mm->mmap_sem){++++++}, at: [<c0c6e604>] sys_mmap_pgoff+0x70/0xc0
1 lock held by ui/1434:
#0: (&tty->atomic_read_lock){+.+...}, at: [<c0d45bf0>] n_tty_read+0x21c/0x670
Note that above, do_page_fault() takes out a read lock (down_read()) and
sys_mmap_pgoff() takes out a write lock (down_write()).
I've searched for this kind of problem and found two patches which seem
to be related to this issue [2]. I have applied both, with no improvement.
So my questions are:
1. Why don't I see a full backtrace beyond the exception stack? It's the
same when dump_stack() is called manually.
2. __copy_to_user_memcpy() takes a read lock (down_read()) on
mm->mmap_sem. While that lock is held, __copy_to_user_memcpy() can
generate a page fault, causing do_page_fault() to get called, which will
also try to get a read lock (down_read()) on mm->mmap_sem. Multiple read
locks can be taken on an rw_semaphore, but deadlock will occur if
another thread tries to get a write lock (down_write()) in between. For
example:
Task 1:                            Task 2:
down_read(sem)
                                   down_write(sem) <-- Goes to sleep
down_read(sem) <-- Goes to sleep
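
The sequence above can be replayed with a small user-space sketch. This
is not kernel code: toy_rwsem, toy_down_read(), toy_down_write() and
toy_replay_scenario() are hypothetical names, and the model assumes only
the rule described in this message, namely that a reader arriving after
a queued writer is made to wait. Instead of really blocking, each call
reports whether the caller would have been put to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a fair rw-semaphore. NOT the kernel implementation: it
 * only records who would be granted the lock and who would sleep. */
struct toy_rwsem {
    int  active_readers;  /* readers currently holding the lock */
    bool writer_active;   /* a writer currently holds the lock */
    int  queued_writers;  /* writers sleeping in the wait queue */
};

/* Returns true if the read lock is granted, false if the caller sleeps. */
static bool toy_down_read(struct toy_rwsem *sem)
{
    assert(sem != NULL);
    /* Assumed fairness rule: new readers wait behind a queued writer. */
    if (sem->writer_active || sem->queued_writers > 0)
        return false;
    sem->active_readers++;
    return true;
}

/* Returns true if the write lock is granted, false if the caller sleeps. */
static bool toy_down_write(struct toy_rwsem *sem)
{
    assert(sem != NULL);
    if (sem->writer_active || sem->active_readers > 0) {
        sem->queued_writers++;
        return false;
    }
    sem->writer_active = true;
    return true;
}

struct toy_outcome {
    bool t1_first_read;   /* Task 1, outer down_read() */
    bool t2_write;        /* Task 2, down_write() */
    bool t1_second_read;  /* Task 1, nested down_read() from the fault */
};

/* Replays the three steps from the diagram in order. */
static struct toy_outcome toy_replay_scenario(void)
{
    struct toy_rwsem sem = { 0, false, 0 };
    struct toy_outcome r;

    r.t1_first_read  = toy_down_read(&sem);   /* granted */
    r.t2_write       = toy_down_write(&sem);  /* sleeps behind Task 1 */
    r.t1_second_read = toy_down_read(&sem);   /* sleeps behind Task 2 */
    return r;
}
```

In this model the first read is granted and both later calls report that
the caller would sleep. Task 1 then waits behind Task 2 while Task 2
waits for Task 1 to drop its first read lock, which it never does.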
There is a thread from 2005[3] which seems to discuss the same concept
of recursive rw_semaphores, but for futexes.
Other comments:
1. My analysis of this is probably wrong. Otherwise it seems many others
would have the same problem, and they don't seem to. I'm hoping this
email will help to correct my understanding.
2. I looked through the git logs for recent changes (since the 2.6.37
time frame) and nothing else jumped out at me as being an obvious fix
for this situation.
Thanks for any insight you can give,
Alan.
[1] http://www.signal11.us/~alan/show-all-tasks-deadlock.txt
[2] Some websites/bugtrackers mention this commit with a similar issue,
but I'm not entirely sure how it's related:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8878a539ff19a43cf3729e7562cd528f490246ae
This one seems obviously related, but has no effect on my system:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=435a7ef52db7d86e67a009b36cac1457f8972391
[3] http://thread.gmane.org/gmane.linux.kernel/280900