From: Xiaowei Yang <xiaowei.yang@huawei.com>
To: Nick Piggin <npiggin@kernel.dk>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Jan Beulich <JBeulich@novell.com>
Cc: Kenneth Lee <liguozhu@huawei.com>,
wangzhenguo@huawei.com, linqaingmin <linqiangmin@huawei.com>,
fanhenglong@huawei.com, Wu Fengguang <fengguang.wu@intel.com>,
linux-kernel@vger.kernel.org, Kaushik Barde <kbarde@huawei.com>
Subject: One (possible) x86 get_user_pages bug
Date: Thu, 27 Jan 2011 21:05:30 +0800
Message-ID: <4D416D9A.9010603@huawei.com>
We actually hit this bug on a SLES11 SP1 dom0 kernel
(2.6.32.12-0.7-xen), and we still can't reproduce it with a native
2.6.32 kernel. But as we suspect the native kernel might have the same
issue, we are sending it to LKML for consultation.

At first the error message looked like this:
----------------------------------------------------------------
[201674.150162] BUG: Bad page state in process java pfn:d13b8
[201674.151345] page:ffff8800075c7040 flags:4000000000200000 count:0 mapcount:0 mapping:(null) index:7f093bdfd
[201674.152474] Pid: 14793, comm: java Tainted: G N 2.6.32.12-0.7-xen #2
[201674.153585] Call Trace:
[201674.154643] [<ffffffff80009a75>] dump_trace+0x65/0x180
[201674.155686] [<ffffffff80369f26>] dump_stack+0x69/0x73
[201674.156744] [<ffffffff8009c0df>] bad_page+0xdf/0x160
[201674.157773] [<ffffffff800614f1>] get_futex_key+0x71/0x1a0
[201674.158820] [<ffffffff80061f72>] futex_wake+0x52/0x130
[201674.159852] [<ffffffff8006414f>] do_futex+0x11f/0xc40
[201674.160875] [<ffffffff80064cf2>] sys_futex+0x82/0x160
[201674.161907] [<ffffffff8003aa26>] mm_release+0xb6/0x110
[201674.162960] [<ffffffff8003f65e>] exit_mm+0x1e/0x150
[201674.163991] [<ffffffff80040567>] do_exit+0x127/0x7e0
[201674.165028] [<ffffffff80040c32>] sys_exit+0x12/0x20
[201674.166070] [<ffffffff80007388>] system_call_fastpath+0x16/0x1b
[201674.167130] [<00007f098db046b0>] 0x7f098db046b0
----------------------------------------------------------------
After turning on the CONFIG_DEBUG_VM option (more or less), we captured
the faulting spot: get_page() in gup_pte_range() is called on a free
page and triggers a BUG_ON.
We created a scenario to reproduce the bug:
----------------------------------------------------------------
// proc1 and proc1.2 are two threads sharing one page table.
// proc1 is the parent of proc2.
proc1               proc2          proc1.2
...                 ...            // in gup_pte_range()
...                 ...            pte = gup_get_pte()
...                 ...            page1 = pte_page(pte)  // (1)
do_wp_page(page1)   ...            ...
...                 exit_mmap()    ...
...                 ...            get_page(page1)        // (2)
-----------------------------------------------------------------
do_wp_page() and exit_mmap() cause page1 to be released to the free list
before get_page() in proc1.2 is called. The longer the delay between
(1) and (2), the more easily the BUG_ON triggers.
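The interleaving above can be replayed deterministically in a small
userspace C model. This is only an illustrative sketch, not kernel code:
struct model_page, model_get_page(), model_put_page() and
replay_interleaving() are hypothetical names, and the model assumes
page1 starts with two references, one for each address space that maps
it copy-on-write after fork.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct model_page {
	int count;	/* 0 means the page is back on the free list */
};

/* Models put_page(): drop one reference. */
static void model_put_page(struct model_page *page)
{
	assert(page->count > 0);
	page->count--;
}

/*
 * Models get_page() with a debug check: returns false where the
 * real kernel would trip the BUG_ON on a free page.
 */
static bool model_get_page(struct model_page *page)
{
	if (page->count == 0)
		return false;
	page->count++;
	return true;
}

/*
 * Replays the scenario. If ref_taken_before_unmap is false, the
 * reference is taken only after both mappings are gone, as in the
 * reported race. If true, the reference is taken right after step (1),
 * which is the ordering a lock around (1)..(2) would enforce.
 */
static bool replay_interleaving(bool ref_taken_before_unmap)
{
	struct model_page page1 = { .count = 2 };	/* assumed: two mappings */
	struct model_page *seen = &page1;		/* (1) pte_page(pte) */
	bool ok = false;

	if (ref_taken_before_unmap)
		ok = model_get_page(seen);

	model_put_page(&page1);	/* proc1: do_wp_page() drops the old page */
	model_put_page(&page1);	/* proc2: exit path drops its mapping */

	if (!ref_taken_before_unmap)
		ok = model_get_page(seen);		/* (2) get_page(page1) */

	return ok;
}
```

In this model replay_interleaving(false) returns false, because
get_page() lands on a page whose count already dropped to zero, while
replay_interleaving(true) returns true.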
We made an experimental patch that takes the PTE lock to prevent the PTE
from being modified in the middle of gup_pte_range(). The BUG_ON
disappears afterward.

However, the comments in gup.c suggest that avoiding the lock in the
fast path is deliberate. The question is: if so, how is the above
scenario avoided?
Thanks,
xiaowei
--------------------------------------------------------------------
--- /usr/src/linux-2.6.32.12-0.7/arch/x86/mm/gup.c.org	2011-01-27 20:11:45.000000000 +0800
+++ /usr/src/linux-2.6.32.12-0.7/arch/x86/mm/gup.c	2011-01-27 20:11:22.000000000 +0800
@@ -72,17 +72,18 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
 		unsigned long end, int write, struct page **pages, int *nr)
 {
 	unsigned long mask;
 	pte_t *ptep;
+	spinlock_t *ptl;
 	mask = _PAGE_PRESENT|_PAGE_USER;
 	if (write)
 		mask |= _PAGE_RW;
-	ptep = pte_offset_map(&pmd, addr);
+	ptep = pte_offset_map_lock(current->mm, &pmd, addr, &ptl);
 	do {
 		pte_t pte = gup_get_pte(ptep);
 		struct page *page;
 		if ((pte_flags(pte) & (mask | _PAGE_SPECIAL)) != mask) {
-			pte_unmap(ptep);
+			pte_unmap_unlock(ptep, ptl);
 			return 0;
 		}
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
@@ -90,8 +91,9 @@
 		get_page(page);
 		pages[*nr] = page;
 		(*nr)++;
 	} while (ptep++, addr += PAGE_SIZE, addr != end);
-	pte_unmap(ptep - 1);
+	pte_unmap_unlock(ptep - 1, ptl);
 	return 1;
 }