Re: arm/ksm: Unable to handle kernel paging request in get_ksm_page() and ksm_scan_thread()


From: Xishi Qiu <qiuxishi@huawei.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	neilb@suse.de, heiko.carstens@de.ibm.com, dhowells@redhat.com,
	izik.eidus@ravellosystems.com, aarcange@redhat.com,
	chrisw@sous-sol.org, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	weiyuan.wei@huawei.com
Subject: Re: arm/ksm: Unable to handle kernel paging request in get_ksm_page() and ksm_scan_thread()
Date: Mon, 30 Mar 2015 09:46:13 +0800
Message-ID: <5518AAE5.8060308@huawei.com>
In-Reply-To: <alpine.LSU.2.11.1503291701580.1052@eggly.anvils>

On 2015/3/30 8:43, Hugh Dickins wrote:

> On Sat, 28 Mar 2015, Xishi Qiu wrote:
>> On 2015/3/26 21:23, Xishi Qiu wrote:
>>
>>> Here are two panic logs from smartphone testing; the kernel version is v3.10.
>>>
>>> log1 is "Unable to handle kernel paging request at virtual address c0704da020", it should be ffffffc0704da020, right?
> 
> That one was an oops at get_ksm_page+0x34/0x150: I'm pretty sure that
> comes from the "kpfn = ACCESS_ONCE(stable_node->kpfn)" line, that the
> stable_node pointer (in x21 or x22) has upper bits cleared; which
> suggests corruption of the rmap_item supposed to point to it.
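To make the "upper bits cleared" reading concrete, the C sketch below checks the two faulting addresses reported in this thread against the full addresses the reporter expected (ffffffc0704da020 and ffffffc01e000796), trying a 24-bit and a 32-bit truncation. It is a userspace illustration only: the helper names are hypothetical, and the expected values are the reporter's guesses, not confirmed register contents.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep only the low `width` bits of x (requires 0 < width < 64). */
static uint64_t low_bits(uint64_t x, unsigned width)
{
    return x & ((UINT64_C(1) << width) - 1);
}

/* Hypothetical helper: return how many upper bits of `expected` must be
 * cleared to produce `faulting`, trying only the two widths discussed in
 * the thread (24 and 32). Returns 0 if neither truncation explains it. */
static unsigned explain_truncation(uint64_t faulting, uint64_t expected)
{
    static const unsigned cleared[] = { 24, 32 };

    for (size_t i = 0; i < sizeof cleared / sizeof cleared[0]; i++)
        if (low_bits(expected, 64 - cleared[i]) == faulting)
            return cleared[i];
    return 0;
}
```

Under this check, log1 (c0704da020) corresponds to the upper 24 bits cleared and log2 (1e000796) to the upper 32 bits cleared, which fits the "upper 24 or 32 bits" in the reply.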
> 
> get_ksm_page() is tricky with ACCESS_ONCEs against page migration,
> and the structures tricky with unions; but pointers overlay pointers
> in those unions, so I don't see any way we might pick up an address with
> the upper 24 or 32 bits cleared due to that.
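For readers unfamiliar with the idiom mentioned above, here is a simplified userspace sketch of the read-once-then-recheck pattern. ACCESS_ONCE has the same shape as the v3.10-era macro in include/linux/compiler.h; struct fake_stable_node, fake_get_pfn() and the retry limit are hypothetical stand-ins, not the real get_ksm_page(), which also takes a page reference and checks page->mapping. Note that the very first ACCESS_ONCE(node->kpfn) dereferences the node pointer, so a stable_node pointer with its upper bits cleared would fault there, before any of the re-check logic could help.

```c
#include <stdbool.h>

/* Same shape as the kernel's ACCESS_ONCE(): force a single volatile
 * access so the compiler cannot re-read or cache the field. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for KSM's stable_node: only the field read on
 * the oopsing line is modelled. */
struct fake_stable_node {
    unsigned long kpfn;   /* pfn of the KSM page; may change on migration */
};

/* Snapshot kpfn once, use the snapshot, then confirm the field still
 * holds the same value; if it changed (e.g. a concurrent migration),
 * retry. Returns false if no stable read is seen within max_tries. */
static bool fake_get_pfn(struct fake_stable_node *node,
                         unsigned long *out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned long kpfn = ACCESS_ONCE(node->kpfn);

        /* In the kernel, pfn_to_page(kpfn) and the reference and
         * mapping checks would happen here. */
        if (ACCESS_ONCE(node->kpfn) == kpfn) {
            *out = kpfn;
            return true;
        }
    }
    return false;
}
```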
> 
>>> and log2 is "Unable to handle kernel paging request at virtual address 1e000796", it should be ffffffc01e000796, right?
> 
> And this one was an oops at ksm_scan_thread+0x4ac/0xce0; as is the oops
> you posted below.  Which contains lots of hex numbers, but very little
> info I can work from.
> 
> Please make a CONFIG_DEBUG_INFO=y build of one of the kernels you're
> hitting this with, then use the disassembler (objdump -ld perhaps) to
> identify precisely which line of ksm.c that is oopsing on: the compiler
> will have inlined more interesting functions into ksm_scan_thread, so
> I haven't a clue where it's actually oopsing.
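The oops locations are printed as symbol+offset/size. As a hedged illustration of what that notation encodes, the sketch below splits such a string into its parts: the offset (for example 0x4ac into ksm_scan_thread) is what you add to the symbol's address from System.map before looking the result up in the objdump -ld output of the CONFIG_DEBUG_INFO=y vmlinux, and the size is the length of the whole function. struct oops_loc and parse_oops_loc() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct oops_loc {
    char symbol[64];
    unsigned long offset;  /* bytes from the start of the symbol */
    unsigned long size;    /* total size of the symbol in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE" as printed in an oops backtrace.
 * Returns 0 on success, -1 on malformed input. */
static int parse_oops_loc(const char *s, struct oops_loc *loc)
{
    const char *plus = strchr(s, '+');
    size_t len;

    if (!plus || plus == s)
        return -1;
    len = (size_t)(plus - s);
    if (len >= sizeof loc->symbol)
        return -1;
    memcpy(loc->symbol, s, len);
    loc->symbol[len] = '\0';

    /* %lx accepts an optional 0x prefix, as strtoul() does in base 16 */
    if (sscanf(plus + 1, "%lx/%lx", &loc->offset, &loc->size) != 2)
        return -1;
    if (loc->offset >= loc->size)   /* offset must lie inside the symbol */
        return -1;
    return 0;
}
```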
> 
> Maybe we'll find that it's also oopsing on a kernel virtual address
> from an rmap_item, maybe we won't.
> 
> And I don't read arm64 assembler at all, so I shall be rather limited
> in what I can tell you, I'm afraid.
> 
>>>
>>> I can't reproduce the panic in testing, so could anyone tell me whether
>>> this is a KSM bug or something else?
> 
> I've not heard of any problem like this with KSM on other architectures.
> Maybe it is making some assumption which is invalid on arm64, but I'd
> have thought we'd have heard about that before now.  My guess is that
> something in your kernel is stamping on KSM's structures.
> 
> A relevant experiment (after identifying the oops line in your current
> kernel) might be to switch from CONFIG_SLAB=y to CONFIG_SLUB=y or vice
> versa.  I doubt SLAB or SLUB is to blame, but changing allocator might
> shake things up in a way that either hides the problem, or shifts it
> elsewhere.
> 
> Hugh
> 

Hi Hugh,

Thanks for your reply. The three oops sites are listed below. At first I
thought something else might be causing the oopses, but all three cases
involve "rmap_item", so I have no idea.

1. ksm_scan_thread+0xa88/0xce0 -> unstable_tree_search_insert() -> tree_rmap_item = rb_entry(*new, struct rmap_item, node);

2. ksm_scan_thread+0x4ac/0xce0 -> get_next_rmap_item() -> if ((rmap_item->address & PAGE_MASK) == addr)

3. get_ksm_page+0x34/0x150 -> get_ksm_page() -> kpfn = ACCESS_ONCE(stable_node->kpfn);
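Cases 1 and 2 both go through an rmap_item pointer. The userspace sketch below shows what those two lines compute. rb_entry() is container_of(), which only subtracts a member offset and never dereferences, so a corrupted rb_node pointer silently becomes an equally bogus rmap_item pointer; a fault reported on that line presumably comes from loading *new or from a nearby inlined access. Case 2 masks off the low bits of rmap_item->address, where ksm.c keeps a sequence number and flags. The struct is a trimmed, hypothetical rmap_item and rb_node; the flag values are meant to mirror mm/ksm.c of that era, and 4 KiB pages are assumed.

```c
#include <stddef.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

#define SEQNR_MASK    0x0ffUL   /* low bits of unstable tree seqnr */
#define UNSTABLE_FLAG 0x100UL   /* is a node of the unstable tree */
#define STABLE_FLAG   0x200UL   /* is listed from the stable tree */

/* Userspace container_of/rb_entry: step back from a member to its
 * enclosing struct. No check is made that the result is valid. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Trimmed stand-in for the kernel's struct rb_node. */
struct rb_node {
    struct rb_node *rb_left, *rb_right;
};

/* Hypothetical, trimmed rmap_item: only the fields used below. */
struct rmap_item {
    struct rmap_item *rmap_list;
    unsigned long address;   /* page address plus low-bit flags */
    struct rb_node node;     /* link in the unstable tree */
};

/* Case 2's test: does this rmap_item describe the page at addr?
 * PAGE_MASK strips SEQNR_MASK, UNSTABLE_FLAG and STABLE_FLAG. */
static int rmap_item_matches(const struct rmap_item *item,
                             unsigned long addr)
{
    return (item->address & PAGE_MASK) == addr;
}
```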

Thanks,
Xishi Qiu


Thread overview:
2015-03-26 13:23 arm/ksm: Unable to handle kernel paging request in get_ksm_page() and ksm_scan_thread() Xishi Qiu
2015-03-28  3:16 ` Xishi Qiu
2015-03-30  0:43   ` Hugh Dickins
2015-03-30  1:46     ` Xishi Qiu [this message]
2015-03-30  3:06       ` Xishi Qiu
2015-03-30  4:36         ` Hugh Dickins