mm, ksm: NULL ptr deref in unstable_tree_search

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* mm, ksm: NULL ptr deref in unstable_tree_search_insert
@ 2012-12-19  1:17 Sasha Levin
  2012-12-19  1:36 ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2012-12-19  1:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins; +Cc: linux-mm, linux-kernel@vger.kernel.org

Hi all,

While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
stumbled on the following:

[  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
[  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
[  127.960379] PGD cc54067 PUD cc55067 PMD 0
[  127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  127.960379] Dumping ftrace buffer:
[  127.960379]    (ftrace buffer empty)
[  127.960379] CPU 0
[  127.960379] Pid: 3174, comm: ksmd Tainted: G        W    3.7.0-next-20121218-sasha-00023-g8e46e86 #220
[  127.978032] RIP: 0010:[<ffffffff81185b60>]  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
[  127.978032] RSP: 0018:ffff8800137abb78  EFLAGS: 00010046
[  127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001
[  127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110
[  127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000
[  127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000
[  127.978032] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
[  127.978032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0
[  127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000)
[  127.978032] Stack:
[  127.978032]  ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50
[  127.978032]  ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e
[  127.978032]  ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000
[  127.978032] Call Trace:
[  127.978032]  [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70
[  127.978032]  [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40
[  127.978032]  [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130
[  127.978032]  [<ffffffff811889aa>] lock_acquire+0x1ca/0x270
[  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
[  127.978032]  [<ffffffff83cd7337>] down_read+0x47/0x90
[  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
[  127.978032]  [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260
[  127.978032]  [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0
[  127.978032]  [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0
[  127.978032]  [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0
[  127.978032]  [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0
[  127.978032]  [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0
[  127.978032]  [<ffffffff8113cbd3>] kthread+0xe3/0xf0
[  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
[  127.978032]  [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0
[  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
[  127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55
88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
[  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
[  127.978032]  RSP <ffff8800137abb78>
[  127.978032] CR2: 0000000000000110
[  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---

The relevant piece of code is:

	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
	{
	        struct mm_struct *mm = rmap_item->mm;
	        unsigned long addr = rmap_item->address;
	        struct vm_area_struct *vma;
	        struct page *page;
	
	        down_read(&mm->mmap_sem);

Where 'mm' is NULL. I'm not really sure how it happens though.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert
  2012-12-19  1:17 mm, ksm: NULL ptr deref in unstable_tree_search_insert Sasha Levin
@ 2012-12-19  1:36 ` Hugh Dickins
  2012-12-19 12:16   ` Petr Holasek
  2012-12-19 16:32   ` Petr Holasek
  0 siblings, 2 replies; 6+ messages in thread
From: Hugh Dickins @ 2012-12-19  1:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrew Morton, Petr Holasek, linux-mm,
	linux-kernel@vger.kernel.org

On Tue, 18 Dec 2012, Sasha Levin wrote:

> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
> stumbled on the following:
> 
> [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
> [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> [  127.960379] PGD cc54067 PUD cc55067 PMD 0
> [  127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  127.960379] Dumping ftrace buffer:
> [  127.960379]    (ftrace buffer empty)
> [  127.960379] CPU 0
> [  127.960379] Pid: 3174, comm: ksmd Tainted: G        W    3.7.0-next-20121218-sasha-00023-g8e46e86 #220
> [  127.978032] RIP: 0010:[<ffffffff81185b60>]  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> [  127.978032] RSP: 0018:ffff8800137abb78  EFLAGS: 00010046
> [  127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001
> [  127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110
> [  127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000
> [  127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [  127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000
> [  127.978032] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
> [  127.978032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0
> [  127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000)
> [  127.978032] Stack:
> [  127.978032]  ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50
> [  127.978032]  ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e
> [  127.978032]  ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000
> [  127.978032] Call Trace:
> [  127.978032]  [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70
> [  127.978032]  [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40
> [  127.978032]  [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130
> [  127.978032]  [<ffffffff811889aa>] lock_acquire+0x1ca/0x270
> [  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
> [  127.978032]  [<ffffffff83cd7337>] down_read+0x47/0x90
> [  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
> [  127.978032]  [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260
> [  127.978032]  [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0
> [  127.978032]  [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0
> [  127.978032]  [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0
> [  127.978032]  [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0
> [  127.978032]  [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0
> [  127.978032]  [<ffffffff8113cbd3>] kthread+0xe3/0xf0
> [  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
> [  127.978032]  [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0
> [  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
> [  127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55
> 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
> [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> [  127.978032]  RSP <ffff8800137abb78>
> [  127.978032] CR2: 0000000000000110
> [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> 
> The relevant piece of code is:
> 
> 	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> 	{
> 	        struct mm_struct *mm = rmap_item->mm;
> 	        unsigned long addr = rmap_item->address;
> 	        struct vm_area_struct *vma;
> 	        struct page *page;
> 	
> 	        down_read(&mm->mmap_sem);
> 
> Where 'mm' is NULL. I'm not really sure how it happens though.

Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
I'm testing the fix at the moment, but just hit something else too
(ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).

For the moment, you're safer not to run KSM: configure it out or don't
set it to run.  Fixes to follow later, I'll try to remember to Cc you.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert
  2012-12-19  1:36 ` Hugh Dickins
@ 2012-12-19 12:16   ` Petr Holasek
  2012-12-19 17:55     ` Hugh Dickins
  2012-12-19 16:32   ` Petr Holasek
  1 sibling, 1 reply; 6+ messages in thread
From: Petr Holasek @ 2012-12-19 12:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sasha Levin, Andrew Morton, linux-mm,
	linux-kernel@vger.kernel.org

On Tue, 18 Dec 2012, Hugh Dickins wrote:
> On Tue, 18 Dec 2012, Sasha Levin wrote:
> 
> > Hi all,
> > 
> > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
> > stumbled on the following:
> > 
> > [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
> > [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
...

> > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
> > [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > [  127.978032]  RSP <ffff8800137abb78>
> > [  127.978032] CR2: 0000000000000110
> > [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> > 
> > The relevant piece of code is:
> > 
> > 	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> > 	{
> > 	        struct mm_struct *mm = rmap_item->mm;
> > 	        unsigned long addr = rmap_item->address;
> > 	        struct vm_area_struct *vma;
> > 	        struct page *page;
> > 	
> > 	        down_read(&mm->mmap_sem);
> > 
> > Where 'mm' is NULL. I'm not really sure how it happens though.
> 
> Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
> I'm testing the fix at the moment, but just hit something else too
> (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).
> 
> For the moment, you're safer not to run KSM: configure it out or don't
> set it to run.  Fixes to follow later, I'll try to remember to Cc you.
> 

Hello all,

I've also tried fuzzing with trinity inside of kvm guest when tested KSM
patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So
going to do the same testing on linux-next.

Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned
in yesterday email of something else?

Thank you for your testing && feedback!

Petr

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert
  2012-12-19 12:16   ` Petr Holasek
@ 2012-12-19 17:55     ` Hugh Dickins
  2012-12-20  0:21       ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2012-12-19 17:55 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Sasha Levin, Andrew Morton, linux-mm,
	linux-kernel@vger.kernel.org

On Wed, 19 Dec 2012, Petr Holasek wrote:
> On Tue, 18 Dec 2012, Hugh Dickins wrote:
> > On Tue, 18 Dec 2012, Sasha Levin wrote:
> > 
> > > Hi all,
> > > 
> > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
> > > stumbled on the following:
> > > 
> > > [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
> > > [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> ...
> 
> > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
> > > [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > > [  127.978032]  RSP <ffff8800137abb78>
> > > [  127.978032] CR2: 0000000000000110
> > > [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> > > 
> > > The relevant piece of code is:
> > > 
> > > 	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> > > 	{
> > > 	        struct mm_struct *mm = rmap_item->mm;
> > > 	        unsigned long addr = rmap_item->address;
> > > 	        struct vm_area_struct *vma;
> > > 	        struct page *page;
> > > 	
> > > 	        down_read(&mm->mmap_sem);
> > > 
> > > Where 'mm' is NULL. I'm not really sure how it happens though.
> > 
> > Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
> > I'm testing the fix at the moment, but just hit something else too
> > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).
> > 
> > For the moment, you're safer not to run KSM: configure it out or don't
> > set it to run.  Fixes to follow later, I'll try to remember to Cc you.

Sadly I couldn't send out fixes yesterday, because my testing then
met another more subtle problem, that I've still not yet resolved.

> > 
> 
> Hello all,
> 
> I've also tried fuzzing with trinity inside of kvm guest when tested KSM
> patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So
> going to do the same testing on linux-next.

I haven't tried linux-next myself, it being in a transitional state
until 3.8-rc1.  I've been testing on Linus's git from last weekend
(head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise),
plus your patch, plus my fixes.

> 
> Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned
> in yesterday email of something else?

Yes, Sasha's is exactly the one I got earlier: hitting in
get_mergeable_page() due to bug in unstable_tree_search_insert().

>From your next mail, I see you've started looking into it; so I'd
better show what I have at the moment, although I do hate posting
incomplete known-buggy patches: this on top of git + your patch.

The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a
moment to separate it out.

The CONFIG_NUMA section in stable_tree_append() I added late
last night, but it's not enough to fix the issue I'm still seeing:
merge_across_nodes 0 and 1 cases are stable, but switching between
them can bring pages_shared down to 0 but leave something behind
in the stable_tree.  I suspect it's a wrong-nid issue, but I've
not yet tracked it down with the BUG_ONs I've been adding (not
included below).

Removing the KSM_RUN_MERGE test: unimportant, but so far as I could
see it should be unnecessary, and if it is necessary, then I think
we would need to check for another state too.

Hastily back to debugging,
Hugh

--- 3.7+git+petr/kernel/sched/fair.c	2012-12-16 16:35:08.724441527 -0800
+++ linux/kernel/sched/fair.c	2012-12-18 21:37:24.727964195 -0800
@@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_
 
 static void task_numa_placement(struct task_struct *p)
 {
-	int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
+	int seq;
 
+	if (!p->mm)	/* for example, ksmd faulting in a user's mm */
+		return;
+	seq = ACCESS_ONCE(p->mm->numa_scan_seq);
 	if (p->numa_scan_seq == seq)
 		return;
 	p->numa_scan_seq = seq;
--- 3.7+git+petr/mm/ksm.c	2012-12-18 12:15:04.972032321 -0800
+++ linux/mm/ksm.c	2012-12-19 09:21:12.004004777 -0800
@@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i
 
 	nid = get_kpfn_nid(page_to_pfn(page));
 	root = &root_unstable_tree[nid];
-
 	new = &root->rb_node;
 
 	while (*new) {
@@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i
 		}
 
 		/*
-		 * When there isn't same page location, don't do anything.
-		 * If tree_page was migrated previously, it will be flushed
-		 * out and put into right unstable tree next time. If the
-		 * page was migrated in the meantime, it will be ignored
-		 * this round. When both pages were migrated to the same
-		 * node, ignore them too.
+		 * If tree_page has been migrated to another NUMA node, it
+		 * will be flushed out and put into the right unstable tree
+		 * next time: only merge with it if merge_across_nodes.
 		 * Just notice, we don't have similar problem for PageKsm
-		 * because their migration is disabled now. (62b61f611e) */
-
-#ifdef CONFIG_NUMA
-		if (page_to_nid(page) != page_to_nid(tree_page) ||
-			tree_rmap_item->nid != page_to_nid(tree_page)) {
+		 * because their migration is disabled now. (62b61f611e)
+		 */
+		if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) {
 			put_page(tree_page);
 			return NULL;
 		}
-#endif
 
 		ret = memcmp_pages(page, tree_page);
 
@@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i
 	rmap_item->address |= UNSTABLE_FLAG;
 	rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
 #ifdef CONFIG_NUMA
-	rmap_item->nid = page_to_nid(page);
+	rmap_item->nid = nid;
 #endif
 	rb_link_node(&rmap_item->node, parent, new);
 	rb_insert_color(&rmap_item->node, root);
@@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i
 static void stable_tree_append(struct rmap_item *rmap_item,
 			       struct stable_node *stable_node)
 {
+#ifdef CONFIG_NUMA
+	/*
+	 * Usually rmap_item->nid is already set correctly,
+	 * but it may be wrong after switching merge_across_nodes.
+	 */
+	rmap_item->nid = get_kpfn_nid(stable_node->kpfn);
+#endif
 	rmap_item->head = stable_node;
 	rmap_item->address |= STABLE_FLAG;
 	hlist_add_head(&rmap_item->hlist, &stable_node->hlist);
@@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta
 	struct rb_node *node;
 	int nid;
 
-	for (nid = 0; nid < MAX_NUMNODES; nid++)
+	for (nid = 0; nid < nr_node_ids; nid++)
 		for (node = rb_first(&root_stable_tree[nid]); node;
 				node = rb_next(node)) {
 			struct stable_node *stable_node;
@@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store(
 		return -EINVAL;
 
 	mutex_lock(&ksm_thread_mutex);
-	if (ksm_run & KSM_RUN_MERGE) {
-		err = -EBUSY;
-	} else {
-		if (ksm_merge_across_nodes != knob) {
-			if (ksm_pages_shared > 0)
-				err = -EBUSY;
-			else
-				ksm_merge_across_nodes = knob;
-		}
+	if (ksm_merge_across_nodes != knob) {
+		if (ksm_pages_shared)
+			err = -EBUSY;
+		else
+			ksm_merge_across_nodes = knob;
 	}
-
-	if (err)
-		count = err;
 	mutex_unlock(&ksm_thread_mutex);
 
-	return count;
+	return err ? err : count;
 }
 KSM_ATTR(merge_across_nodes);
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert
  2012-12-19 17:55     ` Hugh Dickins
@ 2012-12-20  0:21       ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2012-12-20  0:21 UTC (permalink / raw)
  To: Petr Holasek
  Cc: Sasha Levin, Andrew Morton, linux-mm,
	linux-kernel@vger.kernel.org

On Wed, 19 Dec 2012, Hugh Dickins wrote:
> On Wed, 19 Dec 2012, Petr Holasek wrote:
> > > 
> > > For the moment, you're safer not to run KSM: configure it out or don't
> > > set it to run.  Fixes to follow later, I'll try to remember to Cc you.
> 
> Sadly I couldn't send out fixes yesterday, because my testing then
> met another more subtle problem, that I've still not yet resolved.
> 
> > I've also tried fuzzing with trinity inside of kvm guest when tested KSM
> > patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So
> > going to do the same testing on linux-next.
> 
> I haven't tried linux-next myself, it being in a transitional state
> until 3.8-rc1.  I've been testing on Linus's git from last weekend
> (head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise),
> plus your patch, plus my fixes.
> 
> > 
> > Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned
> > in yesterday email of something else?
> 
> Yes, Sasha's is exactly the one I got earlier: hitting in
> get_mergeable_page() due to bug in unstable_tree_search_insert().
> 
> From your next mail, I see you've started looking into it; so I'd
> better show what I have at the moment, although I do hate posting
> incomplete known-buggy patches: this on top of git + your patch.
> 
> The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a
> moment to separate it out.

I'd better accelerate on that.

> 
> The CONFIG_NUMA section in stable_tree_append() I added late
> last night, but it's not enough to fix the issue I'm still seeing:
> merge_across_nodes 0 and 1 cases are stable, but switching between
> them can bring pages_shared down to 0 but leave something behind
> in the stable_tree.  I suspect it's a wrong-nid issue, but I've
> not yet tracked it down with the BUG_ONs I've been adding (not
> included below).

I half understand this now.  The lifetime of the stable_nodes is
different from that of the rmap_items hanging off them: a stable_node
has to stay around until the PageKsm page pointing to it has been freed;
which is why remove_rmap_item_from_tree() may bring ksm_pages_shared
down to 0, but even so does not call remove_node_from_stable_tree().

That means that merge_across_nodes_store() can find ksm_pages_shared
0 as it wishes, but there still be nodes in the stable_tree(s):
including nodes which are wrongly placed once ksm_merge_across_nodes
switches behaviour - nodes which will end up causing oopses (e.g.
because kpfn belongs to NUMAnode 1 but it's left in tree 0).

I say half understand, because to bring ksm_pages_shared down to 0,
of course I've been setting run to 2 (KSM_RUN_UNMERGE): that has
succeeded, so why has it not freed all the PageKsm pages?  I did
experiment this morning with extra code in merge_across_nodes_store()
to run get_ksm_page() on every stable_node remaining, but that was
not enough to free these nodes.

Ah, perhaps it's due to pages queued up on a pagevec, which need to
be drained: I hadn't thought of that until writing now, I'll try it
out tonight.  Or perhaps we've messed up the ordering versus fork
(I am testing under concurrent load).

Whatever it is, I think the solution to this would best be a separate
patch on top of yours: it is only a problem when changing the sense of
merge_across_nodes after running the other way, which is something few
people will often do, but something which ought not to be prohibited
(it would be lame for people to have to reboot when experimenting with
which way they want to set it), and ought not to cause oopses.

The answer is going to need a separate explanation.  I expect it will
involve a combination of something to improve the freeing rate (draining
pagevecs if that is effective), the loop to free residual stable_nodes,
and an -EBUSY for safety if those measures fail.  I'll experiment over
the coming days, and send in a patch once I'm satisfied.

I'm glad to see that akpm has now dropped your v5 patch from his mm tree;
but if you'd like to send him a v6 merging in my ksm.c mods from below
(ask if you need any explanations), go ahead - I think it's okay in
mmotm/next for a few days without the further fixup, but does need
further fixup before it reaches Linus (for 3.9 I presume).

Hugh

> 
> Removing the KSM_RUN_MERGE test: unimportant, but so far as I could
> see it should be unnecessary, and if it is necessary, then I think
> we would need to check for another state too.
> 
> Hastily back to debugging,
> Hugh
> 
> --- 3.7+git+petr/kernel/sched/fair.c	2012-12-16 16:35:08.724441527 -0800
> +++ linux/kernel/sched/fair.c	2012-12-18 21:37:24.727964195 -0800
> @@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_
>  
>  static void task_numa_placement(struct task_struct *p)
>  {
> -	int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
> +	int seq;
>  
> +	if (!p->mm)	/* for example, ksmd faulting in a user's mm */
> +		return;
> +	seq = ACCESS_ONCE(p->mm->numa_scan_seq);
>  	if (p->numa_scan_seq == seq)
>  		return;
>  	p->numa_scan_seq = seq;
> --- 3.7+git+petr/mm/ksm.c	2012-12-18 12:15:04.972032321 -0800
> +++ linux/mm/ksm.c	2012-12-19 09:21:12.004004777 -0800
> @@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i
>  
>  	nid = get_kpfn_nid(page_to_pfn(page));
>  	root = &root_unstable_tree[nid];
> -
>  	new = &root->rb_node;
>  
>  	while (*new) {
> @@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i
>  		}
>  
>  		/*
> -		 * When there isn't same page location, don't do anything.
> -		 * If tree_page was migrated previously, it will be flushed
> -		 * out and put into right unstable tree next time. If the
> -		 * page was migrated in the meantime, it will be ignored
> -		 * this round. When both pages were migrated to the same
> -		 * node, ignore them too.
> +		 * If tree_page has been migrated to another NUMA node, it
> +		 * will be flushed out and put into the right unstable tree
> +		 * next time: only merge with it if merge_across_nodes.
>  		 * Just notice, we don't have similar problem for PageKsm
> -		 * because their migration is disabled now. (62b61f611e) */
> -
> -#ifdef CONFIG_NUMA
> -		if (page_to_nid(page) != page_to_nid(tree_page) ||
> -			tree_rmap_item->nid != page_to_nid(tree_page)) {
> +		 * because their migration is disabled now. (62b61f611e)
> +		 */
> +		if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) {
>  			put_page(tree_page);
>  			return NULL;
>  		}
> -#endif
>  
>  		ret = memcmp_pages(page, tree_page);
>  
> @@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i
>  	rmap_item->address |= UNSTABLE_FLAG;
>  	rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
>  #ifdef CONFIG_NUMA
> -	rmap_item->nid = page_to_nid(page);
> +	rmap_item->nid = nid;
>  #endif
>  	rb_link_node(&rmap_item->node, parent, new);
>  	rb_insert_color(&rmap_item->node, root);
> @@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i
>  static void stable_tree_append(struct rmap_item *rmap_item,
>  			       struct stable_node *stable_node)
>  {
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Usually rmap_item->nid is already set correctly,
> +	 * but it may be wrong after switching merge_across_nodes.
> +	 */
> +	rmap_item->nid = get_kpfn_nid(stable_node->kpfn);
> +#endif
>  	rmap_item->head = stable_node;
>  	rmap_item->address |= STABLE_FLAG;
>  	hlist_add_head(&rmap_item->hlist, &stable_node->hlist);
> @@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta
>  	struct rb_node *node;
>  	int nid;
>  
> -	for (nid = 0; nid < MAX_NUMNODES; nid++)
> +	for (nid = 0; nid < nr_node_ids; nid++)
>  		for (node = rb_first(&root_stable_tree[nid]); node;
>  				node = rb_next(node)) {
>  			struct stable_node *stable_node;
> @@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store(
>  		return -EINVAL;
>  
>  	mutex_lock(&ksm_thread_mutex);
> -	if (ksm_run & KSM_RUN_MERGE) {
> -		err = -EBUSY;
> -	} else {
> -		if (ksm_merge_across_nodes != knob) {
> -			if (ksm_pages_shared > 0)
> -				err = -EBUSY;
> -			else
> -				ksm_merge_across_nodes = knob;
> -		}
> +	if (ksm_merge_across_nodes != knob) {
> +		if (ksm_pages_shared)
> +			err = -EBUSY;
> +		else
> +			ksm_merge_across_nodes = knob;
>  	}
> -
> -	if (err)
> -		count = err;
>  	mutex_unlock(&ksm_thread_mutex);
>  
> -	return count;
> +	return err ? err : count;
>  }
>  KSM_ATTR(merge_across_nodes);
>  #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert
  2012-12-19  1:36 ` Hugh Dickins
  2012-12-19 12:16   ` Petr Holasek
@ 2012-12-19 16:32   ` Petr Holasek
  1 sibling, 0 replies; 6+ messages in thread
From: Petr Holasek @ 2012-12-19 16:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Sasha Levin, Andrew Morton, linux-mm,
	linux-kernel@vger.kernel.org

On Tue, 18 Dec 2012, Hugh Dickins wrote:
> On Tue, 18 Dec 2012, Sasha Levin wrote:
> 
> > Hi all,
> > 
> > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
> > stumbled on the following:
> > 
> > [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
> > [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > [  127.960379] PGD cc54067 PUD cc55067 PMD 0
> > [  127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [  127.960379] Dumping ftrace buffer:
> > [  127.960379]    (ftrace buffer empty)
> > [  127.960379] CPU 0
> > [  127.960379] Pid: 3174, comm: ksmd Tainted: G        W    3.7.0-next-20121218-sasha-00023-g8e46e86 #220
> > [  127.978032] RIP: 0010:[<ffffffff81185b60>]  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > [  127.978032] RSP: 0018:ffff8800137abb78  EFLAGS: 00010046
> > [  127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001
> > [  127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110
> > [  127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000
> > [  127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> > [  127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000
> > [  127.978032] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
> > [  127.978032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0
> > [  127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000)
> > [  127.978032] Stack:
> > [  127.978032]  ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50
> > [  127.978032]  ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e
> > [  127.978032]  ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000
> > [  127.978032] Call Trace:
> > [  127.978032]  [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70
> > [  127.978032]  [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40
> > [  127.978032]  [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130
> > [  127.978032]  [<ffffffff811889aa>] lock_acquire+0x1ca/0x270
> > [  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
> > [  127.978032]  [<ffffffff83cd7337>] down_read+0x47/0x90
> > [  127.978032]  [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260
> > [  127.978032]  [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260
> > [  127.978032]  [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0
> > [  127.978032]  [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0
> > [  127.978032]  [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0
> > [  127.978032]  [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0
> > [  127.978032]  [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0
> > [  127.978032]  [<ffffffff8113cbd3>] kthread+0xe3/0xf0
> > [  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
> > [  127.978032]  [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0
> > [  127.978032]  [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40
> > [  127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55
> > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
> > [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > [  127.978032]  RSP <ffff8800137abb78>
> > [  127.978032] CR2: 0000000000000110
> > [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> > 
> > The relevant piece of code is:
> > 
> > 	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> > 	{
> > 	        struct mm_struct *mm = rmap_item->mm;
> > 	        unsigned long addr = rmap_item->address;
> > 	        struct vm_area_struct *vma;
> > 	        struct page *page;
> > 	
> > 	        down_read(&mm->mmap_sem);
> > 
> > Where 'mm' is NULL. I'm not really sure how it happens though.
> 
> Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
> I'm testing the fix at the moment, but just hit something else too
> (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).
> 
> For the moment, you're safer not to run KSM: configure it out or don't
> set it to run.  Fixes to follow later, I'll try to remember to Cc you.
> 

Thanks to trinity inside of KVM guest, I've reproduced it too.

[ 1193.299397] Call Trace:
[ 1193.328506]  [<ffffffff811785c7>] ksm_scan_thread+0x967/0xd70
[ 1193.397097]  [<ffffffff810818d0>] ? wake_up_bit+0x40/0x40
[ 1193.461528]  [<ffffffff81177c60>] ? run_store+0x2b0/0x2b0
[ 1193.525962]  [<ffffffff81080fb0>] kthread+0xc0/0xd0
[ 1193.584157]  [<ffffffff81080ef0>] ? kthread_create_on_node+0x120/0x120
[ 1193.662101]  [<ffffffff8165c3ac>] ret_from_fork+0x7c/0xb0
[ 1193.726535]  [<ffffffff81080ef0>] ? kthread_create_on_node+0x120/0x120
[ 1193.804475] Code: fe 4a cc ff 48 83 c4 08 5b 5d c3 0f 1f 80 00 00 00 00 66
66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 e8 9a 0a 00 00 48 89 d8 <f0>
48 ff 00 79 05 e8 9c 4a cc ff 48 83 c4 08 5b 5d c3 55 48 89 
[ 1194.030380] RIP  [<ffffffff81651ed9>] down_read+0x19/0x2b
[ 1194.094816]  RSP <ffff880122a73de8>
[ 1194.136385] CR2: 00007fb4c6e01268
[ 1194.176280] ---[ end trace 17dda1cb9a62bc36 ]---

With enabled CONFIG_NUMA_BALANCING this one, but not sure if we should use
new numasched code or ksm:

[ 4706.859796] Call Trace:
[ 4706.888904]  [<ffffffff811577d5>] do_numa_page+0xe5/0x130
[ 4706.953335]  [<ffffffff81157a79>] handle_pte_fault+0x259/0xa50
[ 4707.023012]  [<ffffffffa008a025>] ? kvm_set_spte_hva+0x25/0x30 [kvm]
[ 4707.098878]  [<ffffffff8115903e>] handle_mm_fault+0x26e/0x660
[ 4707.167470]  [<ffffffff811775a2>] ?
__mmu_notifier_invalidate_range_end+0x72/0x90
[ 4707.256850]  [<ffffffff8112e68e>] ? unlock_page+0x2e/0x40
[ 4707.321283]  [<ffffffff811779d5>] break_ksm+0x75/0xa0
[ 4707.381560]  [<ffffffff81177c0d>] break_cow+0x5d/0x80
[ 4707.441833]  [<ffffffff811794c7>] ksm_scan_thread+0xc87/0xd70
[ 4707.510427]  [<ffffffff810818e0>] ? wake_up_bit+0x40/0x40
[ 4707.574860]  [<ffffffff81178840>] ? run_store+0x2b0/0x2b0
[ 4707.639294]  [<ffffffff81080fc0>] kthread+0xc0/0xd0
[ 4707.697489]  [<ffffffff81080f00>] ? kthread_create_on_node+0x120/0x120
[ 4707.775435]  [<ffffffff8165d8ec>] ret_from_fork+0x7c/0xb0
[ 4707.839869]  [<ffffffff81080f00>] ? kthread_create_on_node+0x120/0x120
[ 4707.917806] Code: f8 65 48 8b 1c 25 40 c7 00 00 e9 11 00 00 00 48 8b 5d e8
4c 8b 65 f0 4c 8b 6d f8 c9 c3 0f 1f 00 84 d2 74 2c 48 8b 83 98 02 00 00 <8b>
80 68 03 00 00 3b 83 94 07 00 00 74 d6 89 83 94 07 00 00 48 
[ 4708.143711] RIP  [<ffffffff81098db3>] task_numa_fault+0x43/0xa0
[ 4708.214389]  RSP <ffff880122a1fbc8>
[ 4708.255957] CR2: 0000000000000368
[ 4708.295722] ---[ end trace 5ffe704785995d40 ]---

I am starting looking into it.

thanks!
Petr

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-12-20  0:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-19  1:17 mm, ksm: NULL ptr deref in unstable_tree_search_insert Sasha Levin
2012-12-19  1:36 ` Hugh Dickins
2012-12-19 12:16   ` Petr Holasek
2012-12-19 17:55     ` Hugh Dickins
2012-12-20  0:21       ` Hugh Dickins
2012-12-19 16:32   ` Petr Holasek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).