* mm, ksm: NULL ptr deref in unstable_tree_search_insert @ 2012-12-19 1:17 Sasha Levin 2012-12-19 1:36 ` Hugh Dickins 0 siblings, 1 reply; 6+ messages in thread From: Sasha Levin @ 2012-12-19 1:17 UTC (permalink / raw) To: Andrew Morton, Hugh Dickins; +Cc: linux-mm, linux-kernel@vger.kernel.org Hi all, While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've stumbled on the following: [ 127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [ 127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 [ 127.960379] PGD cc54067 PUD cc55067 PMD 0 [ 127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 127.960379] Dumping ftrace buffer: [ 127.960379] (ftrace buffer empty) [ 127.960379] CPU 0 [ 127.960379] Pid: 3174, comm: ksmd Tainted: G W 3.7.0-next-20121218-sasha-00023-g8e46e86 #220 [ 127.978032] RIP: 0010:[<ffffffff81185b60>] [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 [ 127.978032] RSP: 0018:ffff8800137abb78 EFLAGS: 00010046 [ 127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001 [ 127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110 [ 127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000 [ 127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000 [ 127.978032] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000 [ 127.978032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0 [ 127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000) [ 127.978032] Stack: [ 127.978032] ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50 [ 127.978032] ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e [ 127.978032] ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000 [ 127.978032] Call Trace: [ 127.978032] [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70 [ 127.978032] [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40 [ 127.978032] [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130 [ 127.978032] [<ffffffff811889aa>] lock_acquire+0x1ca/0x270 [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 [ 127.978032] [<ffffffff83cd7337>] down_read+0x47/0x90 [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 [ 127.978032] [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260 [ 127.978032] [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0 [ 127.978032] [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0 [ 127.978032] [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0 [ 127.978032] [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0 [ 127.978032] [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0 [ 127.978032] [<ffffffff8113cbd3>] kthread+0xe3/0xf0 [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 [ 127.978032] [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0 [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 [ 127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77 [ 127.978032] RIP [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 [ 127.978032] RSP <ffff8800137abb78> [ 127.978032] CR2: 0000000000000110 [ 127.978032] ---[ end trace 3dc1b0c5db8c1230 ]--- The relevant piece of code is: static struct page *get_mergeable_page(struct rmap_item *rmap_item) { struct mm_struct *mm = rmap_item->mm; unsigned long addr = rmap_item->address; struct vm_area_struct *vma; struct page *page; down_read(&mm->mmap_sem); Where 'mm' is NULL. I'm not really sure how it happens though. Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert 2012-12-19 1:17 mm, ksm: NULL ptr deref in unstable_tree_search_insert Sasha Levin @ 2012-12-19 1:36 ` Hugh Dickins 2012-12-19 12:16 ` Petr Holasek 2012-12-19 16:32 ` Petr Holasek 0 siblings, 2 replies; 6+ messages in thread From: Hugh Dickins @ 2012-12-19 1:36 UTC (permalink / raw) To: Sasha Levin Cc: Andrew Morton, Petr Holasek, linux-mm, linux-kernel@vger.kernel.org On Tue, 18 Dec 2012, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've > stumbled on the following: > > [ 127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 > [ 127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > [ 127.960379] PGD cc54067 PUD cc55067 PMD 0 > [ 127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > [ 127.960379] Dumping ftrace buffer: > [ 127.960379] (ftrace buffer empty) > [ 127.960379] CPU 0 > [ 127.960379] Pid: 3174, comm: ksmd Tainted: G W 3.7.0-next-20121218-sasha-00023-g8e46e86 #220 > [ 127.978032] RIP: 0010:[<ffffffff81185b60>] [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > [ 127.978032] RSP: 0018:ffff8800137abb78 EFLAGS: 00010046 > [ 127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001 > [ 127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110 > [ 127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000 > [ 127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 > [ 127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000 > [ 127.978032] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000 > [ 127.978032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0 > [ 127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000) > [ 127.978032] Stack: > [ 127.978032] ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50 > [ 127.978032] ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e > [ 127.978032] ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000 > [ 127.978032] Call Trace: > [ 127.978032] [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70 > [ 127.978032] [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40 > [ 127.978032] [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130 > [ 127.978032] [<ffffffff811889aa>] lock_acquire+0x1ca/0x270 > [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 > [ 127.978032] [<ffffffff83cd7337>] down_read+0x47/0x90 > [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 > [ 127.978032] [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260 > [ 127.978032] [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0 > [ 127.978032] [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0 > [ 127.978032] [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0 > [ 127.978032] [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0 > [ 127.978032] [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0 > [ 127.978032] [<ffffffff8113cbd3>] kthread+0xe3/0xf0 > [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 > [ 127.978032] [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0 > [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 > [ 127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55 > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77 > [ 127.978032] RIP [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > [ 127.978032] RSP <ffff8800137abb78> > [ 127.978032] CR2: 0000000000000110 > [ 127.978032] ---[ end trace 3dc1b0c5db8c1230 ]--- > > The relevant piece of code is: > > static struct page *get_mergeable_page(struct rmap_item *rmap_item) > { > struct mm_struct *mm = rmap_item->mm; > unsigned long addr = rmap_item->address; > struct vm_area_struct *vma; > struct page *page; > > down_read(&mm->mmap_sem); > > Where 'mm' is NULL. I'm not really sure how it happens though. Thanks, yes, I got that, and it's not peculiar to fuzzing at all: I'm testing the fix at the moment, but just hit something else too (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement). For the moment, you're safer not to run KSM: configure it out or don't set it to run. Fixes to follow later, I'll try to remember to Cc you. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert 2012-12-19 1:36 ` Hugh Dickins @ 2012-12-19 12:16 ` Petr Holasek 2012-12-19 17:55 ` Hugh Dickins 2012-12-19 16:32 ` Petr Holasek 1 sibling, 1 reply; 6+ messages in thread From: Petr Holasek @ 2012-12-19 12:16 UTC (permalink / raw) To: Hugh Dickins Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel@vger.kernel.org On Tue, 18 Dec 2012, Hugh Dickins wrote: > On Tue, 18 Dec 2012, Sasha Levin wrote: > > > Hi all, > > > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've > > stumbled on the following: > > > > [ 127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 > > [ 127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 ... > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77 > > [ 127.978032] RIP [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > > [ 127.978032] RSP <ffff8800137abb78> > > [ 127.978032] CR2: 0000000000000110 > > [ 127.978032] ---[ end trace 3dc1b0c5db8c1230 ]--- > > > > The relevant piece of code is: > > > > static struct page *get_mergeable_page(struct rmap_item *rmap_item) > > { > > struct mm_struct *mm = rmap_item->mm; > > unsigned long addr = rmap_item->address; > > struct vm_area_struct *vma; > > struct page *page; > > > > down_read(&mm->mmap_sem); > > > > Where 'mm' is NULL. I'm not really sure how it happens though. > > Thanks, yes, I got that, and it's not peculiar to fuzzing at all: > I'm testing the fix at the moment, but just hit something else too > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement). > > For the moment, you're safer not to run KSM: configure it out or don't > set it to run. Fixes to follow later, I'll try to remember to Cc you. > Hello all, I've also tried fuzzing with trinity inside of kvm guest when tested KSM patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So going to do the same testing on linux-next. Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned in yesterday email of something else? Thank you for your testing && feedback! Petr -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert 2012-12-19 12:16 ` Petr Holasek @ 2012-12-19 17:55 ` Hugh Dickins 2012-12-20 0:21 ` Hugh Dickins 0 siblings, 1 reply; 6+ messages in thread From: Hugh Dickins @ 2012-12-19 17:55 UTC (permalink / raw) To: Petr Holasek Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel@vger.kernel.org On Wed, 19 Dec 2012, Petr Holasek wrote: > On Tue, 18 Dec 2012, Hugh Dickins wrote: > > On Tue, 18 Dec 2012, Sasha Levin wrote: > > > > > Hi all, > > > > > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've > > > stumbled on the following: > > > > > > [ 127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 > > > [ 127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > ... > > > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77 > > > [ 127.978032] RIP [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > > > [ 127.978032] RSP <ffff8800137abb78> > > > [ 127.978032] CR2: 0000000000000110 > > > [ 127.978032] ---[ end trace 3dc1b0c5db8c1230 ]--- > > > > > > The relevant piece of code is: > > > > > > static struct page *get_mergeable_page(struct rmap_item *rmap_item) > > > { > > > struct mm_struct *mm = rmap_item->mm; > > > unsigned long addr = rmap_item->address; > > > struct vm_area_struct *vma; > > > struct page *page; > > > > > > down_read(&mm->mmap_sem); > > > > > > Where 'mm' is NULL. I'm not really sure how it happens though. > > > > Thanks, yes, I got that, and it's not peculiar to fuzzing at all: > > I'm testing the fix at the moment, but just hit something else too > > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement). > > > > For the moment, you're safer not to run KSM: configure it out or don't > > set it to run. Fixes to follow later, I'll try to remember to Cc you. Sadly I couldn't send out fixes yesterday, because my testing then met another more subtle problem, that I've still not yet resolved. > > > > Hello all, > > I've also tried fuzzing with trinity inside of kvm guest when tested KSM > patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So > going to do the same testing on linux-next. I haven't tried linux-next myself, it being in a transitional state until 3.8-rc1. I've been testing on Linus's git from last weekend (head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise), plus your patch, plus my fixes. > > Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned > in yesterday email of something else? Yes, Sasha's is exactly the one I got earlier: hitting in get_mergeable_page() due to bug in unstable_tree_search_insert(). >From your next mail, I see you've started looking into it; so I'd better show what I have at the moment, although I do hate posting incomplete known-buggy patches: this on top of git + your patch. The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a moment to separate it out. The CONFIG_NUMA section in stable_tree_append() I added late last night, but it's not enough to fix the issue I'm still seeing: merge_across_nodes 0 and 1 cases are stable, but switching between them can bring pages_shared down to 0 but leave something behind in the stable_tree. I suspect it's a wrong-nid issue, but I've not yet tracked it down with the BUG_ONs I've been adding (not included below). Removing the KSM_RUN_MERGE test: unimportant, but so far as I could see it should be unnecessary, and if it is necessary, then I think we would need to check for another state too. Hastily back to debugging, Hugh --- 3.7+git+petr/kernel/sched/fair.c 2012-12-16 16:35:08.724441527 -0800 +++ linux/kernel/sched/fair.c 2012-12-18 21:37:24.727964195 -0800 @@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_ static void task_numa_placement(struct task_struct *p) { - int seq = ACCESS_ONCE(p->mm->numa_scan_seq); + int seq; + if (!p->mm) /* for example, ksmd faulting in a user's mm */ + return; + seq = ACCESS_ONCE(p->mm->numa_scan_seq); if (p->numa_scan_seq == seq) return; p->numa_scan_seq = seq; --- 3.7+git+petr/mm/ksm.c 2012-12-18 12:15:04.972032321 -0800 +++ linux/mm/ksm.c 2012-12-19 09:21:12.004004777 -0800 @@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i nid = get_kpfn_nid(page_to_pfn(page)); root = &root_unstable_tree[nid]; - new = &root->rb_node; while (*new) { @@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i } /* - * When there isn't same page location, don't do anything. - * If tree_page was migrated previously, it will be flushed - * out and put into right unstable tree next time. If the - * page was migrated in the meantime, it will be ignored - * this round. When both pages were migrated to the same - * node, ignore them too. + * If tree_page has been migrated to another NUMA node, it + * will be flushed out and put into the right unstable tree + * next time: only merge with it if merge_across_nodes. * Just notice, we don't have similar problem for PageKsm - * because their migration is disabled now. (62b61f611e) */ - -#ifdef CONFIG_NUMA - if (page_to_nid(page) != page_to_nid(tree_page) || - tree_rmap_item->nid != page_to_nid(tree_page)) { + * because their migration is disabled now. (62b61f611e) + */ + if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) { put_page(tree_page); return NULL; } -#endif ret = memcmp_pages(page, tree_page); @@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i rmap_item->address |= UNSTABLE_FLAG; rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK); #ifdef CONFIG_NUMA - rmap_item->nid = page_to_nid(page); + rmap_item->nid = nid; #endif rb_link_node(&rmap_item->node, parent, new); rb_insert_color(&rmap_item->node, root); @@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i static void stable_tree_append(struct rmap_item *rmap_item, struct stable_node *stable_node) { +#ifdef CONFIG_NUMA + /* + * Usually rmap_item->nid is already set correctly, + * but it may be wrong after switching merge_across_nodes. + */ + rmap_item->nid = get_kpfn_nid(stable_node->kpfn); +#endif rmap_item->head = stable_node; rmap_item->address |= STABLE_FLAG; hlist_add_head(&rmap_item->hlist, &stable_node->hlist); @@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta struct rb_node *node; int nid; - for (nid = 0; nid < MAX_NUMNODES; nid++) + for (nid = 0; nid < nr_node_ids; nid++) for (node = rb_first(&root_stable_tree[nid]); node; node = rb_next(node)) { struct stable_node *stable_node; @@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store( return -EINVAL; mutex_lock(&ksm_thread_mutex); - if (ksm_run & KSM_RUN_MERGE) { - err = -EBUSY; - } else { - if (ksm_merge_across_nodes != knob) { - if (ksm_pages_shared > 0) - err = -EBUSY; - else - ksm_merge_across_nodes = knob; - } + if (ksm_merge_across_nodes != knob) { + if (ksm_pages_shared) + err = -EBUSY; + else + ksm_merge_across_nodes = knob; } - - if (err) - count = err; mutex_unlock(&ksm_thread_mutex); - return count; + return err ? err : count; } KSM_ATTR(merge_across_nodes); #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert 2012-12-19 17:55 ` Hugh Dickins @ 2012-12-20 0:21 ` Hugh Dickins 0 siblings, 0 replies; 6+ messages in thread From: Hugh Dickins @ 2012-12-20 0:21 UTC (permalink / raw) To: Petr Holasek Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel@vger.kernel.org On Wed, 19 Dec 2012, Hugh Dickins wrote: > On Wed, 19 Dec 2012, Petr Holasek wrote: > > > > > > For the moment, you're safer not to run KSM: configure it out or don't > > > set it to run. Fixes to follow later, I'll try to remember to Cc you. > > Sadly I couldn't send out fixes yesterday, because my testing then > met another more subtle problem, that I've still not yet resolved. > > > I've also tried fuzzing with trinity inside of kvm guest when tested KSM > > patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So > > going to do the same testing on linux-next. > > I haven't tried linux-next myself, it being in a transitional state > until 3.8-rc1. I've been testing on Linus's git from last weekend > (head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise), > plus your patch, plus my fixes. > > > > > Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned > > in yesterday email of something else? > > Yes, Sasha's is exactly the one I got earlier: hitting in > get_mergeable_page() due to bug in unstable_tree_search_insert(). > > From your next mail, I see you've started looking into it; so I'd > better show what I have at the moment, although I do hate posting > incomplete known-buggy patches: this on top of git + your patch. > > The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a > moment to separate it out. I'd better accelerate on that. > > The CONFIG_NUMA section in stable_tree_append() I added late > last night, but it's not enough to fix the issue I'm still seeing: > merge_across_nodes 0 and 1 cases are stable, but switching between > them can bring pages_shared down to 0 but leave something behind > in the stable_tree. I suspect it's a wrong-nid issue, but I've > not yet tracked it down with the BUG_ONs I've been adding (not > included below). I half understand this now. The lifetime of the stable_nodes is different from that of the rmap_items hanging off them: a stable_node has to stay around until the PageKsm page pointing to it has been freed; which is why remove_rmap_item_from_tree() may bring ksm_pages_shared down to 0, but even so does not call remove_node_from_stable_tree(). That means that merge_across_nodes_store() can find ksm_pages_shared 0 as it wishes, but there still be nodes in the stable_tree(s): including nodes which are wrongly placed once ksm_merge_across_nodes switches behaviour - nodes which will end up causing oopses (e.g. because kpfn belongs to NUMAnode 1 but it's left in tree 0). I say half understand, because to bring ksm_pages_shared down to 0, of course I've been setting run to 2 (KSM_RUN_UNMERGE): that has succeeded, so why has it not freed all the PageKsm pages? I did experiment this morning with extra code in merge_across_nodes_store() to run get_ksm_page() on every stable_node remaining, but that was not enough to free these nodes. Ah, perhaps it's due to pages queued up on a pagevec, which need to be drained: I hadn't thought of that until writing now, I'll try it out tonight. Or perhaps we've messed up the ordering versus fork (I am testing under concurrent load). Whatever it is, I think the solution to this would best be a separate patch on top of yours: it is only a problem when changing the sense of merge_across_nodes after running the other way, which is something few people will often do, but something which ought not to be prohibited (it would be lame for people to have to reboot when experimenting with which way they want to set it), and ought not to cause oopses. The answer is going to need a separate explanation. I expect it will involve a combination of something to improve the freeing rate (draining pagevecs if that is effective), the loop to free residual stable_nodes, and an -EBUSY for safety if those measures fail. I'll experiment over the coming days, and send in a patch once I'm satisfied. I'm glad to see that akpm has now dropped your v5 patch from his mm tree; but if you'd like to send him a v6 merging in my ksm.c mods from below (ask if you need any explanations), go ahead - I think it's okay in mmotm/next for a few days without the further fixup, but does need further fixup before it reaches Linus (for 3.9 I presume). Hugh > > Removing the KSM_RUN_MERGE test: unimportant, but so far as I could > see it should be unnecessary, and if it is necessary, then I think > we would need to check for another state too. > > Hastily back to debugging, > Hugh > > --- 3.7+git+petr/kernel/sched/fair.c 2012-12-16 16:35:08.724441527 -0800 > +++ linux/kernel/sched/fair.c 2012-12-18 21:37:24.727964195 -0800 > @@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_ > > static void task_numa_placement(struct task_struct *p) > { > - int seq = ACCESS_ONCE(p->mm->numa_scan_seq); > + int seq; > > + if (!p->mm) /* for example, ksmd faulting in a user's mm */ > + return; > + seq = ACCESS_ONCE(p->mm->numa_scan_seq); > if (p->numa_scan_seq == seq) > return; > p->numa_scan_seq = seq; > --- 3.7+git+petr/mm/ksm.c 2012-12-18 12:15:04.972032321 -0800 > +++ linux/mm/ksm.c 2012-12-19 09:21:12.004004777 -0800 > @@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i > > nid = get_kpfn_nid(page_to_pfn(page)); > root = &root_unstable_tree[nid]; > - > new = &root->rb_node; > > while (*new) { > @@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i > } > > /* > - * When there isn't same page location, don't do anything. > - * If tree_page was migrated previously, it will be flushed > - * out and put into right unstable tree next time. If the > - * page was migrated in the meantime, it will be ignored > - * this round. When both pages were migrated to the same > - * node, ignore them too. > + * If tree_page has been migrated to another NUMA node, it > + * will be flushed out and put into the right unstable tree > + * next time: only merge with it if merge_across_nodes. > * Just notice, we don't have similar problem for PageKsm > - * because their migration is disabled now. (62b61f611e) */ > - > -#ifdef CONFIG_NUMA > - if (page_to_nid(page) != page_to_nid(tree_page) || > - tree_rmap_item->nid != page_to_nid(tree_page)) { > + * because their migration is disabled now. (62b61f611e) > + */ > + if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) { > put_page(tree_page); > return NULL; > } > -#endif > > ret = memcmp_pages(page, tree_page); > > @@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i > rmap_item->address |= UNSTABLE_FLAG; > rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK); > #ifdef CONFIG_NUMA > - rmap_item->nid = page_to_nid(page); > + rmap_item->nid = nid; > #endif > rb_link_node(&rmap_item->node, parent, new); > rb_insert_color(&rmap_item->node, root); > @@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i > static void stable_tree_append(struct rmap_item *rmap_item, > struct stable_node *stable_node) > { > +#ifdef CONFIG_NUMA > + /* > + * Usually rmap_item->nid is already set correctly, > + * but it may be wrong after switching merge_across_nodes. > + */ > + rmap_item->nid = get_kpfn_nid(stable_node->kpfn); > +#endif > rmap_item->head = stable_node; > rmap_item->address |= STABLE_FLAG; > hlist_add_head(&rmap_item->hlist, &stable_node->hlist); > @@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta > struct rb_node *node; > int nid; > > - for (nid = 0; nid < MAX_NUMNODES; nid++) > + for (nid = 0; nid < nr_node_ids; nid++) > for (node = rb_first(&root_stable_tree[nid]); node; > node = rb_next(node)) { > struct stable_node *stable_node; > @@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store( > return -EINVAL; > > mutex_lock(&ksm_thread_mutex); > - if (ksm_run & KSM_RUN_MERGE) { > - err = -EBUSY; > - } else { > - if (ksm_merge_across_nodes != knob) { > - if (ksm_pages_shared > 0) > - err = -EBUSY; > - else > - ksm_merge_across_nodes = knob; > - } > + if (ksm_merge_across_nodes != knob) { > + if (ksm_pages_shared) > + err = -EBUSY; > + else > + ksm_merge_across_nodes = knob; > } > - > - if (err) > - count = err; > mutex_unlock(&ksm_thread_mutex); > > - return count; > + return err ? err : count; > } > KSM_ATTR(merge_across_nodes); > #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert 2012-12-19 1:36 ` Hugh Dickins 2012-12-19 12:16 ` Petr Holasek @ 2012-12-19 16:32 ` Petr Holasek 1 sibling, 0 replies; 6+ messages in thread From: Petr Holasek @ 2012-12-19 16:32 UTC (permalink / raw) To: Hugh Dickins Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel@vger.kernel.org On Tue, 18 Dec 2012, Hugh Dickins wrote: > On Tue, 18 Dec 2012, Sasha Levin wrote: > > > Hi all, > > > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've > > stumbled on the following: > > > > [ 127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 > > [ 127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > > [ 127.960379] PGD cc54067 PUD cc55067 PMD 0 > > [ 127.960379] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 127.960379] Dumping ftrace buffer: > > [ 127.960379] (ftrace buffer empty) > > [ 127.960379] CPU 0 > > [ 127.960379] Pid: 3174, comm: ksmd Tainted: G W 3.7.0-next-20121218-sasha-00023-g8e46e86 #220 > > [ 127.978032] RIP: 0010:[<ffffffff81185b60>] [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > > [ 127.978032] RSP: 0018:ffff8800137abb78 EFLAGS: 00010046 > > [ 127.978032] RAX: 0000000000000086 RBX: 0000000000000110 RCX: 0000000000000001 > > [ 127.978032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000110 > > [ 127.978032] RBP: ffff8800137abc18 R08: 0000000000000002 R09: 0000000000000000 > > [ 127.978032] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 > > [ 127.978032] R13: 0000000000000002 R14: ffff8800137b0000 R15: 0000000000000000 > > [ 127.978032] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000 > > [ 127.978032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 127.978032] CR2: 0000000000000110 CR3: 000000000cc51000 CR4: 00000000000406f0 > > [ 127.978032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 127.978032] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 127.978032] Process ksmd (pid: 3174, threadinfo ffff8800137aa000, task ffff8800137b0000) > > [ 127.978032] Stack: > > [ 127.978032] ffff8800137abba8 ffffffff863d8b50 ffff8800137b0948 ffffffff863d8b50 > > [ 127.978032] ffff8800137abbb8 ffffffff81180a12 ffff8800137abbb8 ffffffff81180a9e > > [ 127.978032] ffff8800137abbe8 ffffffff8118108e ffff8800137abc18 0000000000000000 > > [ 127.978032] Call Trace: > > [ 127.978032] [<ffffffff81180a12>] ? get_lock_stats+0x22/0x70 > > [ 127.978032] [<ffffffff81180a9e>] ? put_lock_stats.isra.16+0xe/0x40 > > [ 127.978032] [<ffffffff8118108e>] ? lock_release_holdtime+0x11e/0x130 > > [ 127.978032] [<ffffffff811889aa>] lock_acquire+0x1ca/0x270 > > [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 > > [ 127.978032] [<ffffffff83cd7337>] down_read+0x47/0x90 > > [ 127.978032] [<ffffffff8125992f>] ? unstable_tree_search_insert+0x9f/0x260 > > [ 127.978032] [<ffffffff8125992f>] unstable_tree_search_insert+0x9f/0x260 > > [ 127.978032] [<ffffffff8125af27>] cmp_and_merge_page+0xe7/0x1e0 > > [ 127.978032] [<ffffffff8125b085>] ksm_do_scan+0x65/0xa0 > > [ 127.978032] [<ffffffff8125b12f>] ksm_scan_thread+0x6f/0x2d0 > > [ 127.978032] [<ffffffff8113de40>] ? abort_exclusive_wait+0xb0/0xb0 > > [ 127.978032] [<ffffffff8125b0c0>] ? ksm_do_scan+0xa0/0xa0 > > [ 127.978032] [<ffffffff8113cbd3>] kthread+0xe3/0xf0 > > [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 > > [ 127.978032] [<ffffffff83cdae7c>] ret_from_fork+0x7c/0xb0 > > [ 127.978032] [<ffffffff8113caf0>] ? __kthread_bind+0x40/0x40 > > [ 127.978032] Code: 00 83 3d c3 2b b0 05 00 0f 85 d5 09 00 00 be f9 0b 00 00 48 c7 c7 1c d0 b2 84 89 55 88 e8 89 82 f8 ff 8b 55 > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77 > > [ 127.978032] RIP [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90 > > [ 127.978032] RSP <ffff8800137abb78> > > [ 127.978032] CR2: 0000000000000110 > > [ 127.978032] ---[ end trace 3dc1b0c5db8c1230 ]--- > > > > The relevant piece of code is: > > > > static struct page *get_mergeable_page(struct rmap_item *rmap_item) > > { > > struct mm_struct *mm = rmap_item->mm; > > unsigned long addr = rmap_item->address; > > struct vm_area_struct *vma; > > struct page *page; > > > > down_read(&mm->mmap_sem); > > > > Where 'mm' is NULL. I'm not really sure how it happens though. > > Thanks, yes, I got that, and it's not peculiar to fuzzing at all: > I'm testing the fix at the moment, but just hit something else too > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement). > > For the moment, you're safer not to run KSM: configure it out or don't > set it to run. Fixes to follow later, I'll try to remember to Cc you. > Thanks to trinity inside of KVM guest, I've reproduced it too. [ 1193.299397] Call Trace: [ 1193.328506] [<ffffffff811785c7>] ksm_scan_thread+0x967/0xd70 [ 1193.397097] [<ffffffff810818d0>] ? wake_up_bit+0x40/0x40 [ 1193.461528] [<ffffffff81177c60>] ? run_store+0x2b0/0x2b0 [ 1193.525962] [<ffffffff81080fb0>] kthread+0xc0/0xd0 [ 1193.584157] [<ffffffff81080ef0>] ? kthread_create_on_node+0x120/0x120 [ 1193.662101] [<ffffffff8165c3ac>] ret_from_fork+0x7c/0xb0 [ 1193.726535] [<ffffffff81080ef0>] ? kthread_create_on_node+0x120/0x120 [ 1193.804475] Code: fe 4a cc ff 48 83 c4 08 5b 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 e8 9a 0a 00 00 48 89 d8 <f0> 48 ff 00 79 05 e8 9c 4a cc ff 48 83 c4 08 5b 5d c3 55 48 89 [ 1194.030380] RIP [<ffffffff81651ed9>] down_read+0x19/0x2b [ 1194.094816] RSP <ffff880122a73de8> [ 1194.136385] CR2: 00007fb4c6e01268 [ 1194.176280] ---[ end trace 17dda1cb9a62bc36 ]--- With enabled CONFIG_NUMA_BALANCING this one, but not sure if we should use new numasched code or ksm: [ 4706.859796] Call Trace: [ 4706.888904] [<ffffffff811577d5>] do_numa_page+0xe5/0x130 [ 4706.953335] [<ffffffff81157a79>] handle_pte_fault+0x259/0xa50 [ 4707.023012] [<ffffffffa008a025>] ? kvm_set_spte_hva+0x25/0x30 [kvm] [ 4707.098878] [<ffffffff8115903e>] handle_mm_fault+0x26e/0x660 [ 4707.167470] [<ffffffff811775a2>] ? __mmu_notifier_invalidate_range_end+0x72/0x90 [ 4707.256850] [<ffffffff8112e68e>] ? unlock_page+0x2e/0x40 [ 4707.321283] [<ffffffff811779d5>] break_ksm+0x75/0xa0 [ 4707.381560] [<ffffffff81177c0d>] break_cow+0x5d/0x80 [ 4707.441833] [<ffffffff811794c7>] ksm_scan_thread+0xc87/0xd70 [ 4707.510427] [<ffffffff810818e0>] ? wake_up_bit+0x40/0x40 [ 4707.574860] [<ffffffff81178840>] ? run_store+0x2b0/0x2b0 [ 4707.639294] [<ffffffff81080fc0>] kthread+0xc0/0xd0 [ 4707.697489] [<ffffffff81080f00>] ? kthread_create_on_node+0x120/0x120 [ 4707.775435] [<ffffffff8165d8ec>] ret_from_fork+0x7c/0xb0 [ 4707.839869] [<ffffffff81080f00>] ? kthread_create_on_node+0x120/0x120 [ 4707.917806] Code: f8 65 48 8b 1c 25 40 c7 00 00 e9 11 00 00 00 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 0f 1f 00 84 d2 74 2c 48 8b 83 98 02 00 00 <8b> 80 68 03 00 00 3b 83 94 07 00 00 74 d6 89 83 94 07 00 00 48 [ 4708.143711] RIP [<ffffffff81098db3>] task_numa_fault+0x43/0xa0 [ 4708.214389] RSP <ffff880122a1fbc8> [ 4708.255957] CR2: 0000000000000368 [ 4708.295722] ---[ end trace 5ffe704785995d40 ]--- I am starting looking into it. thanks! Petr -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-20 0:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-19 1:17 mm, ksm: NULL ptr deref in unstable_tree_search_insert Sasha Levin 2012-12-19 1:36 ` Hugh Dickins 2012-12-19 12:16 ` Petr Holasek 2012-12-19 17:55 ` Hugh Dickins 2012-12-20 0:21 ` Hugh Dickins 2012-12-19 16:32 ` Petr Holasek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).