From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
Subject: Re: hugepage related lockdep trace.
In-Reply-To: <20130718000901.GA31972@blaptop>
References: <20130717153223.GD27731@redhat.com> <20130718000901.GA31972@blaptop>
Date: Thu, 18 Jul 2013 23:12:24 +0530
Message-ID: <87hafrdatb.fsf@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org
To: Minchan Kim, Dave Jones, Linux Kernel, linux-mm@kvack.org
Cc: Rik van Riel, Michal Hocko, KAMEZAWA Hiroyuki, Hillf Danton, Andrew Morton

Minchan Kim writes:

> Ccing people get_maintainer says.
>
> On Wed, Jul 17, 2013 at 11:32:23AM -0400, Dave Jones wrote:
>> [128095.470960] =================================
>> [128095.471315] [ INFO: inconsistent lock state ]
>> [128095.471660] 3.11.0-rc1+ #9 Not tainted
>> [128095.472156] ---------------------------------
>> [128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
>> [128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [128095.474373]  (&mapping->i_mmap_mutex){+.+.?.}, at: [] page_referenced+0x87/0x5e3
>> [128095.475128] {RECLAIM_FS-ON-W} state was registered at:
>> [128095.475866]   [] mark_held_locks+0x81/0xe7
>> [128095.476597]   [] lockdep_trace_alloc+0x5e/0xbc
>> [128095.477322]   [] __alloc_pages_nodemask+0x8b/0x9b6
>> [128095.478049]   [] __get_free_pages+0x20/0x31
>> [128095.478769]   [] get_zeroed_page+0x12/0x14
>> [128095.479477]   [] __pmd_alloc+0x1c/0x6b
>> [128095.480138]   [] huge_pmd_share+0x265/0x283
>> [128095.480138]   [] huge_pte_alloc+0x5d/0x71
>> [128095.480138]   [] hugetlb_fault+0x7c/0x64a
>> [128095.480138]   [] handle_mm_fault+0x255/0x299
>> [128095.480138]   [] __do_page_fault+0x142/0x55c
>> [128095.480138]   [] do_page_fault+0xd/0x16
>> [128095.480138]   [] error_code+0x6c/0x74
>> [128095.480138] irq event stamp: 3136917
>> [128095.480138] hardirqs last enabled at (3136917): [] _raw_spin_unlock_irq+0x27/0x50
>> [128095.480138] hardirqs last disabled at (3136916): [] _raw_spin_lock_irq+0x15/0x78
>> [128095.480138] softirqs last enabled at (3136180): [] __do_softirq+0x137/0x30f
>> [128095.480138] softirqs last disabled at (3136175): [] irq_exit+0xa8/0xaa
>> [128095.480138]
>> other info that might help us debug this:
>> [128095.480138]  Possible unsafe locking scenario:
>>
>> [128095.480138]        CPU0
>> [128095.480138]        ----
>> [128095.480138]   lock(&mapping->i_mmap_mutex);
>> [128095.480138]
>> [128095.480138]     lock(&mapping->i_mmap_mutex);
>> [128095.480138]
>>  *** DEADLOCK ***
>>
>> [128095.480138] no locks held by kswapd0/49.
>> [128095.480138]
>> stack backtrace:
>> [128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
>> [128095.480138] Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008
>> [128095.480138]  c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
>> [128095.480138]  c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
>> [128095.480138]  c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
>> [128095.480138] Call Trace:
>> [128095.480138]  [] dump_stack+0x4b/0x79
>> [128095.480138]  [] print_usage_bug+0x1d9/0x1e3
>> [128095.480138]  [] mark_lock+0x1e0/0x261
>> [128095.480138]  [] ? check_usage_backwards+0x109/0x109
>> [128095.480138]  [] __lock_acquire+0x623/0x17f2
>> [128095.480138]  [] ? sched_clock_cpu+0xcd/0x130
>> [128095.480138]  [] ? sched_clock_local+0x42/0x12e
>> [128095.480138]  [] lock_acquire+0x7d/0x195
>> [128095.480138]  [] ? page_referenced+0x87/0x5e3
>> [128095.480138]  [] mutex_lock_nested+0x6c/0x3a7
>> [128095.480138]  [] ? page_referenced+0x87/0x5e3
>> [128095.480138]  [] ? page_referenced+0x87/0x5e3
>> [128095.480138]  [] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
>> [128095.480138]  [] page_referenced+0x87/0x5e3
>> [128095.480138]  [] ? raid0_congested+0x26/0x8a [raid0]
>> [128095.480138]  [] shrink_page_list+0x3d9/0x947
>> [128095.480138]  [] ? trace_hardirqs_on+0xb/0xd
>> [128095.480138]  [] shrink_inactive_list+0x155/0x4cb
>> [128095.480138]  [] shrink_lruvec+0x300/0x5ce
>> [128095.480138]  [] shrink_zone+0x53/0x14e
>> [128095.480138]  [] kswapd+0x517/0xa75
>> [128095.480138]  [] ? mem_cgroup_shrink_node_zone+0x280/0x280
>> [128095.480138]  [] kthread+0xa8/0xaa
>> [128095.480138]  [] ? trace_hardirqs_on+0xb/0xd
>> [128095.480138]  [] ret_from_kernel_thread+0x1b/0x28
>> [128095.480138]  [] ? insert_kthread_work+0x63/0x63
>
> IMHO, it's a false positive because i_mmap_mutex was held by kswapd
> while one in the middle of fault path could be never on kswapd context.
>
> It seems lockdep for reclaim-over-fs isn't enough smart to identify
> between background and direct reclaim.
>
> Wait for other's opinion.

Is that reasoning correct? We may not actually deadlock, because hugetlb
pages cannot be reclaimed, so the fault path in hugetlb won't end up
reclaiming pages from the same inode. But the report itself is correct,
right?

Looking at the hugetlb code, we have in huge_pmd_share():

out:
	pte = (pte_t *)pmd_alloc(mm, pud, addr);
	mutex_unlock(&mapping->i_mmap_mutex);
	return pte;

I guess we should move that pmd_alloc() outside i_mmap_mutex. Otherwise
that pmd_alloc() can trigger reclaim, which can call shrink_page_list()
while we still hold the mutex?

Something like this?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 83aff0a..2cb1be3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3266,8 +3266,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 		put_page(virt_to_page(spte));
 	spin_unlock(&mm->page_table_lock);
 out:
-	pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	mutex_unlock(&mapping->i_mmap_mutex);
+	pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	return pte;
 }

-aneesh
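
For illustration only, the reordering in the patch can be mimicked in userspace. This is a minimal sketch, not kernel code: map_lock stands in for mapping->i_mmap_mutex, alloc_may_reclaim() stands in for pmd_alloc(), and every name below is hypothetical. A hand-maintained flag counts allocations made while the lock is held, which is roughly the condition lockdep flagged in the trace.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for mapping->i_mmap_mutex. */
pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
bool map_lock_held;      /* tracked by hand; single-threaded demo only */
int allocs_under_lock;   /* allocations made while map_lock was held */

void lock_map(void)
{
	pthread_mutex_lock(&map_lock);
	map_lock_held = true;
}

void unlock_map(void)
{
	map_lock_held = false;
	pthread_mutex_unlock(&map_lock);
}

/*
 * Hypothetical stand-in for pmd_alloc(): an allocation that, in the
 * kernel, could enter reclaim. Here it only records whether the lock
 * was held at the time of the call.
 */
void *alloc_may_reclaim(size_t size)
{
	if (map_lock_held)
		allocs_under_lock++;
	return calloc(1, size);
}

/* Ordering before the patch: allocate first, then drop the lock. */
void *share_alloc_inside(size_t size)
{
	void *p;

	lock_map();
	/* ... search for a shareable page table would go here ... */
	p = alloc_may_reclaim(size);
	unlock_map();
	return p;
}

/* Ordering after the patch: drop the lock first, then allocate. */
void *share_alloc_outside(size_t size)
{
	lock_map();
	/* ... search for a shareable page table would go here ... */
	unlock_map();
	return alloc_may_reclaim(size);
}
```

Calling share_alloc_inside() bumps allocs_under_lock, while share_alloc_outside() leaves it unchanged. Whether the real pmd_alloc() is safe to call without i_mmap_mutex held is a separate question that this sketch does not answer.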