linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
Date: Wed, 31 Jul 2013 08:35:30 +0900	[thread overview]
Message-ID: <20130730233530.GA19340@bbox> (raw)
In-Reply-To: <20130730152333.GJ15847@dhcp22.suse.cz>

On Tue, Jul 30, 2013 at 05:23:33PM +0200, Michal Hocko wrote:
> On Tue 30-07-13 16:58:34, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2013 at 04:46:00PM +0200, Michal Hocko wrote:
> [...]
> > > +/*
> > > + * Now, reclaim path never holds hugetlbfs_inode->i_mmap_mutex while it could
> > > + * hold normal inode->i_mmap_mutex so this annotation avoids a lockdep splat.
> > 
> > How about something like:
> > 
> > /*
> >  * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
> >  * be taken from reclaim -- unlike regular filesystems. This needs an
> >  * annotation because huge_pmd_share() does an allocation under
> >  * i_mmap_mutex.
> >  */
> > 
> > It clarifies the exact conditions and makes easier to verify the
> > validity of the annotation.
> 
> Yes, looks much better. Thanks!
> ---
> >From 673cbe2ca7df0decd7320987d97585660542e468 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 30 Jul 2013 17:22:14 +0200
> Subject: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
> 
> Dave has reported the following lockdep splat:
> [128095.470960] =================================
> [128095.471315] [ INFO: inconsistent lock state ]
> [128095.471660] 3.11.0-rc1+ #9 Not tainted
> [128095.472156] ---------------------------------
> [128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> [128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [128095.474373]  (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
> [128095.475128] {RECLAIM_FS-ON-W} state was registered at:
> [128095.475866]   [<c10a6232>] mark_held_locks+0x81/0xe7
> [128095.476597]   [<c10a8db3>] lockdep_trace_alloc+0x5e/0xbc
> [128095.477322]   [<c112316b>] __alloc_pages_nodemask+0x8b/0x9b6
> [128095.478049]   [<c1123ab6>] __get_free_pages+0x20/0x31
> [128095.478769]   [<c1123ad9>] get_zeroed_page+0x12/0x14
> [128095.479477]   [<c113fe1e>] __pmd_alloc+0x1c/0x6b
> [128095.480138]   [<c1155ea7>] huge_pmd_share+0x265/0x283
> [128095.480138]   [<c1155f22>] huge_pte_alloc+0x5d/0x71
> [128095.480138]   [<c115612e>] hugetlb_fault+0x7c/0x64a
> [128095.480138]   [<c114087c>] handle_mm_fault+0x255/0x299
> [128095.480138]   [<c15bbab0>] __do_page_fault+0x142/0x55c
> [128095.480138]   [<c15bbed7>] do_page_fault+0xd/0x16
> [128095.480138]   [<c15b927c>] error_code+0x6c/0x74
> [128095.480138] irq event stamp: 3136917
> [128095.480138] hardirqs last  enabled at (3136917): [<c15b8139>] _raw_spin_unlock_irq+0x27/0x50
> [128095.480138] hardirqs last disabled at (3136916): [<c15b7f4e>] _raw_spin_lock_irq+0x15/0x78
> [128095.480138] softirqs last  enabled at (3136180): [<c1048e4a>] __do_softirq+0x137/0x30f
> [128095.480138] softirqs last disabled at (3136175): [<c1049195>] irq_exit+0xa8/0xaa
> [128095.480138]
> other info that might help us debug this:
> [128095.480138]  Possible unsafe locking scenario:
> [128095.480138]        CPU0
> [128095.480138]        ----
> [128095.480138]   lock(&mapping->i_mmap_mutex);
> [128095.480138]   <Interrupt>
> [128095.480138]     lock(&mapping->i_mmap_mutex);
> [128095.480138]
>  *** DEADLOCK ***
> [128095.480138] no locks held by kswapd0/49.
> [128095.480138]
> stack backtrace:
> [128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
> [128095.480138] Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
> [128095.480138]  c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
> [128095.480138]  c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
> [128095.480138]  c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
> [128095.480138] Call Trace:
> [128095.480138]  [<c15b001e>] dump_stack+0x4b/0x79
> [128095.480138]  [<c15acdcb>] print_usage_bug+0x1d9/0x1e3
> [128095.480138]  [<c10a6130>] mark_lock+0x1e0/0x261
> [128095.480138]  [<c10a5878>] ? check_usage_backwards+0x109/0x109
> [128095.480138]  [<c10a6cde>] __lock_acquire+0x623/0x17f2
> [128095.480138]  [<c107aa43>] ? sched_clock_cpu+0xcd/0x130
> [128095.480138]  [<c107a7e8>] ? sched_clock_local+0x42/0x12e
> [128095.480138]  [<c10a84cf>] lock_acquire+0x7d/0x195
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c15b3671>] mutex_lock_nested+0x6c/0x3a7
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c11661d5>] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
> [128095.480138]  [<c114971b>] page_referenced+0x87/0x5e3
> [128095.480138]  [<f8433030>] ? raid0_congested+0x26/0x8a [raid0]
> [128095.480138]  [<c112b9c7>] shrink_page_list+0x3d9/0x947
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c112c3cf>] shrink_inactive_list+0x155/0x4cb
> [128095.480138]  [<c112cd07>] shrink_lruvec+0x300/0x5ce
> [128095.480138]  [<c112d028>] shrink_zone+0x53/0x14e
> [128095.480138]  [<c112e531>] kswapd+0x517/0xa75
> [128095.480138]  [<c112e01a>] ? mem_cgroup_shrink_node_zone+0x280/0x280
> [128095.480138]  [<c10661ff>] kthread+0xa8/0xaa
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c15bf737>] ret_from_kernel_thread+0x1b/0x28
> [128095.480138]  [<c1066157>] ? insert_kthread_work+0x63/0x63
> 
> which is a false positive caused by hugetlb pmd sharing code which
> allocates a new pmd from withing mappint->i_mmap_mutex. If this
> allocation causes reclaim then the lockdep detector complains that we
> might self-deadlock.
> 
> This is not correct though, because hugetlb pages are not reclaimable so
> their mapping will be never touched from the reclaim path.
> 
> The patch tells lockup detector that hugetlb i_mmap_mutex is special
> by assigning it a separate lockdep class so it won't report possible
> deadlocks on unrelated mappings.
> 
> [peterz@infradead.org: comment for annotation]
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan@kernel.org>

Thanks, Michal!
Only remained thing is Dave's testing.

- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-07-30 23:35 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-17 15:32 hugepage related lockdep trace Dave Jones
2013-07-18  0:09 ` Minchan Kim
2013-07-18 17:42   ` Aneesh Kumar K.V
2013-07-19  0:13     ` Minchan Kim
2013-07-23  7:24       ` Hush Bensen
2013-07-24  2:57         ` Minchan Kim
2013-07-23 14:01       ` Michal Hocko
2013-07-24  2:44         ` Minchan Kim
2013-07-25 13:30           ` Michal Hocko
2013-07-29  8:24             ` Minchan Kim
2013-07-29 14:53               ` Michal Hocko
2013-07-29 15:20                 ` Peter Zijlstra
2013-07-30 14:29                   ` Michal Hocko
2013-07-30 14:46                     ` [PATCH] hugetlb: fix lockdep splat caused by pmd sharing Michal Hocko
2013-07-30 14:58                       ` Peter Zijlstra
2013-07-30 15:23                         ` Michal Hocko
2013-07-30 23:35                           ` Minchan Kim [this message]
2013-07-30 23:37                             ` Dave Jones
2013-07-19  2:08     ` hugepage related lockdep trace Hillf Danton
2013-07-19  3:20       ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130730233530.GA19340@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).