All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
Date: Wed, 31 Jul 2013 08:35:30 +0900	[thread overview]
Message-ID: <20130730233530.GA19340@bbox> (raw)
In-Reply-To: <20130730152333.GJ15847@dhcp22.suse.cz>

On Tue, Jul 30, 2013 at 05:23:33PM +0200, Michal Hocko wrote:
> On Tue 30-07-13 16:58:34, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2013 at 04:46:00PM +0200, Michal Hocko wrote:
> [...]
> > > +/*
> > > + * Now, reclaim path never holds hugetlbfs_inode->i_mmap_mutex while it could
> > > + * hold normal inode->i_mmap_mutex so this annotation avoids a lockdep splat.
> > 
> > How about something like:
> > 
> > /*
> >  * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
> >  * be taken from reclaim -- unlike regular filesystems. This needs an
> >  * annotation because huge_pmd_share() does an allocation under
> >  * i_mmap_mutex.
> >  */
> > 
> > It clarifies the exact conditions and makes easier to verify the
> > validity of the annotation.
> 
> Yes, looks much better. Thanks!
> ---
> >From 673cbe2ca7df0decd7320987d97585660542e468 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 30 Jul 2013 17:22:14 +0200
> Subject: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
> 
> Dave has reported the following lockdep splat:
> [128095.470960] =================================
> [128095.471315] [ INFO: inconsistent lock state ]
> [128095.471660] 3.11.0-rc1+ #9 Not tainted
> [128095.472156] ---------------------------------
> [128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> [128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [128095.474373]  (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
> [128095.475128] {RECLAIM_FS-ON-W} state was registered at:
> [128095.475866]   [<c10a6232>] mark_held_locks+0x81/0xe7
> [128095.476597]   [<c10a8db3>] lockdep_trace_alloc+0x5e/0xbc
> [128095.477322]   [<c112316b>] __alloc_pages_nodemask+0x8b/0x9b6
> [128095.478049]   [<c1123ab6>] __get_free_pages+0x20/0x31
> [128095.478769]   [<c1123ad9>] get_zeroed_page+0x12/0x14
> [128095.479477]   [<c113fe1e>] __pmd_alloc+0x1c/0x6b
> [128095.480138]   [<c1155ea7>] huge_pmd_share+0x265/0x283
> [128095.480138]   [<c1155f22>] huge_pte_alloc+0x5d/0x71
> [128095.480138]   [<c115612e>] hugetlb_fault+0x7c/0x64a
> [128095.480138]   [<c114087c>] handle_mm_fault+0x255/0x299
> [128095.480138]   [<c15bbab0>] __do_page_fault+0x142/0x55c
> [128095.480138]   [<c15bbed7>] do_page_fault+0xd/0x16
> [128095.480138]   [<c15b927c>] error_code+0x6c/0x74
> [128095.480138] irq event stamp: 3136917
> [128095.480138] hardirqs last  enabled at (3136917): [<c15b8139>] _raw_spin_unlock_irq+0x27/0x50
> [128095.480138] hardirqs last disabled at (3136916): [<c15b7f4e>] _raw_spin_lock_irq+0x15/0x78
> [128095.480138] softirqs last  enabled at (3136180): [<c1048e4a>] __do_softirq+0x137/0x30f
> [128095.480138] softirqs last disabled at (3136175): [<c1049195>] irq_exit+0xa8/0xaa
> [128095.480138]
> other info that might help us debug this:
> [128095.480138]  Possible unsafe locking scenario:
> [128095.480138]        CPU0
> [128095.480138]        ----
> [128095.480138]   lock(&mapping->i_mmap_mutex);
> [128095.480138]   <Interrupt>
> [128095.480138]     lock(&mapping->i_mmap_mutex);
> [128095.480138]
>  *** DEADLOCK ***
> [128095.480138] no locks held by kswapd0/49.
> [128095.480138]
> stack backtrace:
> [128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
> [128095.480138] Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
> [128095.480138]  c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
> [128095.480138]  c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
> [128095.480138]  c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
> [128095.480138] Call Trace:
> [128095.480138]  [<c15b001e>] dump_stack+0x4b/0x79
> [128095.480138]  [<c15acdcb>] print_usage_bug+0x1d9/0x1e3
> [128095.480138]  [<c10a6130>] mark_lock+0x1e0/0x261
> [128095.480138]  [<c10a5878>] ? check_usage_backwards+0x109/0x109
> [128095.480138]  [<c10a6cde>] __lock_acquire+0x623/0x17f2
> [128095.480138]  [<c107aa43>] ? sched_clock_cpu+0xcd/0x130
> [128095.480138]  [<c107a7e8>] ? sched_clock_local+0x42/0x12e
> [128095.480138]  [<c10a84cf>] lock_acquire+0x7d/0x195
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c15b3671>] mutex_lock_nested+0x6c/0x3a7
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c11661d5>] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
> [128095.480138]  [<c114971b>] page_referenced+0x87/0x5e3
> [128095.480138]  [<f8433030>] ? raid0_congested+0x26/0x8a [raid0]
> [128095.480138]  [<c112b9c7>] shrink_page_list+0x3d9/0x947
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c112c3cf>] shrink_inactive_list+0x155/0x4cb
> [128095.480138]  [<c112cd07>] shrink_lruvec+0x300/0x5ce
> [128095.480138]  [<c112d028>] shrink_zone+0x53/0x14e
> [128095.480138]  [<c112e531>] kswapd+0x517/0xa75
> [128095.480138]  [<c112e01a>] ? mem_cgroup_shrink_node_zone+0x280/0x280
> [128095.480138]  [<c10661ff>] kthread+0xa8/0xaa
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c15bf737>] ret_from_kernel_thread+0x1b/0x28
> [128095.480138]  [<c1066157>] ? insert_kthread_work+0x63/0x63
> 
> which is a false positive caused by hugetlb pmd sharing code which
> allocates a new pmd from withing mappint->i_mmap_mutex. If this
> allocation causes reclaim then the lockdep detector complains that we
> might self-deadlock.
> 
> This is not correct though, because hugetlb pages are not reclaimable so
> their mapping will be never touched from the reclaim path.
> 
> The patch tells lockup detector that hugetlb i_mmap_mutex is special
> by assigning it a separate lockdep class so it won't report possible
> deadlocks on unrelated mappings.
> 
> [peterz@infradead.org: comment for annotation]
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan@kernel.org>

Thanks, Michal!
Only remained thing is Dave's testing.

- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
Date: Wed, 31 Jul 2013 08:35:30 +0900	[thread overview]
Message-ID: <20130730233530.GA19340@bbox> (raw)
In-Reply-To: <20130730152333.GJ15847@dhcp22.suse.cz>

On Tue, Jul 30, 2013 at 05:23:33PM +0200, Michal Hocko wrote:
> On Tue 30-07-13 16:58:34, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2013 at 04:46:00PM +0200, Michal Hocko wrote:
> [...]
> > > +/*
> > > + * Now, reclaim path never holds hugetlbfs_inode->i_mmap_mutex while it could
> > > + * hold normal inode->i_mmap_mutex so this annotation avoids a lockdep splat.
> > 
> > How about something like:
> > 
> > /*
> >  * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
> >  * be taken from reclaim -- unlike regular filesystems. This needs an
> >  * annotation because huge_pmd_share() does an allocation under
> >  * i_mmap_mutex.
> >  */
> > 
> > It clarifies the exact conditions and makes easier to verify the
> > validity of the annotation.
> 
> Yes, looks much better. Thanks!
> ---
> >From 673cbe2ca7df0decd7320987d97585660542e468 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 30 Jul 2013 17:22:14 +0200
> Subject: [PATCH] hugetlb: fix lockdep splat caused by pmd sharing
> 
> Dave has reported the following lockdep splat:
> [128095.470960] =================================
> [128095.471315] [ INFO: inconsistent lock state ]
> [128095.471660] 3.11.0-rc1+ #9 Not tainted
> [128095.472156] ---------------------------------
> [128095.472905] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> [128095.473650] kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [128095.474373]  (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
> [128095.475128] {RECLAIM_FS-ON-W} state was registered at:
> [128095.475866]   [<c10a6232>] mark_held_locks+0x81/0xe7
> [128095.476597]   [<c10a8db3>] lockdep_trace_alloc+0x5e/0xbc
> [128095.477322]   [<c112316b>] __alloc_pages_nodemask+0x8b/0x9b6
> [128095.478049]   [<c1123ab6>] __get_free_pages+0x20/0x31
> [128095.478769]   [<c1123ad9>] get_zeroed_page+0x12/0x14
> [128095.479477]   [<c113fe1e>] __pmd_alloc+0x1c/0x6b
> [128095.480138]   [<c1155ea7>] huge_pmd_share+0x265/0x283
> [128095.480138]   [<c1155f22>] huge_pte_alloc+0x5d/0x71
> [128095.480138]   [<c115612e>] hugetlb_fault+0x7c/0x64a
> [128095.480138]   [<c114087c>] handle_mm_fault+0x255/0x299
> [128095.480138]   [<c15bbab0>] __do_page_fault+0x142/0x55c
> [128095.480138]   [<c15bbed7>] do_page_fault+0xd/0x16
> [128095.480138]   [<c15b927c>] error_code+0x6c/0x74
> [128095.480138] irq event stamp: 3136917
> [128095.480138] hardirqs last  enabled at (3136917): [<c15b8139>] _raw_spin_unlock_irq+0x27/0x50
> [128095.480138] hardirqs last disabled at (3136916): [<c15b7f4e>] _raw_spin_lock_irq+0x15/0x78
> [128095.480138] softirqs last  enabled at (3136180): [<c1048e4a>] __do_softirq+0x137/0x30f
> [128095.480138] softirqs last disabled at (3136175): [<c1049195>] irq_exit+0xa8/0xaa
> [128095.480138]
> other info that might help us debug this:
> [128095.480138]  Possible unsafe locking scenario:
> [128095.480138]        CPU0
> [128095.480138]        ----
> [128095.480138]   lock(&mapping->i_mmap_mutex);
> [128095.480138]   <Interrupt>
> [128095.480138]     lock(&mapping->i_mmap_mutex);
> [128095.480138]
>  *** DEADLOCK ***
> [128095.480138] no locks held by kswapd0/49.
> [128095.480138]
> stack backtrace:
> [128095.480138] CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
> [128095.480138] Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
> [128095.480138]  c1d32630 00000000 ee39fb18 c15b001e ee395780 ee39fb54 c15acdcb c1751845
> [128095.480138]  c1751bbf 00000031 00000000 00000000 00000000 00000000 00000001 00000001
> [128095.480138]  c1751bbf 00000008 ee395c44 00000100 ee39fb88 c10a6130 00000008 0000d8fb
> [128095.480138] Call Trace:
> [128095.480138]  [<c15b001e>] dump_stack+0x4b/0x79
> [128095.480138]  [<c15acdcb>] print_usage_bug+0x1d9/0x1e3
> [128095.480138]  [<c10a6130>] mark_lock+0x1e0/0x261
> [128095.480138]  [<c10a5878>] ? check_usage_backwards+0x109/0x109
> [128095.480138]  [<c10a6cde>] __lock_acquire+0x623/0x17f2
> [128095.480138]  [<c107aa43>] ? sched_clock_cpu+0xcd/0x130
> [128095.480138]  [<c107a7e8>] ? sched_clock_local+0x42/0x12e
> [128095.480138]  [<c10a84cf>] lock_acquire+0x7d/0x195
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c15b3671>] mutex_lock_nested+0x6c/0x3a7
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c114971b>] ? page_referenced+0x87/0x5e3
> [128095.480138]  [<c11661d5>] ? mem_cgroup_charge_statistics.isra.24+0x61/0x9e
> [128095.480138]  [<c114971b>] page_referenced+0x87/0x5e3
> [128095.480138]  [<f8433030>] ? raid0_congested+0x26/0x8a [raid0]
> [128095.480138]  [<c112b9c7>] shrink_page_list+0x3d9/0x947
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c112c3cf>] shrink_inactive_list+0x155/0x4cb
> [128095.480138]  [<c112cd07>] shrink_lruvec+0x300/0x5ce
> [128095.480138]  [<c112d028>] shrink_zone+0x53/0x14e
> [128095.480138]  [<c112e531>] kswapd+0x517/0xa75
> [128095.480138]  [<c112e01a>] ? mem_cgroup_shrink_node_zone+0x280/0x280
> [128095.480138]  [<c10661ff>] kthread+0xa8/0xaa
> [128095.480138]  [<c10a6457>] ? trace_hardirqs_on+0xb/0xd
> [128095.480138]  [<c15bf737>] ret_from_kernel_thread+0x1b/0x28
> [128095.480138]  [<c1066157>] ? insert_kthread_work+0x63/0x63
> 
> which is a false positive caused by hugetlb pmd sharing code which
> allocates a new pmd from withing mappint->i_mmap_mutex. If this
> allocation causes reclaim then the lockdep detector complains that we
> might self-deadlock.
> 
> This is not correct though, because hugetlb pages are not reclaimable so
> their mapping will be never touched from the reclaim path.
> 
> The patch tells lockup detector that hugetlb i_mmap_mutex is special
> by assigning it a separate lockdep class so it won't report possible
> deadlocks on unrelated mappings.
> 
> [peterz@infradead.org: comment for annotation]
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan@kernel.org>

Thanks, Michal!
Only remained thing is Dave's testing.

- 
Kind regards,
Minchan Kim

  reply	other threads:[~2013-07-30 23:35 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-17 15:32 hugepage related lockdep trace Dave Jones
2013-07-17 15:32 ` Dave Jones
2013-07-18  0:09 ` Minchan Kim
2013-07-18  0:09   ` Minchan Kim
2013-07-18 17:42   ` Aneesh Kumar K.V
2013-07-18 17:42     ` Aneesh Kumar K.V
2013-07-19  0:13     ` Minchan Kim
2013-07-19  0:13       ` Minchan Kim
2013-07-23  7:24       ` Hush Bensen
2013-07-23  7:24         ` Hush Bensen
2013-07-24  2:57         ` Minchan Kim
2013-07-24  2:57           ` Minchan Kim
2013-07-23 14:01       ` Michal Hocko
2013-07-23 14:01         ` Michal Hocko
2013-07-24  2:44         ` Minchan Kim
2013-07-24  2:44           ` Minchan Kim
2013-07-25 13:30           ` Michal Hocko
2013-07-25 13:30             ` Michal Hocko
2013-07-29  8:24             ` Minchan Kim
2013-07-29  8:24               ` Minchan Kim
2013-07-29 14:53               ` Michal Hocko
2013-07-29 14:53                 ` Michal Hocko
2013-07-29 15:20                 ` Peter Zijlstra
2013-07-29 15:20                   ` Peter Zijlstra
2013-07-30 14:29                   ` Michal Hocko
2013-07-30 14:29                     ` Michal Hocko
2013-07-30 14:46                     ` [PATCH] hugetlb: fix lockdep splat caused by pmd sharing Michal Hocko
2013-07-30 14:46                       ` Michal Hocko
2013-07-30 14:58                       ` Peter Zijlstra
2013-07-30 14:58                         ` Peter Zijlstra
2013-07-30 15:23                         ` Michal Hocko
2013-07-30 15:23                           ` Michal Hocko
2013-07-30 23:35                           ` Minchan Kim [this message]
2013-07-30 23:35                             ` Minchan Kim
2013-07-30 23:37                             ` Dave Jones
2013-07-30 23:37                               ` Dave Jones
2013-07-19  2:08     ` hugepage related lockdep trace Hillf Danton
2013-07-19  2:08       ` Hillf Danton
2013-07-19  3:20       ` Aneesh Kumar K.V
2013-07-19  3:20         ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130730233530.GA19340@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.