* mm: ksm: deadlock in oom killing process while breaking ksm pages
@ 2015-10-01 14:34 UTC
From: Sasha Levin
To: Hugh Dickins
Cc: LKML, linux-mm@kvack.org, Andrew Morton

Hi Hugh,

I've hit this (actual) lockup during testing. It seems that we were trying
to allocate a new page to break KSM on an existing page, ended up in the
OOM killer, which killed our process, and locked up in __ksm_exit() trying
to take mmap_sem for write while already holding it for read.

A very similar scenario is described in the patch that introduced this
behaviour (9ba6929480 ("ksm: fix oom deadlock")):

    There's a now-obvious deadlock in KSM's out-of-memory handling:
    imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
    trying to allocate a page to break KSM in an mm which becomes the
    OOM victim (quite likely in the unmerge case): it's killed and goes
    to exit, and hangs there waiting to acquire ksm_thread_mutex.

So I'm guessing that the solution is incomplete for the slow path.

[3201844.610523] =============================================
[3201844.610988] [ INFO: possible recursive locking detected ]
[3201844.611405] 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4 Not tainted
[3201844.611907] ---------------------------------------------
[3201844.612373] ksm02/28830 is trying to acquire lock:
[3201844.612749] (&mm->mmap_sem){++++++}, at: __ksm_exit (mm/ksm.c:1821)
[3201844.613472] RWsem: count: 1 owner: None
[3201844.613782]
[3201844.613782] but task is already holding lock:
[3201844.614248] (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124)
[3201844.614904] RWsem: count: 1 owner: None
[3201844.615212]
[3201844.615212] other info that might help us debug this:
[3201844.615727]  Possible unsafe locking scenario:
[3201844.615727]
[3201844.616240]        CPU0
[3201844.616446]        ----
[3201844.616650]   lock(&mm->mmap_sem);
[3201844.616952]   lock(&mm->mmap_sem);
[3201844.617252]
[3201844.617252]  *** DEADLOCK ***
[3201844.617252]
[3201844.617733]  May be due to missing lock nesting notation
[3201844.617733]
[3201844.618265] 6 locks held by ksm02/28830:
[3201844.618576] #0: (sb_writers#5){.+.+.+}, at: __sb_start_write (fs/super.c:1176)
[3201844.619327] RWsem: count: 0 owner: None
[3201844.619633] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:298)
[3201844.624648] Mutex: counter: 0 owner: ksm02
[3201844.624978] #2: (s_active#448){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:298)
[3201844.625733] #3: (ksm_thread_mutex){+.+.+.}, at: run_store (mm/ksm.c:2120)
[3201844.626448] Mutex: counter: -1 owner: ksm02
[3201844.626786] #4: (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124)
[3201844.627486] RWsem: count: 1 owner: None
[3201844.627792] #5: (oom_lock){+.+...}, at: __alloc_pages_nodemask (mm/page_alloc.c:2779 mm/page_alloc.c:3213 mm/page_alloc.c:3300)
[3201844.628594] Mutex: counter: 0 owner: ksm02
[3201844.628919]
[3201844.628919] stack backtrace:
[3201844.629276] CPU: 0 PID: 28830 Comm: ksm02 Not tainted 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4
[3201844.629970]  ffffffffaf41d680 00000000b8d5e1f1 ffff88065e42eec0 ffffffffa1d454c8
[3201844.630663]  ffffffffaf41d680 ffff88065e42f080 ffffffffa04269ee ffff88065e42f088
[3201844.631292]  ffffffffa0427746 ffff882c88b24008 ffff8806845b8e10 ffffffffafb842c0
[3201844.631952] Call Trace:
[3201844.632204] dump_stack (lib/dump_stack.c:52)
[3201844.636449] __lock_acquire (kernel/locking/lockdep.c:1776 kernel/locking/lockdep.c:1820 kernel/locking/lockdep.c:2152 kernel/locking/lockdep.c:3239)
[3201844.639909] lock_acquire (kernel/locking/lockdep.c:3620)
[3201844.640997] down_write (./arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:51)
[3201844.642011] __ksm_exit (mm/ksm.c:1821)
[3201844.642501] mmput (./arch/x86/include/asm/bitops.h:311 include/linux/khugepaged.h:35 kernel/fork.c:701)
[3201844.642920] oom_kill_process (mm/oom_kill.c:604)
[3201844.644528] out_of_memory (mm/oom_kill.c:700)
[3201844.646626] __alloc_pages_nodemask (mm/page_alloc.c:2822 mm/page_alloc.c:3213 mm/page_alloc.c:3300)
[3201844.649972] alloc_pages_vma (mm/mempolicy.c:2044)
[3201844.650462] ? wp_page_copy.isra.36 (mm/memory.c:2074)
[3201844.651000] wp_page_copy.isra.36 (mm/memory.c:2074)
[3201844.652544] do_wp_page (mm/memory.c:2349)
[3201844.654048] handle_mm_fault (mm/memory.c:3310 mm/memory.c:3404 mm/memory.c:3433)
[3201844.657519] break_ksm (mm/ksm.c:374)
[3201844.659348] unmerge_ksm_pages (mm/ksm.c:673)
[3201844.659831] run_store (mm/ksm.c:776 mm/ksm.c:2124)
[3201844.661837] kobj_attr_store (lib/kobject.c:792)
[3201844.662743] sysfs_kf_write (fs/sysfs/file.c:131)
[3201844.663656] kernfs_fop_write (fs/kernfs/file.c:312)
[3201844.664154] __vfs_write (fs/read_write.c:489)
[3201844.666502] vfs_write (fs/read_write.c:539)
[3201844.666935] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
[3201844.668965] tracesys_phase2 (arch/x86/entry/entry_64.S:270)

Thanks,
Sasha
* Re: mm: ksm: deadlock in oom killing process while breaking ksm pages
@ 2015-10-05 7:31 UTC
From: Hugh Dickins
To: Sasha Levin
Cc: Hugh Dickins, LKML, linux-mm@kvack.org, Andrew Morton

On Thu, 1 Oct 2015, Sasha Levin wrote:

> Hi Hugh,
>
> I've hit this (actual) lockup during testing. It seems that we were trying
> to allocate a new page to break KSM on an existing page, ended up in the
> OOM killer, which killed our process, and locked up in __ksm_exit() trying
> to take mmap_sem for write while already holding it for read.
>
> A very similar scenario is described in the patch that introduced this
> behaviour (9ba6929480 ("ksm: fix oom deadlock")):
>
>     There's a now-obvious deadlock in KSM's out-of-memory handling:
>     imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
>     trying to allocate a page to break KSM in an mm which becomes the
>     OOM victim (quite likely in the unmerge case): it's killed and goes
>     to exit, and hangs there waiting to acquire ksm_thread_mutex.
>
> So I'm guessing that the solution is incomplete for the slow path.

Thank you, Sasha, this is a nice one. I've only just started ruminating
on it, and will do so (intermittently!) for a few days. Maybe the answer
will be to take an additional reference to the mm when unmerging; but
done wrong that can frustrate OOM freeing memory altogether, so it's
not a solution I'll rush into without consideration.

Plus it's not clear to me yet whether it can only be a problem when
unmerging, or could hit other calls to break_ksm(). I do have a
v3.9-era patch to remove all the calls to break_cow(), but IIRC it's
a patch I didn't quite get working reliably at the time.

This does reinforce my suspicion that, one way or another, you happen
to be targeting trinity at ksm more effectively these days: I don't
see any cause for alarm over recent kernel changes yet.
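Coming back to that "additional reference" idea, an untested sketch of
what I mean -- a sketch only, not a patch: the helper name
unmerge_one_mm() is invented for illustration, the ksm_test_exit()
checks and mm_slot bookkeeping the real unmerge loop needs are elided,
and never mind for now whether the pin ought to be
atomic_inc_not_zero() against an mm already on its way out.

	/*
	 * Sketch only: pin mm_users across the unmerge of each mm, so
	 * that the mmput() done on the OOM-kill path can never be the
	 * final drop that runs __ksm_exit() and goes for
	 * down_write(&mm->mmap_sem) while this task still holds that
	 * same mmap_sem for read.
	 */
	static int unmerge_one_mm(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;
		int err = 0;

		atomic_inc(&mm->mm_users);	/* defer any __ksm_exit() */
		down_read(&mm->mmap_sem);
		for (vma = mm->mmap; vma; vma = vma->vm_next) {
			if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
				continue;
			err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end);
			if (err)
				break;
		}
		up_read(&mm->mmap_sem);
		mmput(mm);	/* a final drop now happens with mmap_sem released */
		return err;
	}

The catch is the one already admitted: pinning mm_users keeps that mm's
memory unfreeable for as long as the unmerge takes, which is precisely
what the OOM killer would be trying to undo.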
>
> [3201844.610523] =============================================
> [3201844.610988] [ INFO: possible recursive locking detected ]
> [3201844.611405] 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4 Not tainted
> [3201844.611907] ---------------------------------------------
> [3201844.612373] ksm02/28830 is trying to acquire lock:
> [3201844.612749] (&mm->mmap_sem){++++++}, at: __ksm_exit (mm/ksm.c:1821)
> [3201844.613472] RWsem: count: 1 owner: None
> [3201844.613782]
> [3201844.613782] but task is already holding lock:
> [3201844.614248] (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124)
> [3201844.614904] RWsem: count: 1 owner: None
> [3201844.615212]
> [3201844.615212] other info that might help us debug this:
> [3201844.615727]  Possible unsafe locking scenario:
> [3201844.615727]
> [3201844.616240]        CPU0
> [3201844.616446]        ----
> [3201844.616650]   lock(&mm->mmap_sem);
> [3201844.616952]   lock(&mm->mmap_sem);
> [3201844.617252]
> [3201844.617252]  *** DEADLOCK ***
> [3201844.617252]
> [3201844.617733]  May be due to missing lock nesting notation
> [3201844.617733]
> [3201844.618265] 6 locks held by ksm02/28830:
> [3201844.618576] #0: (sb_writers#5){.+.+.+}, at: __sb_start_write (fs/super.c:1176)
> [3201844.619327] RWsem: count: 0 owner: None
> [3201844.619633] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:298)
> [3201844.624648] Mutex: counter: 0 owner: ksm02
> [3201844.624978] #2: (s_active#448){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:298)
> [3201844.625733] #3: (ksm_thread_mutex){+.+.+.}, at: run_store (mm/ksm.c:2120)
> [3201844.626448] Mutex: counter: -1 owner: ksm02
> [3201844.626786] #4: (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124)
> [3201844.627486] RWsem: count: 1 owner: None
> [3201844.627792] #5: (oom_lock){+.+...}, at: __alloc_pages_nodemask (mm/page_alloc.c:2779 mm/page_alloc.c:3213 mm/page_alloc.c:3300)
> [3201844.628594] Mutex: counter: 0 owner: ksm02
> [3201844.628919]
> [3201844.628919] stack backtrace:
> [3201844.629276] CPU: 0 PID: 28830 Comm: ksm02 Not tainted 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4
> [3201844.629970]  ffffffffaf41d680 00000000b8d5e1f1 ffff88065e42eec0 ffffffffa1d454c8
> [3201844.630663]  ffffffffaf41d680 ffff88065e42f080 ffffffffa04269ee ffff88065e42f088
> [3201844.631292]  ffffffffa0427746 ffff882c88b24008 ffff8806845b8e10 ffffffffafb842c0
> [3201844.631952] Call Trace:
> [3201844.632204] dump_stack (lib/dump_stack.c:52)
> [3201844.636449] __lock_acquire (kernel/locking/lockdep.c:1776 kernel/locking/lockdep.c:1820 kernel/locking/lockdep.c:2152 kernel/locking/lockdep.c:3239)
> [3201844.639909] lock_acquire (kernel/locking/lockdep.c:3620)
> [3201844.640997] down_write (./arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:51)
> [3201844.642011] __ksm_exit (mm/ksm.c:1821)
> [3201844.642501] mmput (./arch/x86/include/asm/bitops.h:311 include/linux/khugepaged.h:35 kernel/fork.c:701)

I assume this interesting reference to khugepaged_exit()
is just one of those off-by-one-line things?

> [3201844.642920] oom_kill_process (mm/oom_kill.c:604)
> [3201844.644528] out_of_memory (mm/oom_kill.c:700)
> [3201844.646626] __alloc_pages_nodemask (mm/page_alloc.c:2822 mm/page_alloc.c:3213 mm/page_alloc.c:3300)
> [3201844.649972] alloc_pages_vma (mm/mempolicy.c:2044)
> [3201844.650462] ? wp_page_copy.isra.36 (mm/memory.c:2074)
> [3201844.651000] wp_page_copy.isra.36 (mm/memory.c:2074)
> [3201844.652544] do_wp_page (mm/memory.c:2349)
> [3201844.654048] handle_mm_fault (mm/memory.c:3310 mm/memory.c:3404 mm/memory.c:3433)
> [3201844.657519] break_ksm (mm/ksm.c:374)
> [3201844.659348] unmerge_ksm_pages (mm/ksm.c:673)
> [3201844.659831] run_store (mm/ksm.c:776 mm/ksm.c:2124)
> [3201844.661837] kobj_attr_store (lib/kobject.c:792)
> [3201844.662743] sysfs_kf_write (fs/sysfs/file.c:131)
> [3201844.663656] kernfs_fop_write (fs/kernfs/file.c:312)
> [3201844.664154] __vfs_write (fs/read_write.c:489)
> [3201844.666502] vfs_write (fs/read_write.c:539)
> [3201844.666935] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
> [3201844.668965] tracesys_phase2 (arch/x86/entry/entry_64.S:270)
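And for the record, the recursion in the trace above is the plain
one-task rwsem self-deadlock; nothing KSM-specific is needed to see
its shape. A minimal userspace analogy -- pthreads, not kernel code,
and note POSIX only declares this undefined, though glibc on Linux
does block forever on the second call:

	/*
	 * One thread, one rwlock: take the read lock, then ask for the
	 * write lock -- the same shape as down_read(&mm->mmap_sem) in
	 * run_store() followed by down_write(&mm->mmap_sem) in
	 * __ksm_exit() above.  Build with: cc demo.c -pthread
	 */
	#include <pthread.h>
	#include <stdio.h>

	int main(void)
	{
		pthread_rwlock_t sem = PTHREAD_RWLOCK_INITIALIZER;

		pthread_rwlock_rdlock(&sem);	/* run_store: down_read() */
		puts("read lock held; asking for the write lock...");
		pthread_rwlock_wrlock(&sem);	/* __ksm_exit: down_write(), hangs */
		puts("never reached");
		return 0;
	}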
* Re: mm: ksm: deadlock in oom killing process while breaking ksm pages
@ 2015-10-06 22:43 UTC
From: Hugh Dickins
To: Sasha Levin
Cc: Oleg Nesterov, Michal Hocko, Hugh Dickins, LKML, linux-mm@kvack.org, Andrew Morton

On Mon, 5 Oct 2015, Hugh Dickins wrote:

> On Thu, 1 Oct 2015, Sasha Levin wrote:
>
> > Hi Hugh,
> >
> > I've hit this (actual) lockup during testing. It seems that we were trying
> > to allocate a new page to break KSM on an existing page, ended up in the
> > OOM killer, which killed our process, and locked up in __ksm_exit() trying
> > to take mmap_sem for write while already holding it for read.
> >
> > A very similar scenario is described in the patch that introduced this
> > behaviour (9ba6929480 ("ksm: fix oom deadlock")):
> >
> >     There's a now-obvious deadlock in KSM's out-of-memory handling:
> >     imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
> >     trying to allocate a page to break KSM in an mm which becomes the
> >     OOM victim (quite likely in the unmerge case): it's killed and goes
> >     to exit, and hangs there waiting to acquire ksm_thread_mutex.
> >
> > So I'm guessing that the solution is incomplete for the slow path.
>
> Thank you, Sasha, this is a nice one. I've only just started ruminating
> on it, and will do so (intermittently!) for a few days. Maybe the answer
> will be to take an additional reference to the mm when unmerging; but
> done wrong that can frustrate OOM freeing memory altogether, so it's
> not a solution I'll rush into without consideration.

I do believe that nice Mr Oleg Nesterov is getting me off the hook
for this one. He even mentioned __ksm_exit() in an earlier version
of his patch. Just a temporary blip in the next tree, which should
soon be fixed by https://lkml.org/lkml/2015/10/6/548

Thank you, Oleg!

Hugh

> Plus it's not clear to me yet whether it can only be a problem when
> unmerging, or could hit other calls to break_ksm(). I do have a
> v3.9-era patch to remove all the calls to break_cow(), but IIRC it's
> a patch I didn't quite get working reliably at the time.
>
> This does reinforce my suspicion that, one way or another, you happen
> to be targeting trinity at ksm more effectively these days: I don't
> see any cause for alarm over recent kernel changes yet.
> > > > > [3201844.610523] ============================================= > > [3201844.610988] [ INFO: possible recursive locking detected ] > > [3201844.611405] 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4 Not tainted > > [3201844.611907] --------------------------------------------- > > [3201844.612373] ksm02/28830 is trying to acquire lock: > > [3201844.612749] (&mm->mmap_sem){++++++}, at: __ksm_exit (mm/ksm.c:1821) > > [3201844.613472] RWsem: count: 1 owner: None > > [3201844.613782] > > [3201844.613782] but task is already holding lock: > > [3201844.614248] (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124) > > [3201844.614904] RWsem: count: 1 owner: None > > [3201844.615212] > > [3201844.615212] other info that might help us debug this: > > [3201844.615727] Possible unsafe locking scenario: > > [3201844.615727] > > [3201844.616240] CPU0 > > [3201844.616446] ---- > > [3201844.616650] lock(&mm->mmap_sem); > > [3201844.616952] lock(&mm->mmap_sem); > > [3201844.617252] > > [3201844.617252] *** DEADLOCK *** > > [3201844.617252] > > [3201844.617733] May be due to missing lock nesting notation > > [3201844.617733] > > [3201844.618265] 6 locks held by ksm02/28830: > > [3201844.618576] #0: (sb_writers#5){.+.+.+}, at: __sb_start_write (fs/super.c:1176) > > [3201844.619327] RWsem: count: 0 owner: None > > [3201844.619633] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:298) > > [3201844.624648] Mutex: counter: 0 owner: ksm02 > > [3201844.624978] #2: (s_active#448){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:298) > > [3201844.625733] #3: (ksm_thread_mutex){+.+.+.}, at: run_store (mm/ksm.c:2120) > > [3201844.626448] Mutex: counter: -1 owner: ksm02 > > [3201844.626786] #4: (&mm->mmap_sem){++++++}, at: run_store (mm/ksm.c:769 mm/ksm.c:2124) > > [3201844.627486] RWsem: count: 1 owner: None > > [3201844.627792] #5: (oom_lock){+.+...}, at: __alloc_pages_nodemask (mm/page_alloc.c:2779 mm/page_alloc.c:3213 mm/page_alloc.c:3300) > > [3201844.628594] Mutex: counter: 0 owner: ksm02 > > [3201844.628919] > > [3201844.628919] stack backtrace: > > [3201844.629276] CPU: 0 PID: 28830 Comm: ksm02 Not tainted 4.3.0-rc3-next-20150930-sasha-00077-g3434920 #4 > > [3201844.629970] ffffffffaf41d680 00000000b8d5e1f1 ffff88065e42eec0 ffffffffa1d454c8 > > [3201844.630663] ffffffffaf41d680 ffff88065e42f080 ffffffffa04269ee ffff88065e42f088 > > [3201844.631292] ffffffffa0427746 ffff882c88b24008 ffff8806845b8e10 ffffffffafb842c0 > > [3201844.631952] Call Trace: > > [3201844.632204] dump_stack (lib/dump_stack.c:52) > > [3201844.636449] __lock_acquire (kernel/locking/lockdep.c:1776 kernel/locking/lockdep.c:1820 kernel/locking/lockdep.c:2152 kernel/locking/lockdep.c:3239) > > [3201844.639909] lock_acquire (kernel/locking/lockdep.c:3620) > > [3201844.640997] down_write (./arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:51) > > [3201844.642011] __ksm_exit (mm/ksm.c:1821) > > [3201844.642501] mmput (./arch/x86/include/asm/bitops.h:311 include/linux/khugepaged.h:35 kernel/fork.c:701) > > I assume this interesting reference to khugepaged_exit() > is just one of those off-by-one-line things? > > > [3201844.642920] oom_kill_process (mm/oom_kill.c:604) > > [3201844.644528] out_of_memory (mm/oom_kill.c:700) > > [3201844.646626] __alloc_pages_nodemask (mm/page_alloc.c:2822 mm/page_alloc.c:3213 mm/page_alloc.c:3300) > > [3201844.649972] alloc_pages_vma (mm/mempolicy.c:2044) > > [3201844.650462] ? 
wp_page_copy.isra.36 (mm/memory.c:2074) > > [3201844.651000] wp_page_copy.isra.36 (mm/memory.c:2074) > > [3201844.652544] do_wp_page (mm/memory.c:2349) > > [3201844.654048] handle_mm_fault (mm/memory.c:3310 mm/memory.c:3404 mm/memory.c:3433) > > [3201844.657519] break_ksm (mm/ksm.c:374) > > [3201844.659348] unmerge_ksm_pages (mm/ksm.c:673) > > [3201844.659831] run_store (mm/ksm.c:776 mm/ksm.c:2124) > > [3201844.661837] kobj_attr_store (lib/kobject.c:792) > > [3201844.662743] sysfs_kf_write (fs/sysfs/file.c:131) > > [3201844.663656] kernfs_fop_write (fs/kernfs/file.c:312) > > [3201844.664154] __vfs_write (fs/read_write.c:489) > > [3201844.666502] vfs_write (fs/read_write.c:539) > > [3201844.666935] SyS_write (fs/read_write.c:586 fs/read_write.c:577) > > [3201844.668965] tracesys_phase2 (arch/x86/entry/entry_64.S:270) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 3+ messages in thread