public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy()
@ 2018-11-11  8:06 Corey Wright
  2018-11-11  8:22 ` [PATCH 3.18 1/1] " Corey Wright
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Corey Wright @ 2018-11-11  8:06 UTC (permalink / raw)
  To: stable; +Cc: Sasha Levin

The recently released stable version 3.18.125 introduced a deadlock
because dm_get_live_table() is called twice within __dm_destroy().

The backported commit e1db66a5 "dm: fix AB-BA deadlock in
__dm_destroy()" doesn't *move* the dm_get_live_table() call from
before the mutex_lock(), as the original commit 2a708cff does, but
instead *adds* a new dm_get_live_table() call after the mutex_lock().
The two dm_get_live_table() calls result in a deadlock:

[  311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds.
[  311.420925]       Not tainted 3.18.125+1-amd64 #1
[  311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  311.651116] cryptsetup      D 0000000000000000     0   209    203 0x00000000
[  311.732304]  ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390
[  311.873420]  00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0
[  311.934275]  0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68
[  311.940115] Call Trace:
[  311.949956]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
[  312.179891]  [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0
[  312.375447]  [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50
[  312.377825]  [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70
[  312.557921]  [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120
[  312.561314]  [<ffffffff81096340>] ? wake_up_state+0x20/0x20
[  312.664457]  [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120
[  312.768794]  [<ffffffff810c3260>] ? call_srcu+0x70/0x70
[  312.790337]  [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod]
[  312.909878]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
[  312.978804]  [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod]
[  313.082322]  [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod]
[  313.175957]  [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
[  313.301981]  [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0
[  313.384648]  [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0
[  313.489669]  [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0
[  313.510846]  [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b

Removing the original dm_get_live_table() call from before the
mutex_lock() prevents the deadlock.

Thanks for maintaining 3.18!

PS Greg, Was this a subtle attempt to get someone to speak up and say
   "I am using this!" as you requested in the 3.18.125 release
   announcement? ;)

Corey
--
undefined@pobox.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3.18 1/1] dm: remove duplicate dm_get_live_table() in __dm_destroy()
  2018-11-11  8:06 [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy() Corey Wright
@ 2018-11-11  8:22 ` Corey Wright
  2018-11-11 20:53 ` [PATCH 3.18 0/1] " Greg KH
  2018-11-11 22:20 ` Sasha Levin
  2 siblings, 0 replies; 5+ messages in thread
From: Corey Wright @ 2018-11-11  8:22 UTC (permalink / raw)
  To: stable; +Cc: Sasha Levin

__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) twice
which leads to a deadlock.  Remove taking lock before holding
suspend_lock to prevent a different potential deadlock.

Signed-off-by: Corey Wright <undefined@pobox.com>
Fixes: e1db66a5fdcc ("dm: fix AB-BA deadlock in __dm_destroy()")
Cc: Sasha Levin <sashal@kernel.org>
---
 drivers/md/dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 00c86ff..3955df0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2589,7 +2589,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	might_sleep();
 
 	spin_lock(&_minor_lock);
-	map = dm_get_live_table(md, &srcu_idx);
 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy()
  2018-11-11  8:06 [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy() Corey Wright
  2018-11-11  8:22 ` [PATCH 3.18 1/1] " Corey Wright
@ 2018-11-11 20:53 ` Greg KH
  2018-11-12  2:03   ` Corey Wright
  2018-11-11 22:20 ` Sasha Levin
  2 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2018-11-11 20:53 UTC (permalink / raw)
  To: Corey Wright; +Cc: stable, Sasha Levin

On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
> The recently released stable version 3.18.125 introduced a deadlock
> because dm_get_live_table() is called twice within __dm_destroy().
> 
> The backported commit e1db66a5 "dm: fix AB-BA deadlock in
> __dm_destroy()" doesn't *move* the dm_get_live_table() call from
> before the mutex_lock(), as the original commit 2a708cff does, but
> instead *adds* a new dm_get_live_table() call after the mutex_lock().
> The two dm_get_live_table() calls result in a deadlock:
> 
> [  311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds.
> [  311.420925]       Not tainted 3.18.125+1-amd64 #1
> [  311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  311.651116] cryptsetup      D 0000000000000000     0   209    203 0x00000000
> [  311.732304]  ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390
> [  311.873420]  00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0
> [  311.934275]  0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68
> [  311.940115] Call Trace:
> [  311.949956]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
> [  312.179891]  [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0
> [  312.375447]  [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50
> [  312.377825]  [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70
> [  312.557921]  [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120
> [  312.561314]  [<ffffffff81096340>] ? wake_up_state+0x20/0x20
> [  312.664457]  [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120
> [  312.768794]  [<ffffffff810c3260>] ? call_srcu+0x70/0x70
> [  312.790337]  [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod]
> [  312.909878]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
> [  312.978804]  [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod]
> [  313.082322]  [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod]
> [  313.175957]  [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
> [  313.301981]  [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0
> [  313.384648]  [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0
> [  313.489669]  [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0
> [  313.510846]  [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
> 
> Removing the original dm_get_live_table() call from before the
> mutex_lock() prevents the deadlock.
> 
> Thanks for maintaining 3.18!
> 
> PS Greg, Was this a subtle attempt to get someone to speak up and say
>    "I am using this!" as you requested in the 3.18.125 release
>    announcement? ;)

Heh, no it was not, but it's nice to see someone actually read that :)

Why are you using 3.18 and why can you not just use a newer kernel
version?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy()
  2018-11-11  8:06 [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy() Corey Wright
  2018-11-11  8:22 ` [PATCH 3.18 1/1] " Corey Wright
  2018-11-11 20:53 ` [PATCH 3.18 0/1] " Greg KH
@ 2018-11-11 22:20 ` Sasha Levin
  2 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2018-11-11 22:20 UTC (permalink / raw)
  To: Corey Wright; +Cc: stable

On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
>The recently released stable version 3.18.125 introduced a deadlock
>because dm_get_live_table() is called twice within __dm_destroy().
>
>The backported commit e1db66a5 "dm: fix AB-BA deadlock in
>__dm_destroy()" doesn't *move* the dm_get_live_table() call from
>before the mutex_lock(), as the original commit 2a708cff does, but
>instead *adds* a new dm_get_live_table() call after the mutex_lock().
>The two dm_get_live_table() calls result in a deadlock:
>
>[  311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds.
>[  311.420925]       Not tainted 3.18.125+1-amd64 #1
>[  311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>[  311.651116] cryptsetup      D 0000000000000000     0   209    203 0x00000000
>[  311.732304]  ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390
>[  311.873420]  00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0
>[  311.934275]  0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68
>[  311.940115] Call Trace:
>[  311.949956]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
>[  312.179891]  [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0
>[  312.375447]  [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50
>[  312.377825]  [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70
>[  312.557921]  [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120
>[  312.561314]  [<ffffffff81096340>] ? wake_up_state+0x20/0x20
>[  312.664457]  [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120
>[  312.768794]  [<ffffffff810c3260>] ? call_srcu+0x70/0x70
>[  312.790337]  [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod]
>[  312.909878]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
>[  312.978804]  [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod]
>[  313.082322]  [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod]
>[  313.175957]  [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
>[  313.301981]  [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0
>[  313.384648]  [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0
>[  313.489669]  [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0
>[  313.510846]  [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
>
>Removing the original dm_get_live_table() call from before the
>mutex_lock() prevents the deadlock.
>
>Thanks for maintaining 3.18!
>
>PS Greg, Was this a subtle attempt to get someone to speak up and say
>   "I am using this!" as you requested in the 3.18.125 release
>   announcement? ;)

Hm, interesting. It looks like git did the wrong thing here, sorry for
that :(

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy()
  2018-11-11 20:53 ` [PATCH 3.18 0/1] " Greg KH
@ 2018-11-12  2:03   ` Corey Wright
  0 siblings, 0 replies; 5+ messages in thread
From: Corey Wright @ 2018-11-12  2:03 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, Sasha Levin

On Sun, 11 Nov 2018 12:53:28 -0800
Greg KH <greg@kroah.com> wrote:

> On Sun, Nov 11, 2018 at 02:06:54AM -0600, Corey Wright wrote:
> > The recently released stable version 3.18.125 introduced a deadlock
> > because dm_get_live_table() is called twice within __dm_destroy().
> > 
> > The backported commit e1db66a5 "dm: fix AB-BA deadlock in
> > __dm_destroy()" doesn't *move* the dm_get_live_table() call from
> > before the mutex_lock(), as the original commit 2a708cff does, but
> > instead *adds* a new dm_get_live_table() call after the mutex_lock().
> > The two dm_get_live_table() calls result in a deadlock:
> > 
> > [  311.291323] INFO: task cryptsetup:209 blocked for more than 120 seconds.
> > [  311.420925]       Not tainted 3.18.125+1-amd64 #1
> > [  311.559858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  311.651116] cryptsetup      D 0000000000000000     0   209    203 0x00000000
> > [  311.732304]  ffff88007abfd5f0 0000000000000082 0000000000000001 ffff88007a470390
> > [  311.873420]  00000000000136c0 ffff88007a78bfd8 00000000000136c0 ffff88007abfd5f0
> > [  311.934275]  0000000000000001 ffff88007a78bc70 7fffffffffffffff ffff88007a78bc68
> > [  311.940115] Call Trace:
> > [  311.949956]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
> > [  312.179891]  [<ffffffff81553d8a>] ? schedule_timeout+0x24a/0x2d0
> > [  312.375447]  [<ffffffff810a7ba4>] ? __wake_up+0x34/0x50
> > [  312.377825]  [<ffffffff810c3374>] ? srcu_readers_seq_idx.isra.8+0x54/0x70
> > [  312.557921]  [<ffffffff815519b0>] ? wait_for_completion+0xb0/0x120
> > [  312.561314]  [<ffffffff81096340>] ? wake_up_state+0x20/0x20
> > [  312.664457]  [<ffffffff810c36e8>] ? __synchronize_srcu+0xd8/0x120
> > [  312.768794]  [<ffffffff810c3260>] ? call_srcu+0x70/0x70
> > [  312.790337]  [<ffffffffa01ca757>] ? __dm_destroy+0x107/0x2e0 [dm_mod]
> > [  312.909878]  [<ffffffffa01d0b70>] ? dev_suspend+0x260/0x260 [dm_mod]
> > [  312.978804]  [<ffffffffa01d0c4e>] ? dev_remove+0xde/0x120 [dm_mod]
> > [  313.082322]  [<ffffffffa01d12e3>] ? ctl_ioctl+0x203/0x4c0 [dm_mod]
> > [  313.175957]  [<ffffffffa01d15b3>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
> > [  313.301981]  [<ffffffff811d40c0>] ? do_vfs_ioctl+0x2d0/0x4a0
> > [  313.384648]  [<ffffffff8108848c>] ? task_work_run+0xbc/0xf0
> > [  313.489669]  [<ffffffff811d4311>] ? SyS_ioctl+0x81/0xa0
> > [  313.510846]  [<ffffffff815551cd>] ? system_call_fastpath+0x16/0x1b
> > 
> > Removing the original dm_get_live_table() call from before the
> > mutex_lock() prevents the deadlock.
> > 
> > Thanks for maintaining 3.18!
> > 
> > PS Greg, Was this a subtle attempt to get someone to speak up and say
> >    "I am using this!" as you requested in the 3.18.125 release
> >    announcement? ;)
> 
> Heh, no it was not, but it's nice to see someone actually read that :)
> 
> Why are you using 3.18 and why can you not just use a newer kernel
> version?

Full disclosure: I have no expectations that you maintain 3.18,
especially not for my personal use-case.  I am not a device
manufacturer and therefor realize I'm not the intended audience for
long-term stable kernels.

It's all your fault!  You make staying on 3.18 too easy. ;)  And even
when the process breaks, like the bad backport I encountered with
3.18.125, as a software developer who doesn't do C programming
professionally any more I enjoyed the exercise.

I run 3.18 patched with Linux-VServer separating several services on a
single personal server.  I was hoping to migrate to LXC so as to use
my distro's kernel, but between busyness of job and family and
continued stable updates I've stuck with 3.18.  Linux-VServer even
supports 4.4 and 4.9, but momentum has kept me on 3.18.

If you drop 3.18 stable updates then I'll make do.  (Maybe that'll be
the kick in the pants I need to finally migrate to LXC or maybe I'll
keep kicking the can down the road and migrate to 4.4 or 4.9 patched
with Linux-VServer.)

> thanks,
> 
> greg k-h

Thank you for your and Sasha's support of 3.18 specifically, but the
Linux kernel community in general!

Corey
--
undefined@pobox.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-11-12  8:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-11  8:06 [PATCH 3.18 0/1] dm: remove duplicate dm_get_live_table() in __dm_destroy() Corey Wright
2018-11-11  8:22 ` [PATCH 3.18 1/1] " Corey Wright
2018-11-11 20:53 ` [PATCH 3.18 0/1] " Greg KH
2018-11-12  2:03   ` Corey Wright
2018-11-11 22:20 ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox