All of lore.kernel.org
 help / color / mirror / Atom feed
* staged dm_internal_{suspend, resume} related changes for wider review
       [not found]               ` <20141029002125.GC29288@redhat.com>
@ 2014-10-29  1:22                 ` Mike Snitzer
  2014-10-29 19:06                   ` Mike Snitzer
  2014-11-03 23:25                   ` Mikulas Patocka
  0 siblings, 2 replies; 23+ messages in thread
From: Mike Snitzer @ 2014-10-29  1:22 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Alasdair G. Kergon

I pushed rebased versions of these patches to linux-dm.git's
'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
kernel.org autobuilders testing the code).  See top 3 commits here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19

I'm open to any and all changes or dropping code entirely; so me staging
like this is just to keep the review ball rolling as quickly as
possible.

But in particular we need to get Mikulas' and Bryn's feedback on how the
dm_internal_{suspend,resume} changes impact dm-stats.

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-10-29  1:22                 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer
@ 2014-10-29 19:06                   ` Mike Snitzer
  2014-10-29 20:49                     ` Mike Snitzer
  2014-11-03 23:25                   ` Mikulas Patocka
  1 sibling, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-10-29 19:06 UTC (permalink / raw)
  To: Alasdair G. Kergon
  Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Zdenek Kabelac

On Tue, Oct 28 2014 at  9:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> I pushed rebased versions of these patches to linux-dm.git's
> 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
> kernel.org autobuilders testing the code).  See top 3 commits here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
> 
> I'm open to any and all changes or dropping code entirely; so me staging
> like this is just to keep the review ball rolling as quickly as
> possible.
> 
> But in particular we need to get Mikulas' and Bryn's feedback on how the
> dm_internal_{suspend,resume} changes impact dm-stats.

Rebased, and added a few more changes, see top 5 commits here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-10-29 19:06                   ` Mike Snitzer
@ 2014-10-29 20:49                     ` Mike Snitzer
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Snitzer @ 2014-10-29 20:49 UTC (permalink / raw)
  To: Alasdair G. Kergon
  Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Zdenek Kabelac

On Wed, Oct 29 2014 at  3:06pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Oct 28 2014 at  9:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > I pushed rebased versions of these patches to linux-dm.git's
> > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
> > kernel.org autobuilders testing the code).  See top 3 commits here:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
> > 
> > I'm open to any and all changes or dropping code entirely; so me staging
> > like this is just to keep the review ball rolling as quickly as
> > possible.
> > 
> > But in particular we need to get Mikulas' and Bryn's feedback on how the
> > dm_internal_{suspend,resume} changes impact dm-stats.
> 
> Rebased, and added a few more changes, see top 5 commits here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19

FYI, Joe was kind enough to write a series of tests for these changes,
see:
https://github.com/jthornber/device-mapper-test-suite/commit/860555a3e838e995f8f2549deb9fd7ccebb0deb5
https://github.com/jthornber/device-mapper-test-suite/commit/24da8a852bce9ef94db98bdefdf5cd8cd0d5e2eb

# dmtest run --profile fusionio --suite thin-provisioning -t /SuspendTests/
Loaded suite thin-provisioning
Started
SuspendTests
  suspend_pool_active_thins_no_io...PASS
  suspend_pool_after_suspend_thin...PASS
  suspend_pool_concurrent_suspend...PASS
  suspend_pool_no_active_thins...PASS
  suspend_pool_no_thins...PASS
  suspend_pool_resume_thin...PASS
  suspend_pool_suspended_thin...PASS

Finished in 22.350814169 seconds.

7 tests, 0 assertions, 0 failures, 0 errors

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-10-29  1:22                 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer
  2014-10-29 19:06                   ` Mike Snitzer
@ 2014-11-03 23:25                   ` Mikulas Patocka
  2014-11-04  0:17                     ` Mike Snitzer
  1 sibling, 1 reply; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-03 23:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: ejt, Bryn Reeves, dm-devel, Alasdair G. Kergon, Zdenek Kabelac



On Tue, 28 Oct 2014, Mike Snitzer wrote:

> I pushed rebased versions of these patches to linux-dm.git's
> 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
> kernel.org autobuilders testing the code).  See top 3 commits here:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
> 
> I'm open to any and all changes or dropping code entirely; so me staging
> like this is just to keep the review ball rolling as quickly as
> possible.
> 
> But in particular we need to get Mikulas' and Bryn's feedback on how the
> dm_internal_{suspend,resume} changes impact dm-stats.
> 
> Mike

Hi

As I said on irc - it is not correct to take a mutex from one syscall and 
drop the mutex from another syscall.

I hope Joe can use the bio prison to block bios while the pool is 
suspended.

Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-03 23:25                   ` Mikulas Patocka
@ 2014-11-04  0:17                     ` Mike Snitzer
  2014-11-05  1:16                       ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-04  0:17 UTC (permalink / raw)
  To: device-mapper development
  Cc: ejt, Bryn M. Reeves, Alasdair G Kergon, Zdenek Kabelac


[-- Attachment #1.1: Type: text/plain, Size: 1078 bytes --]

On Nov 3, 2014 6:27 PM, "Mikulas Patocka" <mpatocka@redhat.com> wrote:
>
>
>
> On Tue, 28 Oct 2014, Mike Snitzer wrote:
>
> > I pushed rebased versions of these patches to linux-dm.git's
> > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
> > kernel.org autobuilders testing the code).  See top 3 commits here:
> >
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
> >
> > I'm open to any and all changes or dropping code entirely; so me staging
> > like this is just to keep the review ball rolling as quickly as
> > possible.
> >
> > But in particular we need to get Mikulas' and Bryn's feedback on how the
> > dm_internal_{suspend,resume} changes impact dm-stats.
> >
> > Mike
>
> Hi
>
> As I said on irc - it is not correct to take a mutex from one syscall and
> drop the mutex from another syscall.

I didn't modify that aspect of dm_internal_suspend+resume.  I only extended
the interface to other targets via export.

> I hope Joe can use the bio prison to block bios while the pool is
> suspended.

We'll see.

[-- Attachment #1.2: Type: text/html, Size: 1629 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-04  0:17                     ` Mike Snitzer
@ 2014-11-05  1:16                       ` Mike Snitzer
  2014-11-05 12:43                         ` Mikulas Patocka
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05  1:16 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Mon, Nov 03 2014 at  7:17pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Nov 3, 2014 6:27 PM, "Mikulas Patocka" <mpatocka@redhat.com> wrote:
> >
> >
> >
> > On Tue, 28 Oct 2014, Mike Snitzer wrote:
> >
> > > I pushed rebased versions of these patches to linux-dm.git's
> > > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the
> > > kernel.org autobuilders testing the code).  See top 3 commits here:
> > >
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
> > >
> > > I'm open to any and all changes or dropping code entirely; so me staging
> > > like this is just to keep the review ball rolling as quickly as
> > > possible.
> > >
> > > But in particular we need to get Mikulas' and Bryn's feedback on how the
> > > dm_internal_{suspend,resume} changes impact dm-stats.
> > >
> > > Mike
> >
> > Hi
> >
> > As I said on irc - it is not correct to take a mutex from one syscall and
> > drop the mutex from another syscall.
> 
> I didn't modify that aspect of dm_internal_suspend+resume.  I only extended
> the interface to other targets via export.
> 
> > I hope Joe can use the bio prison to block bios while the pool is
> > suspended.
> 
> We'll see.

Joe said today: "bio-prison thing isn't going to work, since we need the
worker thread to complete as part of the [thin-pool] suspend".

And the following patch fixes the locking issue you raised:

From: Mike Snitzer <snitzer@redhat.com>
Date: Tue, 4 Nov 2014 15:38:59 -0500
Subject: [PATCH] dm: leverage DM_INTERNAL_SUSPEND_FLAG instead of abusing suspend_lock

Update dm_internal_{suspend,resume} to always take and release the
mapped_device's suspend_lock.

Also update dm_{suspend,resume} to be aware of potential for
DM_INTERNAL_SUSPEND_FLAG to be set and respond accordingly by
interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to be cleared.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 38cea89..ac0dcad 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -19,6 +19,7 @@
 #include <linux/idr.h>
 #include <linux/hdreg.h>
 #include <linux/delay.h>
+#include <linux/wait.h>
 
 #include <trace/events/block.h>
 
@@ -2827,6 +2828,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 	bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
 	bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
 
+retry:
 	mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
 
 	if (dm_suspended_md(md)) {
@@ -2834,6 +2836,15 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 		goto out_unlock;
 	}
 
+	if (dm_suspended_internally_md(md)) {
+		/* already internally suspended, wait for internal resume */
+		mutex_unlock(&md->suspend_lock);
+		r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE);
+		if (r)
+			return r;
+		goto retry;
+	}
+
 	map = rcu_dereference(md->map);
 
 	r = __dm_suspend(md, map, do_lockfs, noflush, true);
@@ -2876,10 +2887,20 @@ int dm_resume(struct mapped_device *md)
 	int r = -EINVAL;
 	struct dm_table *map = NULL;
 
+retry:
 	mutex_lock(&md->suspend_lock);
 	if (!dm_suspended_md(md))
 		goto out;
 
+	if (dm_suspended_internally_md(md)) {
+		/* already internally suspended, wait for internal resume */
+		mutex_unlock(&md->suspend_lock);
+		r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE);
+		if (r)
+			return r;
+		goto retry;
+	}
+
 	map = rcu_dereference(md->map);
 	if (!map || !dm_table_get_size(map))
 		goto out;
@@ -2911,13 +2932,13 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
 	struct dm_table *map = NULL;
 	bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
 
+	mutex_lock(&md->suspend_lock);
 	if (WARN_ON(dm_suspended_internally_md(md)))
-		return; /* disallow nested internal suspends! */
+		goto out; /* disallow nested internal suspends! */
 
-	mutex_lock(&md->suspend_lock);
 	if (dm_suspended_md(md)) {
 		set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
-		return; /* nest suspend */
+		goto out; /* nest suspend */
 	}
 
 	map = rcu_dereference(md->map);
@@ -2928,6 +2949,8 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla
 	set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
 
 	dm_table_postsuspend_targets(map);
+out:
+	mutex_unlock(&md->suspend_lock);
 }
 
 void dm_internal_suspend(struct mapped_device *md)
@@ -2946,13 +2969,12 @@ void dm_internal_resume(struct mapped_device *md)
 {
 	struct dm_table *map = NULL;
 
+	mutex_lock(&md->suspend_lock);
 	if (WARN_ON(!dm_suspended_internally_md(md)))
-		return;
+		goto done;
 
-	if (dm_suspended_md(md)) {
-		clear_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
-		goto done; /* resume from nested suspend */
-	}
+	if (dm_suspended_md(md))
+		goto done_clear_bit; /* resume from nested suspend */
 
 	map = rcu_dereference(md->map);
 	if (WARN_ON(!map))
@@ -2964,7 +2986,10 @@ void dm_internal_resume(struct mapped_device *md)
 	 */
 	(void) __dm_resume(md, map, false);
 
+done_clear_bit:
 	clear_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
+	smp_mb__after_atomic();
+	wake_up_bit(&md->flags, DMF_SUSPENDED_INTERNALLY);
 
 done:
 	mutex_unlock(&md->suspend_lock);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05  1:16                       ` Mike Snitzer
@ 2014-11-05 12:43                         ` Mikulas Patocka
  2014-11-05 13:05                           ` Mikulas Patocka
  0 siblings, 1 reply; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-05 12:43 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Tue, 4 Nov 2014, Mike Snitzer wrote:

> > > Hi
> > >
> > > As I said on irc - it is not correct to take a mutex from one syscall and
> > > drop the mutex from another syscall.
> > 
> > I didn't modify that aspect of dm_internal_suspend+resume.  I only extended
> > the interface to other targets via export.
> > 
> > > I hope Joe can use the bio prison to block bios while the pool is
> > > suspended.
> > 
> > We'll see.
> 
> Joe said today: "bio-prison thing isn't going to work, since we need the
> worker thread to complete as part of the [thin-pool] suspend".

If you block bios coming into the prison in pool's presuspend method 
(similar to this patch 
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=2d616 
), the worker thread is still active, so it should process bios.

You can for example set the flag in the prison meaning that the prison is 
suspended and then call dm_internal_suspend immediatelly followed by 
dm_internal_resume - that will clear in-progress bios and prevent new bios 
from coming in (and we don't need to change dm_internal_suspend and 
dm_internal_resume to become so big).

> And the following patch fixes the locking issue you raised:

... it complicates the generic code even more than before.

Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 12:43                         ` Mikulas Patocka
@ 2014-11-05 13:05                           ` Mikulas Patocka
  2014-11-05 14:11                             ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-05 13:05 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Wed, 5 Nov 2014, Mikulas Patocka wrote:

> You can for example set the flag in the prison meaning that the prison is 
> suspended and then call dm_internal_suspend immediatelly followed by 
> dm_internal_resume - that will clear in-progress bios and prevent new bios 
> from coming in (and we don't need to change dm_internal_suspend and 
> dm_internal_resume to become so big).

No, the correct sequence is this (do this in presuspend handler):

1. call dm_internal_suspend on all thin devices
2. set the flag in the prison meaning that the prison is blocked
3. call dm_internal_resume on all thin devices

Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 13:05                           ` Mikulas Patocka
@ 2014-11-05 14:11                             ` Mike Snitzer
  2014-11-05 14:37                               ` Mikulas Patocka
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05 14:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Wed, Nov 05 2014 at  8:05am -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Wed, 5 Nov 2014, Mikulas Patocka wrote:
> 
> > You can for example set the flag in the prison meaning that the prison is 
> > suspended and then call dm_internal_suspend immediatelly followed by 
> > dm_internal_resume - that will clear in-progress bios and prevent new bios 
> > from coming in (and we don't need to change dm_internal_suspend and 
> > dm_internal_resume to become so big).

It may _seem_ like they have gotten big given the code was refactored to
share code with dm_suspend and dm_resume.  BUT I know you see that the
actual code complexity isn't big.  I especially wanted you (and/or Bryn)
to evaluate the performance implications that my changes had on
dm-stats.  I'm pretty confident there won't be much if any performance
difference (given the code is identical to what you had, except some
extra checks are made but ultimately not used, e.g. lockfs/unlockfs).

The end result of the dm_internal_{suspend,resume} changes is an
interface that is useful for DM targets in addition to dm-stats.  That
is the kind of advancement DM needs.

Please focus on the performance impact of my changes, if any, and we'll
go from there.

> No, the correct sequence is this (do this in presuspend handler):
> 
> 1. call dm_internal_suspend on all thin devices
> 2. set the flag in the prison meaning that the prison is blocked
> 3. call dm_internal_resume on all thin devices

I really didn't like the idea of reusing the bio-prison to achieve the
suspend of all thins to begin with.  This proposal is even more suspect
given the desire to call dm_internal_suspend and dm_internal_resume from
pool_presuspend.

It just isn't code that I want to see making its way into the tree.
Sets bad precedent of hacking around problems within a target for
questionable gain (questionable in that there really isn't a
pattern/infrastructure for other more complex targets to follow so
they'd need to invent their own hack should they have a comparable
problem).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 14:11                             ` Mike Snitzer
@ 2014-11-05 14:37                               ` Mikulas Patocka
  2014-11-05 14:50                                 ` Zdenek Kabelac
                                                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-05 14:37 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Wed, 5 Nov 2014, Mike Snitzer wrote:

> On Wed, Nov 05 2014 at  8:05am -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > On Wed, 5 Nov 2014, Mikulas Patocka wrote:
> > 
> > > You can for example set the flag in the prison meaning that the prison is 
> > > suspended and then call dm_internal_suspend immediatelly followed by 
> > > dm_internal_resume - that will clear in-progress bios and prevent new bios 
> > > from coming in (and we don't need to change dm_internal_suspend and 
> > > dm_internal_resume to become so big).
> 
> It may _seem_ like they have gotten big given the code was refactored to
> share code with dm_suspend and dm_resume.  BUT I know you see that the
> actual code complexity isn't big.  I especially wanted you (and/or Bryn)
> to evaluate the performance implications that my changes had on
> dm-stats.  I'm pretty confident there won't be much if any performance
> difference (given the code is identical to what you had, except some
> extra checks are made but ultimately not used, e.g. lockfs/unlockfs).

This is not about performance, it is about unclear behavior.

If someone does internal_suspend followed by remove, what should be the 
correct behavior? The current code deadlocks in this case.

The patch series introduces two suspend mechanisms and it is unclear how 
should they interact with each other.

> The end result of the dm_internal_{suspend,resume} changes is an
> interface that is useful for DM targets in addition to dm-stats.  That
> is the kind of advancement DM needs.
> 
> Please focus on the performance impact of my changes, if any, and we'll
> go from there.
> 
> > No, the correct sequence is this (do this in presuspend handler):
> > 
> > 1. call dm_internal_suspend on all thin devices
> > 2. set the flag in the prison meaning that the prison is blocked
> > 3. call dm_internal_resume on all thin devices
> 
> I really didn't like the idea of reusing the bio-prison to achieve the
> suspend of all thins to begin with.  This proposal is even more suspect
> given the desire to call dm_internal_suspend and dm_internal_resume from
> pool_presuspend.
> 
> It just isn't code that I want to see making its way into the tree.
> Sets bad precedent of hacking around problems within a target for
> questionable gain (questionable in that there really isn't a
> pattern/infrastructure for other more complex targets to follow so
> they'd need to invent their own hack should they have a comparable
> problem).

At least the hack stays within dm-thin and generic dm code isn't 
contaminated by it.

Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 14:37                               ` Mikulas Patocka
@ 2014-11-05 14:50                                 ` Zdenek Kabelac
  2014-11-05 16:10                                 ` Mike Snitzer
  2014-11-05 16:29                                 ` Mike Snitzer
  2 siblings, 0 replies; 23+ messages in thread
From: Zdenek Kabelac @ 2014-11-05 14:50 UTC (permalink / raw)
  To: Mikulas Patocka, Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon

Dne 5.11.2014 v 15:37 Mikulas Patocka napsal(a):
>
>
> On Wed, 5 Nov 2014, Mike Snitzer wrote:
>
>> On Wed, Nov 05 2014 at  8:05am -0500,
>> Mikulas Patocka <mpatocka@redhat.com> wrote:
>>
>>> On Wed, 5 Nov 2014, Mikulas Patocka wrote:
>>>
>>>> You can for example set the flag in the prison meaning that the prison is
>>>> suspended and then call dm_internal_suspend immediatelly followed by
>>>> dm_internal_resume - that will clear in-progress bios and prevent new bios
>>>> from coming in (and we don't need to change dm_internal_suspend and
>>>> dm_internal_resume to become so big).
>>
>> It may _seem_ like they have gotten big given the code was refactored to
>> share code with dm_suspend and dm_resume.  BUT I know you see that the
>> actual code complexity isn't big.  I especially wanted you (and/or Bryn)
>> to evaluate the performance implications that my changes had on
>> dm-stats.  I'm pretty confident there won't be much if any performance
>> difference (given the code is identical to what you had, except some
>> extra checks are made but ultimately not used, e.g. lockfs/unlockfs).
>
> This is not about performance, it is about unclear behavior.
>
> If someone does internal_suspend followed by remove, what should be the
> correct behavior? The current code deadlocks in this case.
>

yep - that would my concern as well.

If this 'internal' suspend is purely 'enforced' by incorrectly behaving user 
space part (aka lvm2 is not doing it's best) I assume it's better to fix
user space - instead of moving it into kernel with some side-effects.

Zdenek

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 14:37                               ` Mikulas Patocka
  2014-11-05 14:50                                 ` Zdenek Kabelac
@ 2014-11-05 16:10                                 ` Mike Snitzer
  2014-11-05 16:56                                   ` Mike Snitzer
  2014-11-05 16:29                                 ` Mike Snitzer
  2 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05 16:10 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Wed, Nov 05 2014 at  9:37am -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Wed, 5 Nov 2014, Mike Snitzer wrote:
> 
> > On Wed, Nov 05 2014 at  8:05am -0500,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > On Wed, 5 Nov 2014, Mikulas Patocka wrote:
> > > 
> > > > You can for example set the flag in the prison meaning that the prison is 
> > > > suspended and then call dm_internal_suspend immediatelly followed by 
> > > > dm_internal_resume - that will clear in-progress bios and prevent new bios 
> > > > from coming in (and we don't need to change dm_internal_suspend and 
> > > > dm_internal_resume to become so big).
> > 
> > It may _seem_ like they have gotten big given the code was refactored to
> > share code with dm_suspend and dm_resume.  BUT I know you see that the
> > actual code complexity isn't big.  I especially wanted you (and/or Bryn)
> > to evaluate the performance implications that my changes had on
> > dm-stats.  I'm pretty confident there won't be much if any performance
> > difference (given the code is identical to what you had, except some
> > extra checks are made but ultimately not used, e.g. lockfs/unlockfs).
> 
> This is not about performance, it is about unclear behavior.
> 
> If someone does internal_suspend followed by remove, what should be the 
> correct behavior? The current code deadlocks in this case.

You need to be specific, if internal suspend was used by the thin-pool
suspend to suspend thin devices you'll need the thin-pool resumed before
you can remove any thin device!

Like any interface there is a right way and a wrong way to use it.
dm_internal_suspend must always be followed by dm_internal_resume.
I cannot yet see a hole where properly written code is exposed here.

> The patch series introduces two suspend mechanisms and it is unclear how 
> should they interact with each other.
> 
> > The end result of the dm_internal_{suspend,resume} changes is an
> > interface that is useful for DM targets in addition to dm-stats.  That
> > is the kind of advancement DM needs.
> > 
> > Please focus on the performance impact of my changes, if any, and we'll
> > go from there.
> > 
> > > No, the correct sequence is this (do this in presuspend handler):
> > > 
> > > 1. call dm_internal_suspend on all thin devices
> > > 2. set the flag in the prison meaning that the prison is blocked
> > > 3. call dm_internal_resume on all thin devices
> > 
> > I really didn't like the idea of reusing the bio-prison to achieve the
> > suspend of all thins to begin with.  This proposal is even more suspect
> > given the desire to call dm_internal_suspend and dm_internal_resume from
> > pool_presuspend.
> > 
> > It just isn't code that I want to see making its way into the tree.
> > Sets bad precedent of hacking around problems within a target for
> > questionable gain (questionable in that there really isn't a
> > pattern/infrastructure for other more complex targets to follow so
> > they'd need to invent their own hack should they have a comparable
> > problem).
> 
> At least the hack stays within dm-thin and generic dm code isn't 
> contaminated by it.

I've improved dm_internal_suspend and dm_internal_resume to not be only
serving dm-stats.  And the complexity isn't as bad as you'd like others
to think.

Please stop with the FUD.. I won't accept FUD as the basis for
dismissing changes.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 14:37                               ` Mikulas Patocka
  2014-11-05 14:50                                 ` Zdenek Kabelac
  2014-11-05 16:10                                 ` Mike Snitzer
@ 2014-11-05 16:29                                 ` Mike Snitzer
  2014-11-07 16:20                                   ` Mikulas Patocka
  2 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05 16:29 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Wed, Nov 05 2014 at  9:37am -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> The patch series introduces two suspend mechanisms and it is unclear how 
> should they interact with each other.

And this point is not correct.  As you know dm_internal_suspend and
dm_internal_resume interface predates any of my changes.

That existing interface was extended them to be (mostly) fully formed
equivalents of dm_suspend() and dm_resume().

I say "mostly" because dm_internal_resume() doesn't call into the targets'
resume hooks because no existing callers (dm-stats or dm-thinp) need
to.  But obviously dm_resume() does need to so it passes @resume_targets
as true to __dm_resume().

I'm not trying to suggest there is a bug or bugs in this new code (you
already pointed out the locking issue across ioctls that I fixed).

But a bug doesn't implicitly mean this is an imperfect way forward --
if/when a bug is found we'll deal with it.. so feel free to pour over
this code to see if there is a bug or bugs.  I really do welcome your
review -- I would just like technical issues to be the focus of any
technical review.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 16:10                                 ` Mike Snitzer
@ 2014-11-05 16:56                                   ` Mike Snitzer
  2014-11-05 20:16                                     ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05 16:56 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Wed, Nov 05 2014 at 11:10am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Nov 05 2014 at  9:37am -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > 
> > 
> > On Wed, 5 Nov 2014, Mike Snitzer wrote:
> > 
> > > On Wed, Nov 05 2014 at  8:05am -0500,
> > > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > 
> > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote:
> > > > 
> > > > > You can for example set the flag in the prison meaning that the prison is 
> > > > > suspended and then call dm_internal_suspend immediatelly followed by 
> > > > > dm_internal_resume - that will clear in-progress bios and prevent new bios 
> > > > > from coming in (and we don't need to change dm_internal_suspend and 
> > > > > dm_internal_resume to become so big).
> > > 
> > > It may _seem_ like they have gotten big given the code was refactored to
> > > share code with dm_suspend and dm_resume.  BUT I know you see that the
> > > actual code complexity isn't big.  I especially wanted you (and/or Bryn)
> > > to evaluate the performance implications that my changes had on
> > > dm-stats.  I'm pretty confident there won't be much if any performance
> > > difference (given the code is identical to what you had, except some
> > > extra checks are made but ultimately not used, e.g. lockfs/unlockfs).
> > 
> > This is not about performance, it is about unclear behavior.
> > 
> > If someone does internal_suspend followed by remove, what should be the 
> > correct behavior? The current code deadlocks in this case.
> 
> You need to be specific, if internal suspend was used by the thin-pool
> suspend to suspend thin devices you'll need the thin-pool resumed before
> you can remove any thin device!
> 
> Like any interface there is a right way and a wrong way to use it.
> dm_internal_suspend must always be followed by dm_internal_resume.
> I cannot yet see a hole where properly written code is exposed here.

But thinking further about what you said, you're correctly concerned
about the potential for dm-stats to have used dm_internal_suspend and
then someone attempting to remove that device while it is internally
suspended.

We just need a patch like this:

 drivers/md/dm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ac0dcad..ffa6763 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -448,6 +448,17 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
 {
 	int r = 0;
 
+retry:
+	mutex_lock(&md->suspend_lock);
+	if (dm_suspended_internally_md(md)) {
+		/* already internally suspended, wait for internal resume */
+		mutex_unlock(&md->suspend_lock);
+		r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE);
+		if (r)
+			return r;
+		goto retry;
+	}
+
 	spin_lock(&_minor_lock);
 
 	if (dm_open_count(md)) {
@@ -461,6 +472,8 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
 
 	spin_unlock(&_minor_lock);
 
+	mutex_unlock(&md->suspend_lock);
+
 	return r;
 }
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 16:56                                   ` Mike Snitzer
@ 2014-11-05 20:16                                     ` Mike Snitzer
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Snitzer @ 2014-11-05 20:16 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Wed, Nov 05 2014 at 11:56am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Nov 05 2014 at 11:10am -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Wed, Nov 05 2014 at  9:37am -0500,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > This is not about performance, it is about unclear behavior.
> > > 
> > > If someone does internal_suspend followed by remove, what should be the 
> > > correct behavior? The current code deadlocks in this case.
> > 
> > You need to be specific, if internal suspend was used by the thin-pool
> > suspend to suspend thin devices you'll need the thin-pool resumed before
> > you can remove any thin device!
> >
> > Like any interface there is a right way and a wrong way to use it.
> > dm_internal_suspend must always be followed by dm_internal_resume.
> > I cannot yet see a hole where properly written code is exposed here.
> 
> But thinking further about what you said, you're correctly concerned
> about the potential for dm-stats to have used dm_internal_suspend and
> then someone attempting to remove that device while it is internally
> suspended.

As we discussed (but for benefit of others): dm-stats does the
dm_internal_suspend+dm_internal_resume within a single DM message
(ioctl) so there is no potential for race with device delete.

> We just need a patch like this:

That untested patch was flawed, caused deadlock due to wait_on_bit()
while holding _hash_lock -- so resume ioctl wasn't possible.

So I'm still really not sure what your original point was about "If
someone does internal_suspend followed by remove, what should be the
correct behavior? The current code deadlocks in this case."

There is no deadlock:

dmsetup suspend thin-pool
dmsetup remove thin-thin1
dmsetup resume thin-pool

the thin-pool suspend will internal suspend thin-thin1, but I can still
remove thin-thin1 (provided it isn't in-use), and later resuming the
thin-pool works fine too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-05 16:29                                 ` Mike Snitzer
@ 2014-11-07 16:20                                   ` Mikulas Patocka
  2014-11-07 18:33                                     ` Mike Snitzer
  2014-11-07 23:16                                     ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
  0 siblings, 2 replies; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-07 16:20 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Wed, 5 Nov 2014, Mike Snitzer wrote:

> On Wed, Nov 05 2014 at  9:37am -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > The patch series introduces two suspend mechanisms and it is unclear how 
> > should they interact with each other.
> 
> And this point is not correct.  As you know dm_internal_suspend and
> dm_internal_resume interface predates any of my changes.
> 
> That existing interface was extended them to be (mostly) fully formed
> equivalents of dm_suspend() and dm_resume().
> 
> I say "mostly" because dm_internal_resume() doesn't call into the targets'
> resume hooks because no existing callers (dm-stats or dm-thinp) need
> to.  But obviously dm_resume() does need to so it passes @resume_targets
> as true to __dm_resume().
> 
> I'm not trying to suggest there is a bug or bugs in this new code (you
> already pointed out the locking issue across ioctls that I fixed).
> 
> But a bug doesn't implicitly mean this is an imperfect way forward --
> if/when a bug is found we'll deal with it.. so feel free to pour over
> this code to see if there is a bug or bugs.  I really do welcome your
> review -- I would just like technical issues to be the focus of any
> technical review.

Problems with that patch set:

1. You walk all thin targets in pool_presuspend and call internal_suspend 
on them and then again in pool_resume call internal_resume on them. 
Between calls to pool_presuspend and pool_resume, dm-thin devices may be 
added or removed, resulting in unballanced calls.

2. You walk all thin targets and call internal_suspend on them. If two 
thin targets are in one dm table, it calls internal_suspend twice on the 
same md device. It also calls internal_suspend twice on the same md device 
if the device has both active and inactive table with a thin target.

3. The device may be suspended internally by pool presuspend and by 
statistics at the same time - the code doesn't handle that:
if (WARN_ON(dm_suspended_internally_md(md))) goto out; /* disallow nested 
internal suspends! */

4. when pool_presuspend is called, md->suspend_lock is alrady held. Taking 
md->suspend_lock on another device results in lockdep warning.


For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - 
they may be created or deleted independent on pool status. Instead, we 
should block all active bios inside the pool - the bios are already 
registered in dm_deferred_set or in the prison, so all you need to do is 
to set a flag pool's presuspend method that causes all new bios to be 
queues and the wait until the prison is empty and the counters in 
deferred_set reach zero.


Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-07 16:20                                   ` Mikulas Patocka
@ 2014-11-07 18:33                                     ` Mike Snitzer
  2015-01-02 22:56                                       ` Mikulas Patocka
  2014-11-07 23:16                                     ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
  1 sibling, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2014-11-07 18:33 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Fri, Nov 07 2014 at 11:20am -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Wed, 5 Nov 2014, Mike Snitzer wrote:
> 
> > On Wed, Nov 05 2014 at  9:37am -0500,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > The patch series introduces two suspend mechanisms and it is unclear how 
> > > should they interact with each other.
> > 
> > And this point is not correct.  As you know dm_internal_suspend and
> > dm_internal_resume interface predates any of my changes.
> > 
> > That existing interface was extended them to be (mostly) fully formed
> > equivalents of dm_suspend() and dm_resume().
> > 
> > I say "mostly" because dm_internal_resume() doesn't call into the targets'
> > resume hooks because no existing callers (dm-stats or dm-thinp) need
> > to.  But obviously dm_resume() does need to so it passes @resume_targets
> > as true to __dm_resume().
> > 
> > I'm not trying to suggest there is a bug or bugs in this new code (you
> > already pointed out the locking issue across ioctls that I fixed).
> > 
> > But a bug doesn't implicitly mean this is an imperfect way forward --
> > if/when a bug is found we'll deal with it.. so feel free to pour over
> > this code to see if there is a bug or bugs.  I really do welcome your
> > review -- I would just like technical issues to be the focus of any
> > technical review.
> 
> Problems with that patch set:
> 
> 1. You walk all thin targets in pool_presuspend and call internal_suspend 
> on them and then again in pool_resume call internal_resume on them. 
> Between calls to pool_presuspend and pool_resume, dm-thin devices may be 
> added or removed, resulting in unballanced calls.

No, we cannot add/remove or activate/deactivate thin devices while the
pool is suspended.  Those operations should block until the pool is
resumed.  I'll audit/fix accordingly.

> 2. You walk all thin targets and call internal_suspend on them. If two 
> thin targets are in one dm table, it calls internal_suspend twice on the 
> same md device. It also calls internal_suspend twice on the same md device 
> if the device has both active and inactive table with a thin target.

While I see your point, a thin DM table will never have multiple thin
targets in it.  This probably _should_ be enforced by adding
DM_TARGET_SINGLETON to the thin_target's .features (should also probably
have DM_TARGET_IMMUTABLE).

> 3. The device may be suspended internally by pool presuspend and by 
> statistics at the same time - the code doesn't handle that:
> if (WARN_ON(dm_suspended_internally_md(md))) goto out; /* disallow nested 
> internal suspends! */

Shouldn't that be fine?  Just implies the stats won't be as precise...
BUT, the WARN_ON() is clearly overkill and needs to be removed.

> 4. when pool_presuspend is called, md->suspend_lock is alrady held. Taking 
> md->suspend_lock on another device results in lockdep warning.

That should be fixed with proper lockdep training.  As I thought was the
case by the mutex_lock_nested() I added to dm_suspend().  But I'll
recompile my kernel with lockdep enabled and silence lockdep once and
for all if it is still complaining.

> For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - 
> they may be created or deleted independent on pool status. Instead, we 
> should block all active bios inside the pool - the bios are already 
> registered in dm_deferred_set or in the prison, so all you need to do is 
> to set a flag pool's presuspend method that causes all new bios to be 
> queues and the wait until the prison is empty and the counters in 
> deferred_set reach zero.

Given my explanation above, reasons 1 and 2 shouldn't really be a
concern in the end.

Thanks for the review!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review)
  2014-11-07 16:20                                   ` Mikulas Patocka
  2014-11-07 18:33                                     ` Mike Snitzer
@ 2014-11-07 23:16                                     ` Mikulas Patocka
  2014-11-07 23:29                                       ` Mike Snitzer
  2014-11-09  1:17                                       ` [PATCH] " Mike Snitzer
  1 sibling, 2 replies; 23+ messages in thread
From: Mikulas Patocka @ 2014-11-07 23:16 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Fri, 7 Nov 2014, Mikulas Patocka wrote:

> For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - 
> they may be created or deleted independent on pool status. Instead, we 
> should block all active bios inside the pool - the bios are already 
> registered in dm_deferred_set or in the prison, so all you need to do is 
> to set a flag pool's presuspend method that causes all new bios to be 
> queues and the wait until the prison is empty and the counters in 
> deferred_set reach zero.

Here I'm sending a proof-of-concept patch that makes it possible to 
prevent bios from coming into the thin pool when the pool is suspended. It 
doesn't modify any generic dm code.

This patch may not be perfect (I don't know all bio paths in dm-thin in 
detail, so I may have missed something), but it shows that it is possible 
to suspend all bios without modifying the common dm code. I hope Joe 
reviewes it and possibly fixes it.

The patch is for 3.18-rc3, it needs other thin patches, the full patch 
series is at 
http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html

Mikulas


Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-bio-prison.c |   29 +++++++++++++++++++++++++++
 drivers/md/dm-bio-prison.h |    3 ++
 drivers/md/dm-thin.c       |   47 +++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 75 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/md/dm-thin.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-thin.c	2014-11-08 00:02:42.000000000 +0100
+++ linux-2.6/drivers/md/dm-thin.c	2014-11-08 00:02:42.000000000 +0100
@@ -194,6 +194,8 @@ struct pool {
 	struct dm_thin_new_mapping *next_mapping;
 	mempool_t *mapping_pool;
 
+	atomic_t suspended;
+
 	process_bio_fn process_bio;
 	process_bio_fn process_discard;
 
@@ -1532,6 +1534,9 @@ static void process_thin_deferred_bios(s
 		return;
 	}
 
+	if (unlikely(atomic_read(&pool->suspended)))
+		return;
+
 	bio_list_init(&bios);
 
 	spin_lock_irqsave(&tc->lock, flags);
@@ -1930,7 +1935,8 @@ static int thin_bio_map(struct dm_target
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA)) {
+	if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA) ||
+	    unlikely(atomic_read(&tc->pool->suspended))) {
 		thin_defer_bio(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -1943,6 +1949,12 @@ static int thin_bio_map(struct dm_target
 	if (dm_bio_detain(tc->pool->prison, &key, bio, &cell1, &cell_result))
 		return DM_MAPIO_SUBMITTED;
 
+	if (unlikely(atomic_read(&tc->pool->suspended))) {
+		thin_defer_bio(tc, bio);
+		cell_defer_no_holder_no_free(tc, &cell1);
+		return DM_MAPIO_SUBMITTED;
+	}
+
 	r = dm_thin_find_block(td, block, 0, &result);
 
 	/*
@@ -2227,6 +2239,7 @@ static struct pool *pool_create(struct m
 	INIT_LIST_HEAD(&pool->prepared_discards);
 	INIT_LIST_HEAD(&pool->active_thins);
 	pool->low_water_triggered = false;
+	atomic_set(&pool->suspended, 1);
 
 	pool->shared_read_ds = dm_deferred_set_create();
 	if (!pool->shared_read_ds) {
@@ -2772,22 +2785,46 @@ static void pool_resume(struct dm_target
 	spin_lock_irqsave(&pool->lock, flags);
 	pool->low_water_triggered = false;
 	spin_unlock_irqrestore(&pool->lock, flags);
-	requeue_bios(pool);
 
-	do_waker(&pool->waker.work);
+	if (atomic_dec_and_test(&pool->suspended)) {
+		requeue_bios(pool);
+		do_waker(&pool->waker.work);
+	}
 }
 
-static void pool_postsuspend(struct dm_target *ti)
+static void pool_presuspend(struct dm_target *ti)
 {
 	struct pool_c *pt = ti->private;
 	struct pool *pool = pt->pool;
 
+	atomic_inc(&pool->suspended);
+
+	dm_drain_prison(pool->prison);
+
+	dm_drain_deferred_set(pool->shared_read_ds);
+	dm_drain_deferred_set(pool->all_io_ds);
+
 	cancel_delayed_work(&pool->waker);
 	cancel_delayed_work(&pool->no_space_timeout);
 	flush_workqueue(pool->wq);
+
+	/* ??? should this be before or after flush_workqueue ? */
+	dm_drain_deferred_set(pool->shared_read_ds);
+	dm_drain_deferred_set(pool->all_io_ds);
+
 	(void) commit(pool);
 }
 
+static void pool_presuspend_undo(struct dm_target *ti)
+{
+	pool_resume(ti);
+}
+
+static void pool_postsuspend(struct dm_target *ti)
+{
+	/* nothing */
+}
+
 static int check_arg_count(unsigned argc, unsigned args_required)
 {
 	if (argc != args_required) {
@@ -3218,6 +3255,8 @@ static struct target_type pool_target =
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
 	.map = pool_map,
+	.presuspend = pool_presuspend,
+	.presuspend_undo = pool_presuspend_undo,
 	.postsuspend = pool_postsuspend,
 	.preresume = pool_preresume,
 	.resume = pool_resume,
Index: linux-2.6/drivers/md/dm-bio-prison.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bio-prison.c	2014-11-08 00:02:42.000000000 +0100
+++ linux-2.6/drivers/md/dm-bio-prison.c	2014-11-08 00:09:40.000000000 +0100
@@ -11,6 +11,7 @@
 #include <linux/mempool.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/delay.h>
 
 /*----------------------------------------------------------------*/
 
@@ -241,6 +242,18 @@ void dm_cell_error(struct dm_bio_prison
 }
 EXPORT_SYMBOL_GPL(dm_cell_error);
 
+void dm_drain_prison(struct dm_bio_prison *prison)
+{
+	spin_lock_irq(&prison->lock);
+	if (!RB_EMPTY_ROOT(&prison->cells)) {
+		spin_unlock_irq(&prison->lock);
+		msleep(1);
+		spin_lock_irq(&prison->lock);
+	}
+	spin_unlock_irq(&prison->lock);
+}
+EXPORT_SYMBOL_GPL(dm_drain_prison);
+
 /*----------------------------------------------------------------*/
 
 #define DEFERRED_SET_SIZE 64
@@ -354,6 +367,22 @@ int dm_deferred_set_add_work(struct dm_d
 }
 EXPORT_SYMBOL_GPL(dm_deferred_set_add_work);
 
+void dm_drain_deferred_set(struct dm_deferred_set *ds)
+{
+	int i;
+retry:
+	spin_lock_irq(&ds->lock);
+	for (i = 0; i < DEFERRED_SET_SIZE; i++) {
+		if (ds->entries[i].count) {
+			spin_unlock_irq(&ds->lock);
+			msleep(1);
+			goto retry;
+		}
+	}
+	spin_unlock_irq(&ds->lock);
+}
+EXPORT_SYMBOL_GPL(dm_drain_deferred_set);
+
 /*----------------------------------------------------------------*/
 
 static int __init dm_bio_prison_init(void)
Index: linux-2.6/drivers/md/dm-bio-prison.h
===================================================================
--- linux-2.6.orig/drivers/md/dm-bio-prison.h	2014-11-08 00:02:42.000000000 +0100
+++ linux-2.6/drivers/md/dm-bio-prison.h	2014-11-08 00:02:42.000000000 +0100
@@ -87,6 +87,7 @@ void dm_cell_release_no_holder(struct dm
 			       struct bio_list *inmates);
 void dm_cell_error(struct dm_bio_prison *prison,
 		   struct dm_bio_prison_cell *cell, int error);
+void dm_drain_prison(struct dm_bio_prison *prison);
 
 /*----------------------------------------------------------------*/
 
@@ -107,6 +108,8 @@ struct dm_deferred_entry *dm_deferred_en
 void dm_deferred_entry_dec(struct dm_deferred_entry *entry, struct list_head *head);
 int dm_deferred_set_add_work(struct dm_deferred_set *ds, struct list_head *work);
 
+void dm_drain_deferred_set(struct dm_deferred_set *ds);
+
 /*----------------------------------------------------------------*/
 
 #endif

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review)
  2014-11-07 23:16                                     ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
@ 2014-11-07 23:29                                       ` Mike Snitzer
  2014-11-09  1:17                                       ` [PATCH] " Mike Snitzer
  1 sibling, 0 replies; 23+ messages in thread
From: Mike Snitzer @ 2014-11-07 23:29 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Fri, Nov 07 2014 at  6:16pm -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Fri, 7 Nov 2014, Mikulas Patocka wrote:
> 
> > For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - 
> > they may be created or deleted independent on pool status. Instead, we 
> > should block all active bios inside the pool - the bios are already 
> > registered in dm_deferred_set or in the prison, so all you need to do is 
> > to set a flag pool's presuspend method that causes all new bios to be 
> > queues and the wait until the prison is empty and the counters in 
> > deferred_set reach zero.
> 
> Here I'm sending a proof-of-concept patch that makes it possible to 
> prevent bios from coming into the thin pool when the pool is suspended. It 
> doesn't modify any generic dm code.
> 
> This patch may not be perfect (I don't know all bio paths in dm-thin in 
> detail, so I may have missed something), but it shows that it is possible 
> to suspend all bios without modifying the common dm code. I hope Joe 
> reviewes it and possibly fixes it.
> 
> The patch is for 3.18-rc3, it needs other thin patches, the full patch 
> series is at 
> http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html

Happy to consider your solution, thanks for taking the time to work on
it.

I also revised my dm_internal_{suspend,resume} changes based on your
feedback and have published the changes here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19
(rebased so the top ~11 patches are the ones to look at)

I'm happy with how this code turned out, but it is certainly a fair
amount of code churn in DM core.

All said, I'll work with Joe to figure out the best way forward next
week.  Could easily be that your approach wins out.

BTW, have a great break.
Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review)
  2014-11-07 23:16                                     ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
  2014-11-07 23:29                                       ` Mike Snitzer
@ 2014-11-09  1:17                                       ` Mike Snitzer
  1 sibling, 0 replies; 23+ messages in thread
From: Mike Snitzer @ 2014-11-09  1:17 UTC (permalink / raw)
  To: device-mapper development
  Cc: ejt, Bryn M. Reeves, Alasdair G Kergon, Zdenek Kabelac

On Fri, Nov 7, 2014 at 6:16 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 7 Nov 2014, Mikulas Patocka wrote:
>
>> For reasons 1 and 2, I wouldn't really deal with "thin" targets at all -
>> they may be created or deleted independent on pool status. Instead, we
>> should block all active bios inside the pool - the bios are already
>> registered in dm_deferred_set or in the prison, so all you need to do is
>> to set a flag pool's presuspend method that causes all new bios to be
>> queues and the wait until the prison is empty and the counters in
>> deferred_set reach zero.
>
> Here I'm sending a proof-of-concept patch that makes it possible to
> prevent bios from coming into the thin pool when the pool is suspended. It
> doesn't modify any generic dm code.
>
> This patch may not be perfect (I don't know all bio paths in dm-thin in
> detail, so I may have missed something), but it shows that it is possible
> to suspend all bios without modifying the common dm code. I hope Joe
> reviewes it and possibly fixes it.
>
> The patch is for 3.18-rc3, it needs other thin patches, the full patch
> series is at
> http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html
>
> Mikulas
>
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
>
> ---
>  drivers/md/dm-bio-prison.c |   29 +++++++++++++++++++++++++++
>  drivers/md/dm-bio-prison.h |    3 ++
>  drivers/md/dm-thin.c       |   47 +++++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 75 insertions(+), 4 deletions(-)
>
> Index: linux-2.6/drivers/md/dm-thin.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-thin.c 2014-11-08 00:02:42.000000000 +0100
> +++ linux-2.6/drivers/md/dm-thin.c      2014-11-08 00:02:42.000000000 +0100
> @@ -194,6 +194,8 @@ struct pool {
>         struct dm_thin_new_mapping *next_mapping;
>         mempool_t *mapping_pool;
>
> +       atomic_t suspended;
> +
>         process_bio_fn process_bio;
>         process_bio_fn process_discard;
>

I'm not seeing how using an unprotected atomic_t is thread-safe
considering both the thin-pool and thin targets will be accessing it
concurrently.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2014-11-07 18:33                                     ` Mike Snitzer
@ 2015-01-02 22:56                                       ` Mikulas Patocka
  2015-01-05 15:50                                         ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Mikulas Patocka @ 2015-01-02 22:56 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

Hi, I'm back.

I looked at current internal suspend code in 3.19.

One thing that seems strange is this:

static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
{
        struct dm_table *map = NULL;

        if (dm_suspended_internally_md(md))
                return; /* nested internal suspend */
....
static void __dm_internal_resume(struct mapped_device *md)
{
        if (!dm_suspended_internally_md(md))
                return; /* resume from nested internal suspend */

- the comments imply that the code supports nested internal suspend, while 
in fact it doesn't. If the caller calls:
* dm_internal_suspend_noflush
* dm_internal_suspend_noflush
* dm_internal_resume
the device is not suspended while it should be.

The "if" branch in __dm_internal_resume has nothing to do with nested 
suspend, it's called if the caller calls dm_internal_resume without 
dm_internal_suspend_noflush - that shouldn't happen, so there should be 
WARN or BUG.

If you want to support multiple internal suspends (which is needed if one 
table can have multiple dm-thin targets), you need a counter that counts 
how many times the device was suspended - the counter doesn't have to be 
atomic because the code already holds md->suspend_lock. If you don't want 
to support multiple internal suspends, there should be at least WARN when 
the caller attempts to do it.

Mikulas

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: staged dm_internal_{suspend, resume} related changes for wider review
  2015-01-02 22:56                                       ` Mikulas Patocka
@ 2015-01-05 15:50                                         ` Mike Snitzer
  2015-01-08 23:52                                           ` [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review) Mikulas Patocka
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2015-01-05 15:50 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac

On Fri, Jan 02 2015 at  5:56pm -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi, I'm back.
> 
> I looked at current internal suspend code in 3.19.
> 
> One thing that seems strange is this:
> 
> static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
> {
>         struct dm_table *map = NULL;
> 
>         if (dm_suspended_internally_md(md))
>                 return; /* nested internal suspend */
> ....
> static void __dm_internal_resume(struct mapped_device *md)
> {
>         if (!dm_suspended_internally_md(md))
>                 return; /* resume from nested internal suspend */
> 
> - the comments imply that the code supports nested internal suspend, while 
> in fact it doesn't. If the caller calls:
> * dm_internal_suspend_noflush
> * dm_internal_suspend_noflush
> * dm_internal_resume
> the device is not suspended while it should be.

Yes, very true, it should.  But IIRC in practice it doesn't hurt us to
not be bothered about being so precise given the current caller.

> The "if" branch in __dm_internal_resume has nothing to do with nested 
> suspend, it's called if the caller calls dm_internal_resume without 
> dm_internal_suspend_noflush - that shouldn't happen, so there should be 
> WARN or BUG.

The first internal resume would've cleared the DM_INTERNAL_SUSPEND_FLAG,
etc.  So AFAICT the code in question does also cover multiple internal
suspends/resumes (albeit without the counter you've suggested).

But I did narrowly define "nested suspend" as an internal suspend that
arrives _after_ a normal suspend.

From the header of commit ffcc39364 ("dm: enhance internal suspend and
resume interface"):

    Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
    was already suspended when dm_internal_suspend_noflush() was called --
    this can be thought of as a "nested suspend".  A "nested suspend" can
    occur with legacy userspace dm-thin code that might suspend all active
    thin volumes before suspending the pool for resize.

Could be the reality is "nested suspend" is also supporting the case of
multiple internal suspends.

> If you want to support multiple internal suspends (which is needed if one 
> table can have multiple dm-thin targets), you need a counter that counts 
> how many times the device was suspended - the counter doesn't have to be 
> atomic because the code already holds md->suspend_lock. If you don't want 
> to support multiple internal suspends, there should be at least WARN when 
> the caller attempts to do it.

I'll have a look.  Will need some time to revisit this line of work.

FYI, the device-mapper-test-suite has fairly comprehensive test coverage
for all supported scenarios, e.g.:

# dmtest run --suite thin-provisioning -t /SuspendTests/
Loaded suite thin-provisioning
Started
SuspendTests
  nested_internal_suspend_using_inactive_table...PASS
  suspend_pool_active_thins_no_io...PASS
  suspend_pool_after_suspend_thin...PASS
  suspend_pool_concurrent_suspend...PASS
  suspend_pool_no_active_thins...PASS
  suspend_pool_no_thins...PASS
  suspend_pool_resume_thin...PASS
  suspend_pool_suspended_thin...PASS
  wait_on_bit_during_resume...PASS
  wait_on_bit_during_suspend...PASS

# dmtest run --suite thin-provisioning -n activate_thin_while_pool_suspended_fails
Loaded suite thin-provisioning
Started
CreationTests
  activate_thin_while_pool_suspended_fails...PASS

So any changes in this area need to be verified using dmts.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review)
  2015-01-05 15:50                                         ` Mike Snitzer
@ 2015-01-08 23:52                                           ` Mikulas Patocka
  0 siblings, 0 replies; 23+ messages in thread
From: Mikulas Patocka @ 2015-01-08 23:52 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon,
	Zdenek Kabelac



On Mon, 5 Jan 2015, Mike Snitzer wrote:

> On Fri, Jan 02 2015 at  5:56pm -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > Hi, I'm back.
> > 
> > I looked at current internal suspend code in 3.19.
> > 
> > One thing that seems strange is this:
> > 
> > static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags)
> > {
> >         struct dm_table *map = NULL;
> > 
> >         if (dm_suspended_internally_md(md))
> >                 return; /* nested internal suspend */
> > ....
> > static void __dm_internal_resume(struct mapped_device *md)
> > {
> >         if (!dm_suspended_internally_md(md))
> >                 return; /* resume from nested internal suspend */
> > 
> > - the comments imply that the code supports nested internal suspend, while 
> > in fact it doesn't. If the caller calls:
> > * dm_internal_suspend_noflush
> > * dm_internal_suspend_noflush
> > * dm_internal_resume
> > the device is not suspended while it should be.
> 
> Yes, very true, it should.  But IIRC in practice it doesn't hurt us to
> not be bothered about being so precise given the current caller.

If the device is not suspended while it should be, it could lead to 
various subtle bugs. We should either disallow multiple internal suspends 
(print warning when that happens) or handle it correctly. Here I'm sending 
a patch to fix it.

Mikulas


From: Mikulas Patocka <mpatocka@redhat.com>

The current code attempts to handle multiple internal suspends on the same
device, but it does that incorrectly. When these functions are called in
this order on the same device
	dm_internal_suspend_noflush
	dm_internal_suspend_noflush
	dm_internal_resume
the device is not suspended, while it should be. This patch fixes the bug
by maintaining a counter of suspends, and resuming the device when the
counter drops to zero.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-3.19-rc3/drivers/md/dm.c
===================================================================
--- linux-3.19-rc3.orig/drivers/md/dm.c	2015-01-08 22:04:23.000000000 +0100
+++ linux-3.19-rc3/drivers/md/dm.c	2015-01-08 22:22:14.000000000 +0100
@@ -206,6 +206,9 @@ struct mapped_device {
 	/* zero-length flush that will be cloned and submitted to targets */
 	struct bio flush_bio;
 
+	/* the number of internal suspends */
+	unsigned internal_suspend_count;
+
 	struct dm_stats stats;
 };
 
@@ -3023,7 +3026,7 @@ static void __dm_internal_suspend(struct
 {
 	struct dm_table *map = NULL;
 
-	if (dm_suspended_internally_md(md))
+	if (md->internal_suspend_count++)
 		return; /* nested internal suspend */
 
 	if (dm_suspended_md(md)) {
@@ -3048,7 +3051,9 @@ static void __dm_internal_suspend(struct
 
 static void __dm_internal_resume(struct mapped_device *md)
 {
-	if (!dm_suspended_internally_md(md))
+	BUG_ON(!md->internal_suspend_count);
+
+	if (--md->internal_suspend_count)
 		return; /* resume from nested internal suspend */
 
 	if (dm_suspended_md(md))

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-01-08 23:52 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20141028014825.GA21949@redhat.com>
     [not found] ` <544F988D.7090901@redhat.com>
     [not found]   ` <20141028134012.GA25229@redhat.com>
     [not found]     ` <20141028141749.GD25229@redhat.com>
     [not found]       ` <20141028143742.GE25229@redhat.com>
     [not found]         ` <20141028171003.GG25229@redhat.com>
     [not found]           ` <20141028224753.GA29288@redhat.com>
     [not found]             ` <20141028232638.GB29288@redhat.com>
     [not found]               ` <20141029002125.GC29288@redhat.com>
2014-10-29  1:22                 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer
2014-10-29 19:06                   ` Mike Snitzer
2014-10-29 20:49                     ` Mike Snitzer
2014-11-03 23:25                   ` Mikulas Patocka
2014-11-04  0:17                     ` Mike Snitzer
2014-11-05  1:16                       ` Mike Snitzer
2014-11-05 12:43                         ` Mikulas Patocka
2014-11-05 13:05                           ` Mikulas Patocka
2014-11-05 14:11                             ` Mike Snitzer
2014-11-05 14:37                               ` Mikulas Patocka
2014-11-05 14:50                                 ` Zdenek Kabelac
2014-11-05 16:10                                 ` Mike Snitzer
2014-11-05 16:56                                   ` Mike Snitzer
2014-11-05 20:16                                     ` Mike Snitzer
2014-11-05 16:29                                 ` Mike Snitzer
2014-11-07 16:20                                   ` Mikulas Patocka
2014-11-07 18:33                                     ` Mike Snitzer
2015-01-02 22:56                                       ` Mikulas Patocka
2015-01-05 15:50                                         ` Mike Snitzer
2015-01-08 23:52                                           ` [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review) Mikulas Patocka
2014-11-07 23:16                                     ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
2014-11-07 23:29                                       ` Mike Snitzer
2014-11-09  1:17                                       ` [PATCH] " Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.