* staged dm_internal_{suspend, resume} related changes for wider review [not found] ` <20141029002125.GC29288@redhat.com> @ 2014-10-29 1:22 ` Mike Snitzer 2014-10-29 19:06 ` Mike Snitzer 2014-11-03 23:25 ` Mikulas Patocka 0 siblings, 2 replies; 23+ messages in thread From: Mike Snitzer @ 2014-10-29 1:22 UTC (permalink / raw) To: Zdenek Kabelac Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Alasdair G. Kergon I pushed rebased versions of these patches to linux-dm.git's 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the kernel.org autobuilders testing the code). See top 3 commits here: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 I'm open to any and all changes or dropping code entirely; so me staging like this is just to keep the review ball rolling as quickly as possible. But in particular we need to get Mikulas' and Bryn's feedback on how the dm_internal_{suspend,resume} changes impact dm-stats. Mike ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-10-29 1:22 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer @ 2014-10-29 19:06 ` Mike Snitzer 2014-10-29 20:49 ` Mike Snitzer 2014-11-03 23:25 ` Mikulas Patocka 1 sibling, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-10-29 19:06 UTC (permalink / raw) To: Alasdair G. Kergon Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Zdenek Kabelac On Tue, Oct 28 2014 at 9:22pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > I pushed rebased versions of these patches to linux-dm.git's > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the > kernel.org autobuilders testing the code). See top 3 commits here: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 > > I'm open to any and all changes or dropping code entirely; so me staging > like this is just to keep the review ball rolling as quickly as > possible. > > But in particular we need to get Mikulas' and Bryn's feedback on how the > dm_internal_{suspend,resume} changes impact dm-stats. Rebased, and added a few more changes, see top 5 commits here: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-10-29 19:06 ` Mike Snitzer @ 2014-10-29 20:49 ` Mike Snitzer 0 siblings, 0 replies; 23+ messages in thread From: Mike Snitzer @ 2014-10-29 20:49 UTC (permalink / raw) To: Alasdair G. Kergon Cc: ejt, Bryn Reeves, Mikulas Patocka, dm-devel, Zdenek Kabelac On Wed, Oct 29 2014 at 3:06pm -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, Oct 28 2014 at 9:22pm -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > > > I pushed rebased versions of these patches to linux-dm.git's > > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the > > kernel.org autobuilders testing the code). See top 3 commits here: > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 > > > > I'm open to any and all changes or dropping code entirely; so me staging > > like this is just to keep the review ball rolling as quickly as > > possible. > > > > But in particular we need to get Mikulas' and Bryn's feedback on how the > > dm_internal_{suspend,resume} changes impact dm-stats. > > Rebased, and added a few more changes, see top 5 commits here: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 FYI, Joe was kind enough to write a series of tests for these changes, see: https://github.com/jthornber/device-mapper-test-suite/commit/860555a3e838e995f8f2549deb9fd7ccebb0deb5 https://github.com/jthornber/device-mapper-test-suite/commit/24da8a852bce9ef94db98bdefdf5cd8cd0d5e2eb # dmtest run --profile fusionio --suite thin-provisioning -t /SuspendTests/ Loaded suite thin-provisioning Started SuspendTests suspend_pool_active_thins_no_io...PASS suspend_pool_after_suspend_thin...PASS suspend_pool_concurrent_suspend...PASS suspend_pool_no_active_thins...PASS suspend_pool_no_thins...PASS suspend_pool_resume_thin...PASS suspend_pool_suspended_thin...PASS Finished in 22.350814169 seconds. 7 tests, 0 assertions, 0 failures, 0 errors ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-10-29 1:22 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer 2014-10-29 19:06 ` Mike Snitzer @ 2014-11-03 23:25 ` Mikulas Patocka 2014-11-04 0:17 ` Mike Snitzer 1 sibling, 1 reply; 23+ messages in thread From: Mikulas Patocka @ 2014-11-03 23:25 UTC (permalink / raw) To: Mike Snitzer Cc: ejt, Bryn Reeves, dm-devel, Alasdair G. Kergon, Zdenek Kabelac On Tue, 28 Oct 2014, Mike Snitzer wrote: > I pushed rebased versions of these patches to linux-dm.git's > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the > kernel.org autobuilders testing the code). See top 3 commits here: > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 > > I'm open to any and all changes or dropping code entirely; so me staging > like this is just to keep the review ball rolling as quickly as > possible. > > But in particular we need to get Mikulas' and Bryn's feedback on how the > dm_internal_{suspend,resume} changes impact dm-stats. > > Mike Hi As I said on irc - it is not correct to take a mutex from one syscall and drop the mutex from another syscall. I hope Joe can use the bio prison to block bios while the pool is suspended. Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-03 23:25 ` Mikulas Patocka @ 2014-11-04 0:17 ` Mike Snitzer 2014-11-05 1:16 ` Mike Snitzer 0 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-04 0:17 UTC (permalink / raw) To: device-mapper development Cc: ejt, Bryn M. Reeves, Alasdair G Kergon, Zdenek Kabelac [-- Attachment #1.1: Type: text/plain, Size: 1078 bytes --] On Nov 3, 2014 6:27 PM, "Mikulas Patocka" <mpatocka@redhat.com> wrote: > > > > On Tue, 28 Oct 2014, Mike Snitzer wrote: > > > I pushed rebased versions of these patches to linux-dm.git's > > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the > > kernel.org autobuilders testing the code). See top 3 commits here: > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 > > > > I'm open to any and all changes or dropping code entirely; so me staging > > like this is just to keep the review ball rolling as quickly as > > possible. > > > > But in particular we need to get Mikulas' and Bryn's feedback on how the > > dm_internal_{suspend,resume} changes impact dm-stats. > > > > Mike > > Hi > > As I said on irc - it is not correct to take a mutex from one syscall and > drop the mutex from another syscall. I didn't modify that aspect of dm_internal_suspend+resume. I only extended the interface to other targets via export. > I hope Joe can use the bio prison to block bios while the pool is > suspended. We'll see. [-- Attachment #1.2: Type: text/html, Size: 1629 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-04 0:17 ` Mike Snitzer @ 2014-11-05 1:16 ` Mike Snitzer 2014-11-05 12:43 ` Mikulas Patocka 0 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 1:16 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Mon, Nov 03 2014 at 7:17pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Nov 3, 2014 6:27 PM, "Mikulas Patocka" <mpatocka@redhat.com> wrote: > > > > > > > > On Tue, 28 Oct 2014, Mike Snitzer wrote: > > > > > I pushed rebased versions of these patches to linux-dm.git's > > > 'dm-for-3.19' branch (and pulled into 'for-next' purely to get the > > > kernel.org autobuilders testing the code). See top 3 commits here: > > > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 > > > > > > I'm open to any and all changes or dropping code entirely; so me staging > > > like this is just to keep the review ball rolling as quickly as > > > possible. > > > > > > But in particular we need to get Mikulas' and Bryn's feedback on how the > > > dm_internal_{suspend,resume} changes impact dm-stats. > > > > > > Mike > > > > Hi > > > > As I said on irc - it is not correct to take a mutex from one syscall and > > drop the mutex from another syscall. > > I didn't modify that aspect of dm_internal_suspend+resume. I only extended > the interface to other targets via export. > > > I hope Joe can use the bio prison to block bios while the pool is > > suspended. > > We'll see. Joe said today: "bio-prison thing isn't going to work, since we need the worker thread to complete as part of the [thin-pool] suspend". And the following patch fixes the locking issue you raised: From: Mike Snitzer <snitzer@redhat.com> Date: Tue, 4 Nov 2014 15:38:59 -0500 Subject: [PATCH] dm: leverage DM_INTERNAL_SUSPEND_FLAG instead of abusing suspend_lock Update dm_internal_{suspend,resume} to always take and release the mapped_device's suspend_lock. Also update dm_{suspend,resume} to be aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to be cleared. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm.c | 41 +++++++++++++++++++++++++++++++++-------- 1 file changed, 33 insertions(+), 8 deletions(-) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 38cea89..ac0dcad 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -19,6 +19,7 @@ #include <linux/idr.h> #include <linux/hdreg.h> #include <linux/delay.h> +#include <linux/wait.h> #include <trace/events/block.h> @@ -2827,6 +2828,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags) bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG; bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG; +retry: mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING); if (dm_suspended_md(md)) { @@ -2834,6 +2836,15 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags) goto out_unlock; } + if (dm_suspended_internally_md(md)) { + /* already internally suspended, wait for internal resume */ + mutex_unlock(&md->suspend_lock); + r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE); + if (r) + return r; + goto retry; + } + map = rcu_dereference(md->map); r = __dm_suspend(md, map, do_lockfs, noflush, true); @@ -2876,10 +2887,20 @@ int dm_resume(struct mapped_device *md) int r = -EINVAL; struct dm_table *map = NULL; +retry: mutex_lock(&md->suspend_lock); if (!dm_suspended_md(md)) goto out; + if (dm_suspended_internally_md(md)) { + /* already internally suspended, wait for internal resume */ + mutex_unlock(&md->suspend_lock); + r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE); + if (r) + return r; + goto retry; + } + map = rcu_dereference(md->map); if (!map || !dm_table_get_size(map)) goto out; @@ -2911,13 +2932,13 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla struct dm_table *map = NULL; bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG; + mutex_lock(&md->suspend_lock); if (WARN_ON(dm_suspended_internally_md(md))) - return; /* disallow nested internal suspends! */ + goto out; /* disallow nested internal suspends! */ - mutex_lock(&md->suspend_lock); if (dm_suspended_md(md)) { set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags); - return; /* nest suspend */ + goto out; /* nest suspend */ } map = rcu_dereference(md->map); @@ -2928,6 +2949,8 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_fla set_bit(DMF_SUSPENDED_INTERNALLY, &md->flags); dm_table_postsuspend_targets(map); +out: + mutex_unlock(&md->suspend_lock); } void dm_internal_suspend(struct mapped_device *md) @@ -2946,13 +2969,12 @@ void dm_internal_resume(struct mapped_device *md) { struct dm_table *map = NULL; + mutex_lock(&md->suspend_lock); if (WARN_ON(!dm_suspended_internally_md(md))) - return; + goto done; - if (dm_suspended_md(md)) { - clear_bit(DMF_SUSPENDED_INTERNALLY, &md->flags); - goto done; /* resume from nested suspend */ - } + if (dm_suspended_md(md)) + goto done_clear_bit; /* resume from nested suspend */ map = rcu_dereference(md->map); if (WARN_ON(!map)) @@ -2964,7 +2986,10 @@ void dm_internal_resume(struct mapped_device *md) */ (void) __dm_resume(md, map, false); +done_clear_bit: clear_bit(DMF_SUSPENDED_INTERNALLY, &md->flags); + smp_mb__after_atomic(); + wake_up_bit(&md->flags, DMF_SUSPENDED_INTERNALLY); done: mutex_unlock(&md->suspend_lock); -- 1.9.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 1:16 ` Mike Snitzer @ 2014-11-05 12:43 ` Mikulas Patocka 2014-11-05 13:05 ` Mikulas Patocka 0 siblings, 1 reply; 23+ messages in thread From: Mikulas Patocka @ 2014-11-05 12:43 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Tue, 4 Nov 2014, Mike Snitzer wrote: > > > Hi > > > > > > As I said on irc - it is not correct to take a mutex from one syscall and > > > drop the mutex from another syscall. > > > > I didn't modify that aspect of dm_internal_suspend+resume. I only extended > > the interface to other targets via export. > > > > > I hope Joe can use the bio prison to block bios while the pool is > > > suspended. > > > > We'll see. > > Joe said today: "bio-prison thing isn't going to work, since we need the > worker thread to complete as part of the [thin-pool] suspend". If you block bios coming into the prison in pool's presuspend method (similar to this patch https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=2d616 ), the worker thread is still active, so it should process bios. You can for example set the flag in the prison meaning that the prison is suspended and then call dm_internal_suspend immediatelly followed by dm_internal_resume - that will clear in-progress bios and prevent new bios from coming in (and we don't need to change dm_internal_suspend and dm_internal_resume to become so big). > And the following patch fixes the locking issue you raised: ... it complicates the generic code even more than before. Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 12:43 ` Mikulas Patocka @ 2014-11-05 13:05 ` Mikulas Patocka 2014-11-05 14:11 ` Mike Snitzer 0 siblings, 1 reply; 23+ messages in thread From: Mikulas Patocka @ 2014-11-05 13:05 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, 5 Nov 2014, Mikulas Patocka wrote: > You can for example set the flag in the prison meaning that the prison is > suspended and then call dm_internal_suspend immediatelly followed by > dm_internal_resume - that will clear in-progress bios and prevent new bios > from coming in (and we don't need to change dm_internal_suspend and > dm_internal_resume to become so big). No, the correct sequence is this (do this in presuspend handler): 1. call dm_internal_suspend on all thin devices 2. set the flag in the prison meaning that the prison is blocked 3. call dm_internal_resume on all thin devices Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 13:05 ` Mikulas Patocka @ 2014-11-05 14:11 ` Mike Snitzer 2014-11-05 14:37 ` Mikulas Patocka 0 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 14:11 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, Nov 05 2014 at 8:05am -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote: > > > You can for example set the flag in the prison meaning that the prison is > > suspended and then call dm_internal_suspend immediatelly followed by > > dm_internal_resume - that will clear in-progress bios and prevent new bios > > from coming in (and we don't need to change dm_internal_suspend and > > dm_internal_resume to become so big). It may _seem_ like they have gotten big given the code was refactored to share code with dm_suspend and dm_resume. BUT I know you see that the actual code complexity isn't big. I especially wanted you (and/or Bryn) to evaluate the performance implications that my changes had on dm-stats. I'm pretty confident there won't be much if any performance difference (given the code is identical to what you had, except some extra checks are made but ultimately not used, e.g. lockfs/unlockfs). The end result of the dm_internal_{suspend,resume} changes is an interface that is useful for DM targets in addition to dm-stats. That is the kind of advancement DM needs. Please focus on the performance impact of my changes, if any, and we'll go from there. > No, the correct sequence is this (do this in presuspend handler): > > 1. call dm_internal_suspend on all thin devices > 2. set the flag in the prison meaning that the prison is blocked > 3. call dm_internal_resume on all thin devices I really didn't like the idea of reusing the bio-prison to achieve the suspend of all thins to begin with. This proposal is even more suspect given the desire to call dm_internal_suspend and dm_internal_resume from pool_presuspend. It just isn't code that I want to see making its way into the tree. Sets bad precedent of hacking around problems within a target for questionable gain (questionable in that there really isn't a pattern/infrastructure for other more complex targets to follow so they'd need to invent their own hack should they have a comparable problem). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 14:11 ` Mike Snitzer @ 2014-11-05 14:37 ` Mikulas Patocka 2014-11-05 14:50 ` Zdenek Kabelac ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Mikulas Patocka @ 2014-11-05 14:37 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, 5 Nov 2014, Mike Snitzer wrote: > On Wed, Nov 05 2014 at 8:05am -0500, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote: > > > > > You can for example set the flag in the prison meaning that the prison is > > > suspended and then call dm_internal_suspend immediatelly followed by > > > dm_internal_resume - that will clear in-progress bios and prevent new bios > > > from coming in (and we don't need to change dm_internal_suspend and > > > dm_internal_resume to become so big). > > It may _seem_ like they have gotten big given the code was refactored to > share code with dm_suspend and dm_resume. BUT I know you see that the > actual code complexity isn't big. I especially wanted you (and/or Bryn) > to evaluate the performance implications that my changes had on > dm-stats. I'm pretty confident there won't be much if any performance > difference (given the code is identical to what you had, except some > extra checks are made but ultimately not used, e.g. lockfs/unlockfs). This is not about performance, it is about unclear behavior. If someone does internal_suspend followed by remove, what should be the correct behavior? The current code deadlocks in this case. The patch series introduces two suspend mechanisms and it is unclear how should they interact with each other. > The end result of the dm_internal_{suspend,resume} changes is an > interface that is useful for DM targets in addition to dm-stats. That > is the kind of advancement DM needs. > > Please focus on the performance impact of my changes, if any, and we'll > go from there. > > > No, the correct sequence is this (do this in presuspend handler): > > > > 1. call dm_internal_suspend on all thin devices > > 2. set the flag in the prison meaning that the prison is blocked > > 3. call dm_internal_resume on all thin devices > > I really didn't like the idea of reusing the bio-prison to achieve the > suspend of all thins to begin with. This proposal is even more suspect > given the desire to call dm_internal_suspend and dm_internal_resume from > pool_presuspend. > > It just isn't code that I want to see making its way into the tree. > Sets bad precedent of hacking around problems within a target for > questionable gain (questionable in that there really isn't a > pattern/infrastructure for other more complex targets to follow so > they'd need to invent their own hack should they have a comparable > problem). At least the hack stays within dm-thin and generic dm code isn't contaminated by it. Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 14:37 ` Mikulas Patocka @ 2014-11-05 14:50 ` Zdenek Kabelac 2014-11-05 16:10 ` Mike Snitzer 2014-11-05 16:29 ` Mike Snitzer 2 siblings, 0 replies; 23+ messages in thread From: Zdenek Kabelac @ 2014-11-05 14:50 UTC (permalink / raw) To: Mikulas Patocka, Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon Dne 5.11.2014 v 15:37 Mikulas Patocka napsal(a): > > > On Wed, 5 Nov 2014, Mike Snitzer wrote: > >> On Wed, Nov 05 2014 at 8:05am -0500, >> Mikulas Patocka <mpatocka@redhat.com> wrote: >> >>> On Wed, 5 Nov 2014, Mikulas Patocka wrote: >>> >>>> You can for example set the flag in the prison meaning that the prison is >>>> suspended and then call dm_internal_suspend immediatelly followed by >>>> dm_internal_resume - that will clear in-progress bios and prevent new bios >>>> from coming in (and we don't need to change dm_internal_suspend and >>>> dm_internal_resume to become so big). >> >> It may _seem_ like they have gotten big given the code was refactored to >> share code with dm_suspend and dm_resume. BUT I know you see that the >> actual code complexity isn't big. I especially wanted you (and/or Bryn) >> to evaluate the performance implications that my changes had on >> dm-stats. I'm pretty confident there won't be much if any performance >> difference (given the code is identical to what you had, except some >> extra checks are made but ultimately not used, e.g. lockfs/unlockfs). > > This is not about performance, it is about unclear behavior. > > If someone does internal_suspend followed by remove, what should be the > correct behavior? The current code deadlocks in this case. > yep - that would my concern as well. If this 'internal' suspend is purely 'enforced' by incorrectly behaving user space part (aka lvm2 is not doing it's best) I assume it's better to fix user space - instead of moving it into kernel with some side-effects. Zdenek ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 14:37 ` Mikulas Patocka 2014-11-05 14:50 ` Zdenek Kabelac @ 2014-11-05 16:10 ` Mike Snitzer 2014-11-05 16:56 ` Mike Snitzer 2014-11-05 16:29 ` Mike Snitzer 2 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 16:10 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, Nov 05 2014 at 9:37am -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 5 Nov 2014, Mike Snitzer wrote: > > > On Wed, Nov 05 2014 at 8:05am -0500, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote: > > > > > > > You can for example set the flag in the prison meaning that the prison is > > > > suspended and then call dm_internal_suspend immediatelly followed by > > > > dm_internal_resume - that will clear in-progress bios and prevent new bios > > > > from coming in (and we don't need to change dm_internal_suspend and > > > > dm_internal_resume to become so big). > > > > It may _seem_ like they have gotten big given the code was refactored to > > share code with dm_suspend and dm_resume. BUT I know you see that the > > actual code complexity isn't big. I especially wanted you (and/or Bryn) > > to evaluate the performance implications that my changes had on > > dm-stats. I'm pretty confident there won't be much if any performance > > difference (given the code is identical to what you had, except some > > extra checks are made but ultimately not used, e.g. lockfs/unlockfs). > > This is not about performance, it is about unclear behavior. > > If someone does internal_suspend followed by remove, what should be the > correct behavior? The current code deadlocks in this case. You need to be specific, if internal suspend was used by the thin-pool suspend to suspend thin devices you'll need the thin-pool resumed before you can remove any thin device! Like any interface there is a right way and a wrong way to use it. dm_internal_suspend must always be followed by dm_internal_resume. I cannot yet see a hole where properly written code is exposed here. > The patch series introduces two suspend mechanisms and it is unclear how > should they interact with each other. > > > The end result of the dm_internal_{suspend,resume} changes is an > > interface that is useful for DM targets in addition to dm-stats. That > > is the kind of advancement DM needs. > > > > Please focus on the performance impact of my changes, if any, and we'll > > go from there. > > > > > No, the correct sequence is this (do this in presuspend handler): > > > > > > 1. call dm_internal_suspend on all thin devices > > > 2. set the flag in the prison meaning that the prison is blocked > > > 3. call dm_internal_resume on all thin devices > > > > I really didn't like the idea of reusing the bio-prison to achieve the > > suspend of all thins to begin with. This proposal is even more suspect > > given the desire to call dm_internal_suspend and dm_internal_resume from > > pool_presuspend. > > > > It just isn't code that I want to see making its way into the tree. > > Sets bad precedent of hacking around problems within a target for > > questionable gain (questionable in that there really isn't a > > pattern/infrastructure for other more complex targets to follow so > > they'd need to invent their own hack should they have a comparable > > problem). > > At least the hack stays within dm-thin and generic dm code isn't > contaminated by it. I've improved dm_internal_suspend and dm_internal_resume to not be only serving dm-stats. And the complexity isn't as bad as you'd like others to think. Please stop with the FUD.. I won't accept FUD as the basis for dismissing changes. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 16:10 ` Mike Snitzer @ 2014-11-05 16:56 ` Mike Snitzer 2014-11-05 20:16 ` Mike Snitzer 0 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 16:56 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, Nov 05 2014 at 11:10am -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Wed, Nov 05 2014 at 9:37am -0500, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 5 Nov 2014, Mike Snitzer wrote: > > > > > On Wed, Nov 05 2014 at 8:05am -0500, > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote: > > > > > > > > > You can for example set the flag in the prison meaning that the prison is > > > > > suspended and then call dm_internal_suspend immediatelly followed by > > > > > dm_internal_resume - that will clear in-progress bios and prevent new bios > > > > > from coming in (and we don't need to change dm_internal_suspend and > > > > > dm_internal_resume to become so big). > > > > > > It may _seem_ like they have gotten big given the code was refactored to > > > share code with dm_suspend and dm_resume. BUT I know you see that the > > > actual code complexity isn't big. I especially wanted you (and/or Bryn) > > > to evaluate the performance implications that my changes had on > > > dm-stats. I'm pretty confident there won't be much if any performance > > > difference (given the code is identical to what you had, except some > > > extra checks are made but ultimately not used, e.g. lockfs/unlockfs). > > > > This is not about performance, it is about unclear behavior. > > > > If someone does internal_suspend followed by remove, what should be the > > correct behavior? The current code deadlocks in this case. > > You need to be specific, if internal suspend was used by the thin-pool > suspend to suspend thin devices you'll need the thin-pool resumed before > you can remove any thin device! > > Like any interface there is a right way and a wrong way to use it. > dm_internal_suspend must always be followed by dm_internal_resume. > I cannot yet see a hole where properly written code is exposed here. But thinking further about what you said, you're correctly concerned about the potential for dm-stats to have used dm_internal_suspend and then someone attempting to remove that device while it is internally suspended. We just need a patch like this: drivers/md/dm.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/md/dm.c b/drivers/md/dm.c index ac0dcad..ffa6763 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -448,6 +448,17 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only { int r = 0; +retry: + mutex_lock(&md->suspend_lock); + if (dm_suspended_internally_md(md)) { + /* already internally suspended, wait for internal resume */ + mutex_unlock(&md->suspend_lock); + r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE); + if (r) + return r; + goto retry; + } + spin_lock(&_minor_lock); if (dm_open_count(md)) { @@ -461,6 +472,8 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only spin_unlock(&_minor_lock); + mutex_unlock(&md->suspend_lock); + return r; } ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 16:56 ` Mike Snitzer @ 2014-11-05 20:16 ` Mike Snitzer 0 siblings, 0 replies; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 20:16 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, Nov 05 2014 at 11:56am -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Wed, Nov 05 2014 at 11:10am -0500, > Mike Snitzer <snitzer@redhat.com> wrote: > > > On Wed, Nov 05 2014 at 9:37am -0500, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > This is not about performance, it is about unclear behavior. > > > > > > If someone does internal_suspend followed by remove, what should be the > > > correct behavior? The current code deadlocks in this case. > > > > You need to be specific, if internal suspend was used by the thin-pool > > suspend to suspend thin devices you'll need the thin-pool resumed before > > you can remove any thin device! > > > > Like any interface there is a right way and a wrong way to use it. > > dm_internal_suspend must always be followed by dm_internal_resume. > > I cannot yet see a hole where properly written code is exposed here. > > But thinking further about what you said, you're correctly concerned > about the potential for dm-stats to have used dm_internal_suspend and > then someone attempting to remove that device while it is internally > suspended. As we discussed (but for benefit of others): dm-stats does the dm_internal_suspend+dm_internal_resume within a single DM message (ioctl) so there is no potential for race with device delete. > We just need a patch like this: That untested patch was flawed, caused deadlock due to wait_on_bit() while holding _hash_lock -- so resume ioctl wasn't possible. So I'm still really not sure what your original point was about "If someone does internal_suspend followed by remove, what should be the correct behavior? The current code deadlocks in this case." There is no deadlock: dmsetup suspend thin-pool dmsetup remove thin-thin1 dmsetup resume thin-pool the thin-pool suspend will internal suspend thin-thin1, but I can still remove thin-thin1 (provided it isn't in-use), and later resuming the thin-pool works fine too. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 14:37 ` Mikulas Patocka 2014-11-05 14:50 ` Zdenek Kabelac 2014-11-05 16:10 ` Mike Snitzer @ 2014-11-05 16:29 ` Mike Snitzer 2014-11-07 16:20 ` Mikulas Patocka 2 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-05 16:29 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, Nov 05 2014 at 9:37am -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > The patch series introduces two suspend mechanisms and it is unclear how > should they interact with each other. And this point is not correct. As you know dm_internal_suspend and dm_internal_resume interface predates any of my changes. That existing interface was extended them to be (mostly) fully formed equivalents of dm_suspend() and dm_resume(). I say "mostly" because dm_internal_resume() doesn't call into the targets' resume hooks because no existing callers (dm-stats or dm-thinp) need to. But obviously dm_resume() does need to so it passes @resume_targets as true to __dm_resume(). I'm not trying to suggest there is a bug or bugs in this new code (you already pointed out the locking issue across ioctls that I fixed). But a bug doesn't implicitly mean this is an imperfect way forward -- if/when a bug is found we'll deal with it.. so feel free to pour over this code to see if there is a bug or bugs. I really do welcome your review -- I would just like technical issues to be the focus of any technical review. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-05 16:29 ` Mike Snitzer @ 2014-11-07 16:20 ` Mikulas Patocka 2014-11-07 18:33 ` Mike Snitzer 2014-11-07 23:16 ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka 0 siblings, 2 replies; 23+ messages in thread From: Mikulas Patocka @ 2014-11-07 16:20 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Wed, 5 Nov 2014, Mike Snitzer wrote: > On Wed, Nov 05 2014 at 9:37am -0500, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > The patch series introduces two suspend mechanisms and it is unclear how > > should they interact with each other. > > And this point is not correct. As you know dm_internal_suspend and > dm_internal_resume interface predates any of my changes. > > That existing interface was extended them to be (mostly) fully formed > equivalents of dm_suspend() and dm_resume(). > > I say "mostly" because dm_internal_resume() doesn't call into the targets' > resume hooks because no existing callers (dm-stats or dm-thinp) need > to. But obviously dm_resume() does need to so it passes @resume_targets > as true to __dm_resume(). > > I'm not trying to suggest there is a bug or bugs in this new code (you > already pointed out the locking issue across ioctls that I fixed). > > But a bug doesn't implicitly mean this is an imperfect way forward -- > if/when a bug is found we'll deal with it.. so feel free to pour over > this code to see if there is a bug or bugs. I really do welcome your > review -- I would just like technical issues to be the focus of any > technical review. Problems with that patch set: 1. You walk all thin targets in pool_presuspend and call internal_suspend on them and then again in pool_resume call internal_resume on them. Between calls to pool_presuspend and pool_resume, dm-thin devices may be added or removed, resulting in unballanced calls. 2. You walk all thin targets and call internal_suspend on them. If two thin targets are in one dm table, it calls internal_suspend twice on the same md device. It also calls internal_suspend twice on the same md device if the device has both active and inactive table with a thin target. 3. The device may be suspended internally by pool presuspend and by statistics at the same time - the code doesn't handle that: if (WARN_ON(dm_suspended_internally_md(md))) goto out; /* disallow nested internal suspends! */ 4. when pool_presuspend is called, md->suspend_lock is alrady held. Taking md->suspend_lock on another device results in lockdep warning. For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - they may be created or deleted independent on pool status. Instead, we should block all active bios inside the pool - the bios are already registered in dm_deferred_set or in the prison, so all you need to do is to set a flag pool's presuspend method that causes all new bios to be queues and the wait until the prison is empty and the counters in deferred_set reach zero. Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-07 16:20 ` Mikulas Patocka @ 2014-11-07 18:33 ` Mike Snitzer 2015-01-02 22:56 ` Mikulas Patocka 2014-11-07 23:16 ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka 1 sibling, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2014-11-07 18:33 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Fri, Nov 07 2014 at 11:20am -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 5 Nov 2014, Mike Snitzer wrote: > > > On Wed, Nov 05 2014 at 9:37am -0500, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > The patch series introduces two suspend mechanisms and it is unclear how > > > should they interact with each other. > > > > And this point is not correct. As you know dm_internal_suspend and > > dm_internal_resume interface predates any of my changes. > > > > That existing interface was extended them to be (mostly) fully formed > > equivalents of dm_suspend() and dm_resume(). > > > > I say "mostly" because dm_internal_resume() doesn't call into the targets' > > resume hooks because no existing callers (dm-stats or dm-thinp) need > > to. But obviously dm_resume() does need to so it passes @resume_targets > > as true to __dm_resume(). > > > > I'm not trying to suggest there is a bug or bugs in this new code (you > > already pointed out the locking issue across ioctls that I fixed). > > > > But a bug doesn't implicitly mean this is an imperfect way forward -- > > if/when a bug is found we'll deal with it.. so feel free to pour over > > this code to see if there is a bug or bugs. I really do welcome your > > review -- I would just like technical issues to be the focus of any > > technical review. > > Problems with that patch set: > > 1. You walk all thin targets in pool_presuspend and call internal_suspend > on them and then again in pool_resume call internal_resume on them. > Between calls to pool_presuspend and pool_resume, dm-thin devices may be > added or removed, resulting in unballanced calls. No, we cannot add/remove or activate/deactivate thin devices while the pool is suspended. Those operations should block until the pool is resumed. I'll audit/fix accordingly. > 2. You walk all thin targets and call internal_suspend on them. If two > thin targets are in one dm table, it calls internal_suspend twice on the > same md device. It also calls internal_suspend twice on the same md device > if the device has both active and inactive table with a thin target. While I see your point, a thin DM table will never have multiple thin targets in it. This probably _should_ be enforced by adding DM_TARGET_SINGLETON to the thin_target's .features (should also probably have DM_TARGET_IMMUTABLE). > 3. The device may be suspended internally by pool presuspend and by > statistics at the same time - the code doesn't handle that: > if (WARN_ON(dm_suspended_internally_md(md))) goto out; /* disallow nested > internal suspends! */ Shouldn't that be fine? Just implies the stats won't be as precise... BUT, the WARN_ON() is clearly overkill and needs to be removed. > 4. when pool_presuspend is called, md->suspend_lock is alrady held. Taking > md->suspend_lock on another device results in lockdep warning. That should be fixed with proper lockdep training. As I thought was the case by the mutex_lock_nested() I added to dm_suspend(). But I'll recompile my kernel with lockdep enabled and silence lockdep once and for all if it is still complaining. > For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - > they may be created or deleted independent on pool status. Instead, we > should block all active bios inside the pool - the bios are already > registered in dm_deferred_set or in the prison, so all you need to do is > to set a flag pool's presuspend method that causes all new bios to be > queues and the wait until the prison is empty and the counters in > deferred_set reach zero. Given my explanation above, reasons 1 and 2 shouldn't really be a concern in the end. Thanks for the review! ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2014-11-07 18:33 ` Mike Snitzer @ 2015-01-02 22:56 ` Mikulas Patocka 2015-01-05 15:50 ` Mike Snitzer 0 siblings, 1 reply; 23+ messages in thread From: Mikulas Patocka @ 2015-01-02 22:56 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac Hi, I'm back. I looked at current internal suspend code in 3.19. One thing that seems strange is this: static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags) { struct dm_table *map = NULL; if (dm_suspended_internally_md(md)) return; /* nested internal suspend */ .... static void __dm_internal_resume(struct mapped_device *md) { if (!dm_suspended_internally_md(md)) return; /* resume from nested internal suspend */ - the comments imply that the code supports nested internal suspend, while in fact it doesn't. If the caller calls: * dm_internal_suspend_noflush * dm_internal_suspend_noflush * dm_internal_resume the device is not suspended while it should be. The "if" branch in __dm_internal_resume has nothing to do with nested suspend, it's called if the caller calls dm_internal_resume without dm_internal_suspend_noflush - that shouldn't happen, so there should be WARN or BUG. If you want to support multiple internal suspends (which is needed if one table can have multiple dm-thin targets), you need a counter that counts how many times the device was suspended - the counter doesn't have to be atomic because the code already holds md->suspend_lock. If you don't want to support multiple internal suspends, there should be at least WARN when the caller attempts to do it. Mikulas ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: staged dm_internal_{suspend, resume} related changes for wider review 2015-01-02 22:56 ` Mikulas Patocka @ 2015-01-05 15:50 ` Mike Snitzer 2015-01-08 23:52 ` [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review) Mikulas Patocka 0 siblings, 1 reply; 23+ messages in thread From: Mike Snitzer @ 2015-01-05 15:50 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Fri, Jan 02 2015 at 5:56pm -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > Hi, I'm back. > > I looked at current internal suspend code in 3.19. > > One thing that seems strange is this: > > static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags) > { > struct dm_table *map = NULL; > > if (dm_suspended_internally_md(md)) > return; /* nested internal suspend */ > .... > static void __dm_internal_resume(struct mapped_device *md) > { > if (!dm_suspended_internally_md(md)) > return; /* resume from nested internal suspend */ > > - the comments imply that the code supports nested internal suspend, while > in fact it doesn't. If the caller calls: > * dm_internal_suspend_noflush > * dm_internal_suspend_noflush > * dm_internal_resume > the device is not suspended while it should be. Yes, very true, it should. But IIRC in practice it doesn't hurt us to not be bothered about being so precise given the current caller. > The "if" branch in __dm_internal_resume has nothing to do with nested > suspend, it's called if the caller calls dm_internal_resume without > dm_internal_suspend_noflush - that shouldn't happen, so there should be > WARN or BUG. The first internal resume would've cleared the DM_INTERNAL_SUSPEND_FLAG, etc. So AFAICT the code in question does also cover multiple internal suspends/resumes (albeit without the counter you've suggested). But I did narrowly define "nested suspend" as an internal suspend that arrives _after_ a normal suspend. From the header of commit ffcc39364 ("dm: enhance internal suspend and resume interface"): Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device was already suspended when dm_internal_suspend_noflush() was called -- this can be thought of as a "nested suspend". A "nested suspend" can occur with legacy userspace dm-thin code that might suspend all active thin volumes before suspending the pool for resize. Could be the reality is "nested suspend" is also supporting the case of multiple internal suspends. > If you want to support multiple internal suspends (which is needed if one > table can have multiple dm-thin targets), you need a counter that counts > how many times the device was suspended - the counter doesn't have to be > atomic because the code already holds md->suspend_lock. If you don't want > to support multiple internal suspends, there should be at least WARN when > the caller attempts to do it. I'll have a look. Will need some time to revisit this line of work. FYI, the device-mapper-test-suite has fairly comprehensive test coverage for all supported scenarios, e.g.: # dmtest run --suite thin-provisioning -t /SuspendTests/ Loaded suite thin-provisioning Started SuspendTests nested_internal_suspend_using_inactive_table...PASS suspend_pool_active_thins_no_io...PASS suspend_pool_after_suspend_thin...PASS suspend_pool_concurrent_suspend...PASS suspend_pool_no_active_thins...PASS suspend_pool_no_thins...PASS suspend_pool_resume_thin...PASS suspend_pool_suspended_thin...PASS wait_on_bit_during_resume...PASS wait_on_bit_during_suspend...PASS # dmtest run --suite thin-provisioning -n activate_thin_while_pool_suspended_fails Loaded suite thin-provisioning Started CreationTests activate_thin_while_pool_suspended_fails...PASS So any changes in this area need to be verified using dmts. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review) 2015-01-05 15:50 ` Mike Snitzer @ 2015-01-08 23:52 ` Mikulas Patocka 0 siblings, 0 replies; 23+ messages in thread From: Mikulas Patocka @ 2015-01-08 23:52 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Mon, 5 Jan 2015, Mike Snitzer wrote: > On Fri, Jan 02 2015 at 5:56pm -0500, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > Hi, I'm back. > > > > I looked at current internal suspend code in 3.19. > > > > One thing that seems strange is this: > > > > static void __dm_internal_suspend(struct mapped_device *md, unsigned suspend_flags) > > { > > struct dm_table *map = NULL; > > > > if (dm_suspended_internally_md(md)) > > return; /* nested internal suspend */ > > .... > > static void __dm_internal_resume(struct mapped_device *md) > > { > > if (!dm_suspended_internally_md(md)) > > return; /* resume from nested internal suspend */ > > > > - the comments imply that the code supports nested internal suspend, while > > in fact it doesn't. If the caller calls: > > * dm_internal_suspend_noflush > > * dm_internal_suspend_noflush > > * dm_internal_resume > > the device is not suspended while it should be. > > Yes, very true, it should. But IIRC in practice it doesn't hurt us to > not be bothered about being so precise given the current caller. If the device is not suspended while it should be, it could lead to various subtle bugs. We should either disallow multiple internal suspends (print warning when that happens) or handle it correctly. Here I'm sending a patch to fix it. Mikulas From: Mikulas Patocka <mpatocka@redhat.com> The current code attempts to handle multiple internal suspends on the same device, but it does that incorrectly. When these functions are called in this order on the same device dm_internal_suspend_noflush dm_internal_suspend_noflush dm_internal_resume the device is not suspended, while it should be. This patch fixes the bug by maintaining a counter of suspends, and resuming the device when the counter drops to zero. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) Index: linux-3.19-rc3/drivers/md/dm.c =================================================================== --- linux-3.19-rc3.orig/drivers/md/dm.c 2015-01-08 22:04:23.000000000 +0100 +++ linux-3.19-rc3/drivers/md/dm.c 2015-01-08 22:22:14.000000000 +0100 @@ -206,6 +206,9 @@ struct mapped_device { /* zero-length flush that will be cloned and submitted to targets */ struct bio flush_bio; + /* the number of internal suspends */ + unsigned internal_suspend_count; + struct dm_stats stats; }; @@ -3023,7 +3026,7 @@ static void __dm_internal_suspend(struct { struct dm_table *map = NULL; - if (dm_suspended_internally_md(md)) + if (md->internal_suspend_count++) return; /* nested internal suspend */ if (dm_suspended_md(md)) { @@ -3048,7 +3051,9 @@ static void __dm_internal_suspend(struct static void __dm_internal_resume(struct mapped_device *md) { - if (!dm_suspended_internally_md(md)) + BUG_ON(!md->internal_suspend_count); + + if (--md->internal_suspend_count) return; /* resume from nested internal suspend */ if (dm_suspended_md(md)) ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review) 2014-11-07 16:20 ` Mikulas Patocka 2014-11-07 18:33 ` Mike Snitzer @ 2014-11-07 23:16 ` Mikulas Patocka 2014-11-07 23:29 ` Mike Snitzer 2014-11-09 1:17 ` [PATCH] " Mike Snitzer 1 sibling, 2 replies; 23+ messages in thread From: Mikulas Patocka @ 2014-11-07 23:16 UTC (permalink / raw) To: Mike Snitzer Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Fri, 7 Nov 2014, Mikulas Patocka wrote: > For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - > they may be created or deleted independent on pool status. Instead, we > should block all active bios inside the pool - the bios are already > registered in dm_deferred_set or in the prison, so all you need to do is > to set a flag pool's presuspend method that causes all new bios to be > queues and the wait until the prison is empty and the counters in > deferred_set reach zero. Here I'm sending a proof-of-concept patch that makes it possible to prevent bios from coming into the thin pool when the pool is suspended. It doesn't modify any generic dm code. This patch may not be perfect (I don't know all bio paths in dm-thin in detail, so I may have missed something), but it shows that it is possible to suspend all bios without modifying the common dm code. I hope Joe reviewes it and possibly fixes it. The patch is for 3.18-rc3, it needs other thin patches, the full patch series is at http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html Mikulas Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-bio-prison.c | 29 +++++++++++++++++++++++++++ drivers/md/dm-bio-prison.h | 3 ++ drivers/md/dm-thin.c | 47 +++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 75 insertions(+), 4 deletions(-) Index: linux-2.6/drivers/md/dm-thin.c =================================================================== --- linux-2.6.orig/drivers/md/dm-thin.c 2014-11-08 00:02:42.000000000 +0100 +++ linux-2.6/drivers/md/dm-thin.c 2014-11-08 00:02:42.000000000 +0100 @@ -194,6 +194,8 @@ struct pool { struct dm_thin_new_mapping *next_mapping; mempool_t *mapping_pool; + atomic_t suspended; + process_bio_fn process_bio; process_bio_fn process_discard; @@ -1532,6 +1534,9 @@ static void process_thin_deferred_bios(s return; } + if (unlikely(atomic_read(&pool->suspended))) + return; + bio_list_init(&bios); spin_lock_irqsave(&tc->lock, flags); @@ -1930,7 +1935,8 @@ static int thin_bio_map(struct dm_target return DM_MAPIO_SUBMITTED; } - if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA)) { + if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA) || + unlikely(atomic_read(&tc->pool->suspended))) { thin_defer_bio(tc, bio); return DM_MAPIO_SUBMITTED; } @@ -1943,6 +1949,12 @@ static int thin_bio_map(struct dm_target if (dm_bio_detain(tc->pool->prison, &key, bio, &cell1, &cell_result)) return DM_MAPIO_SUBMITTED; + if (unlikely(atomic_read(&tc->pool->suspended))) { + thin_defer_bio(tc, bio); + cell_defer_no_holder_no_free(tc, &cell1); + return DM_MAPIO_SUBMITTED; + } + r = dm_thin_find_block(td, block, 0, &result); /* @@ -2227,6 +2239,7 @@ static struct pool *pool_create(struct m INIT_LIST_HEAD(&pool->prepared_discards); INIT_LIST_HEAD(&pool->active_thins); pool->low_water_triggered = false; + atomic_set(&pool->suspended, 1); pool->shared_read_ds = dm_deferred_set_create(); if (!pool->shared_read_ds) { @@ -2772,22 +2785,46 @@ static void pool_resume(struct dm_target spin_lock_irqsave(&pool->lock, flags); pool->low_water_triggered = false; spin_unlock_irqrestore(&pool->lock, flags); - requeue_bios(pool); - do_waker(&pool->waker.work); + if (atomic_dec_and_test(&pool->suspended)) { + requeue_bios(pool); + do_waker(&pool->waker.work); + } } -static void pool_postsuspend(struct dm_target *ti) +static void pool_presuspend(struct dm_target *ti) { struct pool_c *pt = ti->private; struct pool *pool = pt->pool; + atomic_inc(&pool->suspended); + + dm_drain_prison(pool->prison); + + dm_drain_deferred_set(pool->shared_read_ds); + dm_drain_deferred_set(pool->all_io_ds); + cancel_delayed_work(&pool->waker); cancel_delayed_work(&pool->no_space_timeout); flush_workqueue(pool->wq); + + /* ??? should this be before or after flush_workqueue ? */ + dm_drain_deferred_set(pool->shared_read_ds); + dm_drain_deferred_set(pool->all_io_ds); + (void) commit(pool); } +static void pool_presuspend_undo(struct dm_target *ti) +{ + pool_resume(ti); +} + +static void pool_postsuspend(struct dm_target *ti) +{ + /* nothing */ +} + static int check_arg_count(unsigned argc, unsigned args_required) { if (argc != args_required) { @@ -3218,6 +3255,8 @@ static struct target_type pool_target = .ctr = pool_ctr, .dtr = pool_dtr, .map = pool_map, + .presuspend = pool_presuspend, + .presuspend_undo = pool_presuspend_undo, .postsuspend = pool_postsuspend, .preresume = pool_preresume, .resume = pool_resume, Index: linux-2.6/drivers/md/dm-bio-prison.c =================================================================== --- linux-2.6.orig/drivers/md/dm-bio-prison.c 2014-11-08 00:02:42.000000000 +0100 +++ linux-2.6/drivers/md/dm-bio-prison.c 2014-11-08 00:09:40.000000000 +0100 @@ -11,6 +11,7 @@ #include <linux/mempool.h> #include <linux/module.h> #include <linux/slab.h> +#include <linux/delay.h> /*----------------------------------------------------------------*/ @@ -241,6 +242,18 @@ void dm_cell_error(struct dm_bio_prison } EXPORT_SYMBOL_GPL(dm_cell_error); +void dm_drain_prison(struct dm_bio_prison *prison) +{ + spin_lock_irq(&prison->lock); + if (!RB_EMPTY_ROOT(&prison->cells)) { + spin_unlock_irq(&prison->lock); + msleep(1); + spin_lock_irq(&prison->lock); + } + spin_unlock_irq(&prison->lock); +} +EXPORT_SYMBOL_GPL(dm_drain_prison); + /*----------------------------------------------------------------*/ #define DEFERRED_SET_SIZE 64 @@ -354,6 +367,22 @@ int dm_deferred_set_add_work(struct dm_d } EXPORT_SYMBOL_GPL(dm_deferred_set_add_work); +void dm_drain_deferred_set(struct dm_deferred_set *ds) +{ + int i; +retry: + spin_lock_irq(&ds->lock); + for (i = 0; i < DEFERRED_SET_SIZE; i++) { + if (ds->entries[i].count) { + spin_unlock_irq(&ds->lock); + msleep(1); + goto retry; + } + } + spin_unlock_irq(&ds->lock); +} +EXPORT_SYMBOL_GPL(dm_drain_deferred_set); + /*----------------------------------------------------------------*/ static int __init dm_bio_prison_init(void) Index: linux-2.6/drivers/md/dm-bio-prison.h =================================================================== --- linux-2.6.orig/drivers/md/dm-bio-prison.h 2014-11-08 00:02:42.000000000 +0100 +++ linux-2.6/drivers/md/dm-bio-prison.h 2014-11-08 00:02:42.000000000 +0100 @@ -87,6 +87,7 @@ void dm_cell_release_no_holder(struct dm struct bio_list *inmates); void dm_cell_error(struct dm_bio_prison *prison, struct dm_bio_prison_cell *cell, int error); +void dm_drain_prison(struct dm_bio_prison *prison); /*----------------------------------------------------------------*/ @@ -107,6 +108,8 @@ struct dm_deferred_entry *dm_deferred_en void dm_deferred_entry_dec(struct dm_deferred_entry *entry, struct list_head *head); int dm_deferred_set_add_work(struct dm_deferred_set *ds, struct list_head *work); +void dm_drain_deferred_set(struct dm_deferred_set *ds); + /*----------------------------------------------------------------*/ #endif ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review) 2014-11-07 23:16 ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka @ 2014-11-07 23:29 ` Mike Snitzer 2014-11-09 1:17 ` [PATCH] " Mike Snitzer 1 sibling, 0 replies; 23+ messages in thread From: Mike Snitzer @ 2014-11-07 23:29 UTC (permalink / raw) To: Mikulas Patocka Cc: device-mapper development, Bryn M. Reeves, ejt, Alasdair G Kergon, Zdenek Kabelac On Fri, Nov 07 2014 at 6:16pm -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 7 Nov 2014, Mikulas Patocka wrote: > > > For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - > > they may be created or deleted independent on pool status. Instead, we > > should block all active bios inside the pool - the bios are already > > registered in dm_deferred_set or in the prison, so all you need to do is > > to set a flag pool's presuspend method that causes all new bios to be > > queues and the wait until the prison is empty and the counters in > > deferred_set reach zero. > > Here I'm sending a proof-of-concept patch that makes it possible to > prevent bios from coming into the thin pool when the pool is suspended. It > doesn't modify any generic dm code. > > This patch may not be perfect (I don't know all bio paths in dm-thin in > detail, so I may have missed something), but it shows that it is possible > to suspend all bios without modifying the common dm code. I hope Joe > reviewes it and possibly fixes it. > > The patch is for 3.18-rc3, it needs other thin patches, the full patch > series is at > http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html Happy to consider your solution, thanks for taking the time to work on it. I also revised my dm_internal_{suspend,resume} changes based on your feedback and have published the changes here: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19 (rebased so the top ~11 patches are the ones to look at) I'm happy with how this code turned out, but it is certainly a fair amount of code churn in DM core. All said, I'll work with Joe to figure out the best way forward next week. Could easily be that your approach wins out. BTW, have a great break. Mike ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Suspend all active bios when the pool is suspended (was: staged dm_internal_{suspend, resume} related changes for wider review) 2014-11-07 23:16 ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka 2014-11-07 23:29 ` Mike Snitzer @ 2014-11-09 1:17 ` Mike Snitzer 1 sibling, 0 replies; 23+ messages in thread From: Mike Snitzer @ 2014-11-09 1:17 UTC (permalink / raw) To: device-mapper development Cc: ejt, Bryn M. Reeves, Alasdair G Kergon, Zdenek Kabelac On Fri, Nov 7, 2014 at 6:16 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 7 Nov 2014, Mikulas Patocka wrote: > >> For reasons 1 and 2, I wouldn't really deal with "thin" targets at all - >> they may be created or deleted independent on pool status. Instead, we >> should block all active bios inside the pool - the bios are already >> registered in dm_deferred_set or in the prison, so all you need to do is >> to set a flag pool's presuspend method that causes all new bios to be >> queues and the wait until the prison is empty and the counters in >> deferred_set reach zero. > > Here I'm sending a proof-of-concept patch that makes it possible to > prevent bios from coming into the thin pool when the pool is suspended. It > doesn't modify any generic dm code. > > This patch may not be perfect (I don't know all bio paths in dm-thin in > detail, so I may have missed something), but it shows that it is possible > to suspend all bios without modifying the common dm code. I hope Joe > reviewes it and possibly fixes it. > > The patch is for 3.18-rc3, it needs other thin patches, the full patch > series is at > http://people.redhat.com/~mpatocka/patches/kernel/dm-thin-suspend/series.html > > Mikulas > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > --- > drivers/md/dm-bio-prison.c | 29 +++++++++++++++++++++++++++ > drivers/md/dm-bio-prison.h | 3 ++ > drivers/md/dm-thin.c | 47 +++++++++++++++++++++++++++++++++++++++++---- > 3 files changed, 75 insertions(+), 4 deletions(-) > > Index: linux-2.6/drivers/md/dm-thin.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-thin.c 2014-11-08 00:02:42.000000000 +0100 > +++ linux-2.6/drivers/md/dm-thin.c 2014-11-08 00:02:42.000000000 +0100 > @@ -194,6 +194,8 @@ struct pool { > struct dm_thin_new_mapping *next_mapping; > mempool_t *mapping_pool; > > + atomic_t suspended; > + > process_bio_fn process_bio; > process_bio_fn process_discard; > I'm not seeing how using an unprotected atomic_t is thread-safe considering both the thin-pool and thin targets will be accessing it concurrently. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2015-01-08 23:52 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20141028014825.GA21949@redhat.com>
[not found] ` <544F988D.7090901@redhat.com>
[not found] ` <20141028134012.GA25229@redhat.com>
[not found] ` <20141028141749.GD25229@redhat.com>
[not found] ` <20141028143742.GE25229@redhat.com>
[not found] ` <20141028171003.GG25229@redhat.com>
[not found] ` <20141028224753.GA29288@redhat.com>
[not found] ` <20141028232638.GB29288@redhat.com>
[not found] ` <20141029002125.GC29288@redhat.com>
2014-10-29 1:22 ` staged dm_internal_{suspend, resume} related changes for wider review Mike Snitzer
2014-10-29 19:06 ` Mike Snitzer
2014-10-29 20:49 ` Mike Snitzer
2014-11-03 23:25 ` Mikulas Patocka
2014-11-04 0:17 ` Mike Snitzer
2014-11-05 1:16 ` Mike Snitzer
2014-11-05 12:43 ` Mikulas Patocka
2014-11-05 13:05 ` Mikulas Patocka
2014-11-05 14:11 ` Mike Snitzer
2014-11-05 14:37 ` Mikulas Patocka
2014-11-05 14:50 ` Zdenek Kabelac
2014-11-05 16:10 ` Mike Snitzer
2014-11-05 16:56 ` Mike Snitzer
2014-11-05 20:16 ` Mike Snitzer
2014-11-05 16:29 ` Mike Snitzer
2014-11-07 16:20 ` Mikulas Patocka
2014-11-07 18:33 ` Mike Snitzer
2015-01-02 22:56 ` Mikulas Patocka
2015-01-05 15:50 ` Mike Snitzer
2015-01-08 23:52 ` [PATCH] dm: handle multiple internal suspends correctly (was: staged dm_internal_{suspend, resume} related changes for wider review) Mikulas Patocka
2014-11-07 23:16 ` [PATCH] Suspend all active bios when the pool is suspended " Mikulas Patocka
2014-11-07 23:29 ` Mike Snitzer
2014-11-09 1:17 ` [PATCH] " Mike Snitzer
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.