[PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner_kthread to delete a snapshot
@ 2016-05-04  1:10 Zygo Blaxell
  2016-05-04  9:49 ` Filipe Manana
  0 siblings, 1 reply; 3+ messages in thread
From: Zygo Blaxell @ 2016-05-04  1:10 UTC (permalink / raw)
  To: linux-btrfs

This is one way to fix a long hang during mounts.  There's probably a
better way, but this is the one I've used to get my filesystems up
and running.

We start the cleaner kthread first because the transaction kthread wants
to wake up the cleaner kthread.  We start the transaction kthread next
because everything in btrfs wants transactions.  We do reloc recovery
once the transaction kthread is running.  This means that the cleaner
kthread could already be running when reloc recovery happens (e.g. if a
snapshot delete was started before a crash).

Reloc recovery does not play well with the cleaner kthread, so a mutex
was added in commit 5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs:
fix race between balance recovery and root deletion" to prevent both
from running at the same time.

The cleaner kthread could already be holding the mutex by the time we
get to btrfs_recover_relocation, and if it is, the mount will be blocked
until at least one snapshot is deleted (possibly more if the mount process
doesn't get the lock right away).  During this time (which could be an
arbitrarily long time on a large/slow filesystem), the mount process is
stuck and the filesystem is unnecessarily inaccessible.

Fix this by locking cleaner_mutex before we start the cleaner_kthread,
and unlocking it when we have finished with it in the mount function.
This allows the mount to proceed to completion before resuming snapshot
deletion.  I'm not sure about the error cases, and the asymmetrical
pthread_mutex_lock/unlocks are just ugly.  Other than that, this patch
works.

An alternative (and possibly better) solution would be to add an extra
check in btrfs_need_cleaner_sleep() for a flag that would be set at the
end of mounting.  This should keep cleaner_kthread sleeping until just
before mount is finished.

---
 fs/btrfs/disk-io.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07c1ad6..af5ea1d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2927,6 +2927,10 @@ retry_root_backup:
 			"too many missing devices, writeable mount should not be allowed\n");
 	}

+	/* Hold the cleaner_mutex thread here so that we don't block
+	 * on btrfs_recover_relocation later on.  cleaner_kthread
+	 * blocks on us instead. */
+	mutex_lock(&fs_info->cleaner_mutex);
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
@@ -2986,9 +2990,8 @@ retry_root_backup:
 		if (ret)
 			goto fail_qgroup;

-		mutex_lock(&fs_info->cleaner_mutex);
+		/* We grabbed this mutex before we created the cleaner_kthread */
 		ret = btrfs_recover_relocation(tree_root);
-		mutex_unlock(&fs_info->cleaner_mutex);
 		if (ret < 0) {
 			printk(KERN_WARNING
 			       "BTRFS: failed to recover relocation\n");
@@ -2996,6 +2999,7 @@ retry_root_backup:
 			goto fail_qgroup;
 		}
 	}
+	mutex_unlock(&fs_info->cleaner_mutex);

 	location.objectid = BTRFS_FS_TREE_OBJECTID;
 	location.type = BTRFS_ROOT_ITEM_KEY;
@@ -3079,6 +3083,7 @@ fail_cleaner:
 	filemap_write_and_wait(fs_info->btree_inode->i_mapping);

 fail_sysfs:
+	mutex_unlock(&fs_info->cleaner_mutex);
 	btrfs_sysfs_remove_one(fs_info);

 fail_block_groups:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner_kthread to delete a snapshot
  2016-05-04  1:10 [PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner_kthread to delete a snapshot Zygo Blaxell
@ 2016-05-04  9:49 ` Filipe Manana
  2016-05-04 12:29   ` David Sterba
  0 siblings, 1 reply; 3+ messages in thread
From: Filipe Manana @ 2016-05-04  9:49 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs@vger.kernel.org

On Wed, May 4, 2016 at 2:10 AM, Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
> This is one way to fix a long hang during mounts.  There's probably a
> better way, but this is the one I've used to get my filesystems up
> and running.
>
> We start the cleaner kthread first because the transaction kthread wants
> to wake up the cleaner kthread.  We start the transaction kthread next
> because everything in btrfs wants transactions.  We do reloc recovery
> once the transaction kthread is running.  This means that the cleaner
> kthread could already be running when reloc recovery happens (e.g. if a
> snapshot delete was started before a crash).
>
> Reloc recovery does not play well with the cleaner kthread, so a mutex
> was added in commit 5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs:
> fix race between balance recovery and root deletion" to prevent both
> from running at the same time.
>
> The cleaner kthread could already be holding the mutex by the time we
> get to btrfs_recover_relocation, and if it is, the mount will be blocked
> until at least one snapshot is deleted (possibly more if the mount process
> doesn't get the lock right away).  During this time (which could be an
> arbitrarily long time on a large/slow filesystem), the mount process is
> stuck and the filesystem is unnecessarily inaccessible.
>
> Fix this by locking cleaner_mutex before we start the cleaner_kthread,
> and unlocking it when we have finished with it in the mount function.
> This allows the mount to proceed to completion before resuming snapshot
> deletion.  I'm not sure about the error cases, and the asymmetrical
> pthread_mutex_lock/unlocks are just ugly.  Other than that, this patch
> works.
>
> An alternative (and possibly better) solution would be to add an extra
> check in btrfs_need_cleaner_sleep() for a flag that would be set at the
> end of mounting.  This should keep cleaner_kthread sleeping until just
> before mount is finished.


Looks valid and good to me.
The alternative solution you describe would unnecessarily be more
complex without any benefit.
I prefer what you did, it's correct and simple. Just 2 comments below.

>
>
> ---
>  fs/btrfs/disk-io.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 07c1ad6..af5ea1d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2927,6 +2927,10 @@ retry_root_backup:
>                         "too many missing devices, writeable mount should not be allowed\n");
>         }
>
> +       /* Hold the cleaner_mutex thread here so that we don't block
> +        * on btrfs_recover_relocation later on.  cleaner_kthread
> +        * blocks on us instead. */

Nitpick, the style for comments with multiple lines is:

/*
 * bla bla bla
 * bla bla bla
 */

I would also had "for too long" after the "... we don't block", just
to make it more clear that we don't
block forever, but rather for a potentially long time.

> +       mutex_lock(&fs_info->cleaner_mutex);
>         fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
>                                                "btrfs-cleaner");
>         if (IS_ERR(fs_info->cleaner_kthread))
> @@ -2986,9 +2990,8 @@ retry_root_backup:
>                 if (ret)
>                         goto fail_qgroup;
>
> -               mutex_lock(&fs_info->cleaner_mutex);
> +               /* We grabbed this mutex before we created the cleaner_kthread */
>                 ret = btrfs_recover_relocation(tree_root);
> -               mutex_unlock(&fs_info->cleaner_mutex);
>                 if (ret < 0) {
>                         printk(KERN_WARNING
>                                "BTRFS: failed to recover relocation\n");
> @@ -2996,6 +2999,7 @@ retry_root_backup:
>                         goto fail_qgroup;
>                 }
>         }
> +       mutex_unlock(&fs_info->cleaner_mutex);

After this point we have one more goto after an error to read the fs
root that jumps to the label fail_qgroup, which ends up unlocking
again the cleaner_mutex.
You can track whether it needs to be unlocked or not through a local
bool variable for e.g.

thanks

>
>         location.objectid = BTRFS_FS_TREE_OBJECTID;
>         location.type = BTRFS_ROOT_ITEM_KEY;
> @@ -3079,6 +3083,7 @@ fail_cleaner:
>         filemap_write_and_wait(fs_info->btree_inode->i_mapping);
>
>  fail_sysfs:
> +       mutex_unlock(&fs_info->cleaner_mutex);
>         btrfs_sysfs_remove_one(fs_info);
>
>  fail_block_groups:
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner_kthread to delete a snapshot
  2016-05-04  9:49 ` Filipe Manana
@ 2016-05-04 12:29   ` David Sterba
  0 siblings, 0 replies; 3+ messages in thread
From: David Sterba @ 2016-05-04 12:29 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Zygo Blaxell, linux-btrfs@vger.kernel.org

On Wed, May 04, 2016 at 10:49:23AM +0100, Filipe Manana wrote:
> On Wed, May 4, 2016 at 2:10 AM, Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> > This is one way to fix a long hang during mounts.  There's probably a
> > better way, but this is the one I've used to get my filesystems up
> > and running.
> >
> > We start the cleaner kthread first because the transaction kthread wants
> > to wake up the cleaner kthread.  We start the transaction kthread next
> > because everything in btrfs wants transactions.  We do reloc recovery
> > once the transaction kthread is running.  This means that the cleaner
> > kthread could already be running when reloc recovery happens (e.g. if a
> > snapshot delete was started before a crash).
> >
> > Reloc recovery does not play well with the cleaner kthread, so a mutex
> > was added in commit 5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs:
> > fix race between balance recovery and root deletion" to prevent both
> > from running at the same time.
> >
> > The cleaner kthread could already be holding the mutex by the time we
> > get to btrfs_recover_relocation, and if it is, the mount will be blocked
> > until at least one snapshot is deleted (possibly more if the mount process
> > doesn't get the lock right away).  During this time (which could be an
> > arbitrarily long time on a large/slow filesystem), the mount process is
> > stuck and the filesystem is unnecessarily inaccessible.
> >
> > Fix this by locking cleaner_mutex before we start the cleaner_kthread,
> > and unlocking it when we have finished with it in the mount function.
> > This allows the mount to proceed to completion before resuming snapshot
> > deletion.  I'm not sure about the error cases, and the asymmetrical
> > pthread_mutex_lock/unlocks are just ugly.  Other than that, this patch
> > works.
> >
> > An alternative (and possibly better) solution would be to add an extra
> > check in btrfs_need_cleaner_sleep() for a flag that would be set at the
> > end of mounting.  This should keep cleaner_kthread sleeping until just
> > before mount is finished.
> 
> Looks valid and good to me.
> The alternative solution you describe would unnecessarily be more
> complex without any benefit.
> I prefer what you did, it's correct and simple. Just 2 comments below.

We could detect if mount is finished by checking for MS_BORN in
superblock->s_flags (set by VFS). But that would be extra code to do
just that, while cleaner kthread is prepared for the mutex contention and
sleeps if necessary.

I like the approach proposed in the patch as well.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-05-04 12:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-04  1:10 [PATCH, maybe WRONG] btrfs: don't let btrfs_recover_relocation get stuck waiting for cleaner_kthread to delete a snapshot Zygo Blaxell
2016-05-04  9:49 ` Filipe Manana
2016-05-04 12:29   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).