linux-btrfs.vger.kernel.org archive mirror
* [PATCH] btrfs: wait on caching when putting the bg cache
@ 2018-09-12 14:45 Josef Bacik
  2018-09-12 15:15 ` Nikolay Borisov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Josef Bacik @ 2018-09-12 14:45 UTC (permalink / raw)
  To: kernel-team, linux-btrfs

While testing my backport I noticed a panic when I ran generic/416,
generic/417 and generic/418 all in a row.  This happened to uncover a
race where we still had outstanding IO after destroying all of our
workqueues, and then we'd go to queue the endio work on those freed
workqueues.  This is because we aren't waiting for the caching threads
to be done before freeing everything up.  To fix this, wait on any
outstanding caching before we free up the block group, so we're sure
to be done with all IO by the time we get to btrfs_stop_all_workers().
This fixes the panic I was seeing consistently in testing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 414492a18f1e..2eb2e37f2354 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9889,6 +9889,7 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
 
 		block_group = btrfs_lookup_first_block_group(info, last);
 		while (block_group) {
+			wait_block_group_cache_done(block_group);
 			spin_lock(&block_group->lock);
 			if (block_group->iref)
 				break;
-- 
2.14.3
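
To illustrate the ordering problem the commit message describes, here is a
minimal userspace sketch using pthreads. It is not btrfs code: the struct,
the flags and both function names are hypothetical stand-ins. A "caching"
thread queues its completion once its IO finishes, and teardown either waits
for that thread first (the order the patch enforces) or marks the queue
destroyed while the IO is still outstanding (the order that led to the panic).
Instead of touching freed memory, the sketch only counts completions that
arrive after the queue is marked destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are btrfs APIs. */
struct model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool io_may_finish;   /* lets the modelled IO complete */
	bool caching_done;    /* set by the caching thread when finished */
	bool queue_destroyed; /* stands in for the destroyed workqueues */
	int late_completions; /* completions queued after destruction */
};

static void *caching_thread(void *arg)
{
	struct model *m = arg;

	pthread_mutex_lock(&m->lock);
	while (!m->io_may_finish)
		pthread_cond_wait(&m->cond, &m->lock);
	/* The endio work would be queued here; after teardown that is the bug. */
	if (m->queue_destroyed)
		m->late_completions++;
	m->caching_done = true;
	pthread_cond_broadcast(&m->cond);
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

/* Returns how many completions arrived after the queue was destroyed. */
int run_teardown(bool wait_for_caching)
{
	struct model m = { 0 };
	pthread_t t;
	int rc, late;

	pthread_mutex_init(&m.lock, NULL);
	pthread_cond_init(&m.cond, NULL);
	rc = pthread_create(&t, NULL, caching_thread, &m);
	assert(rc == 0);
	(void)rc;

	pthread_mutex_lock(&m.lock);
	if (wait_for_caching) {
		/* Fixed order: let the IO finish, wait for caching, then destroy. */
		m.io_may_finish = true;
		pthread_cond_broadcast(&m.cond);
		while (!m.caching_done)
			pthread_cond_wait(&m.cond, &m.lock);
		m.queue_destroyed = true;
	} else {
		/* Buggy order: destroy the queue while IO is still outstanding. */
		m.queue_destroyed = true;
		m.io_may_finish = true;
		pthread_cond_broadcast(&m.cond);
	}
	pthread_mutex_unlock(&m.lock);

	pthread_join(t, NULL);
	late = m.late_completions;
	pthread_cond_destroy(&m.cond);
	pthread_mutex_destroy(&m.lock);
	return late;
}
```

Calling run_teardown(true) models the patched path, where the wait happens
before anything is torn down; run_teardown(false) models the unpatched path.
The explicit io_may_finish gate exists only to make the sketch deterministic;
in the real race the IO completes whenever the device finishes it.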


* Re: [PATCH] btrfs: wait on caching when putting the bg cache
  2018-09-12 14:45 [PATCH] btrfs: wait on caching when putting the bg cache Josef Bacik
@ 2018-09-12 15:15 ` Nikolay Borisov
  2018-09-12 15:21   ` Josef Bacik
  2018-09-12 18:26 ` Omar Sandoval
  2018-09-12 18:40 ` David Sterba
  2 siblings, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2018-09-12 15:15 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs



On 12.09.2018 17:45, Josef Bacik wrote:
> While testing my backport I noticed a panic when I ran generic/416,
> generic/417 and generic/418 all in a row.  This happened to uncover a
> race where we still had outstanding IO after destroying all of our
> workqueues, and then we'd go to queue the endio work on those freed
> workqueues.  This is because we aren't waiting for the caching threads
> to be done before freeing everything up.  To fix this, wait on any
> outstanding caching before we free up the block group, so we're sure
> to be done with all IO by the time we get to btrfs_stop_all_workers().
> This fixes the panic I was seeing consistently in testing.

It's not clear whether this is caused by one of the patches in your
latest patchbomb or whether the issue has been there all along.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 414492a18f1e..2eb2e37f2354 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9889,6 +9889,7 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
>  
>  		block_group = btrfs_lookup_first_block_group(info, last);
>  		while (block_group) {
> +			wait_block_group_cache_done(block_group);
>  			spin_lock(&block_group->lock);
>  			if (block_group->iref)
>  				break;
> 


* Re: [PATCH] btrfs: wait on caching when putting the bg cache
  2018-09-12 15:15 ` Nikolay Borisov
@ 2018-09-12 15:21   ` Josef Bacik
  0 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2018-09-12 15:21 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs

On Wed, Sep 12, 2018 at 06:15:41PM +0300, Nikolay Borisov wrote:
> 
> 
> On 12.09.2018 17:45, Josef Bacik wrote:
> > While testing my backport I noticed a panic when I ran generic/416,
> > generic/417 and generic/418 all in a row.  This happened to uncover a
> > race where we still had outstanding IO after destroying all of our
> > workqueues, and then we'd go to queue the endio work on those freed
> > workqueues.  This is because we aren't waiting for the caching threads
> > to be done before freeing everything up.  To fix this, wait on any
> > outstanding caching before we free up the block group, so we're sure
> > to be done with all IO by the time we get to btrfs_stop_all_workers().
> > This fixes the panic I was seeing consistently in testing.
> 
> It's not clear whether this is caused by one of the patches in your
> latest patchbomb or whether the issue has been there all along.

It's always been there.  I noticed this on the backport of linus/master before
I even got to pulling my shit on top of that.  Thanks,

Josef


* Re: [PATCH] btrfs: wait on caching when putting the bg cache
  2018-09-12 14:45 [PATCH] btrfs: wait on caching when putting the bg cache Josef Bacik
  2018-09-12 15:15 ` Nikolay Borisov
@ 2018-09-12 18:26 ` Omar Sandoval
  2018-09-12 18:40 ` David Sterba
  2 siblings, 0 replies; 6+ messages in thread
From: Omar Sandoval @ 2018-09-12 18:26 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On Wed, Sep 12, 2018 at 10:45:45AM -0400, Josef Bacik wrote:
> While testing my backport I noticed a panic when I ran generic/416,
> generic/417 and generic/418 all in a row.  This happened to uncover a
> race where we still had outstanding IO after destroying all of our
> workqueues, and then we'd go to queue the endio work on those freed
> workqueues.  This is because we aren't waiting for the caching threads
> to be done before freeing everything up.  To fix this, wait on any
> outstanding caching before we free up the block group, so we're sure
> to be done with all IO by the time we get to btrfs_stop_all_workers().
> This fixes the panic I was seeing consistently in testing.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 414492a18f1e..2eb2e37f2354 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9889,6 +9889,7 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
>  
>  		block_group = btrfs_lookup_first_block_group(info, last);
>  		while (block_group) {
> +			wait_block_group_cache_done(block_group);
>  			spin_lock(&block_group->lock);
>  			if (block_group->iref)
>  				break;
> -- 
> 2.14.3
> 


* Re: [PATCH] btrfs: wait on caching when putting the bg cache
  2018-09-12 14:45 [PATCH] btrfs: wait on caching when putting the bg cache Josef Bacik
  2018-09-12 15:15 ` Nikolay Borisov
  2018-09-12 18:26 ` Omar Sandoval
@ 2018-09-12 18:40 ` David Sterba
  2018-09-13 11:52   ` David Sterba
  2 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-09-12 18:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On Wed, Sep 12, 2018 at 10:45:45AM -0400, Josef Bacik wrote:
> While testing my backport I noticed a panic when I ran generic/416,
> generic/417 and generic/418 all in a row.  This happened to uncover a
> race where we still had outstanding IO after destroying all of our
> workqueues, and then we'd go to queue the endio work on those freed
> workqueues.  This is because we aren't waiting for the caching threads
> to be done before freeing everything up.  To fix this, wait on any
> outstanding caching before we free up the block group, so we're sure
> to be done with all IO by the time we get to btrfs_stop_all_workers().
> This fixes the panic I was seeing consistently in testing.

Can you please attach the stacktrace(s)? I think I've seen a similar error
once or twice but was not able to reproduce it.


* Re: [PATCH] btrfs: wait on caching when putting the bg cache
  2018-09-12 18:40 ` David Sterba
@ 2018-09-13 11:52   ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2018-09-13 11:52 UTC (permalink / raw)
  To: dsterba, Josef Bacik, kernel-team, linux-btrfs

On Wed, Sep 12, 2018 at 08:40:44PM +0200, David Sterba wrote:
> On Wed, Sep 12, 2018 at 10:45:45AM -0400, Josef Bacik wrote:
> > While testing my backport I noticed a panic when I ran generic/416,
> > generic/417 and generic/418 all in a row.  This happened to uncover a
> > race where we still had outstanding IO after destroying all of our
> > workqueues, and then we'd go to queue the endio work on those freed
> > workqueues.  This is because we aren't waiting for the caching threads
> > to be done before freeing everything up.  To fix this, wait on any
> > outstanding caching before we free up the block group, so we're sure
> > to be done with all IO by the time we get to btrfs_stop_all_workers().
> > This fixes the panic I was seeing consistently in testing.
> 
> Can you please attach the stacktrace(s)? I think I've seen a similar error
> once or twice but was not able to reproduce it.

I found at least this one: https://patchwork.kernel.org/patch/10495885/.
There, some IO is still in flight when the rbio cache is destroyed. This is
not the example I had in mind before, but it still roughly matches the symptoms.

