Re: [PATCH v3] Btrfs: fix hang on error (such as ENOSPC) when writing extent pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marc MERLIN <marc@merlins.org>
To: Filipe David Borba Manana <fdmanana@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3] Btrfs: fix hang on error (such as ENOSPC) when writing extent pages
Date: Tue, 17 Jun 2014 12:21:43 -0700	[thread overview]
Message-ID: <20140617192143.GT19071@merlins.org> (raw)
In-Reply-To: <1399652260-9360-1-git-send-email-fdmanana@gmail.com>

On Fri, May 09, 2014 at 05:17:40PM +0100, Filipe David Borba Manana wrote:
> When running low on available disk space and having several processes
> doing buffered file IO, I got the following trace in dmesg:

I was building 3.15.1 and noticed this patch didn't make it in (your other
2 btrfs send patches I was carrying did make it). Is that on purpose, and
maybe being delayed for 3.16, or is that maybe an oversight?

Thanks,
Marc

> [ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
> [ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
> [ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
> [ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
> [ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
> [ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
> [ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
> [ 4202.720922] Call Trace:
> [ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
> [ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
> [ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
> [ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
> [ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
> [ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
> (...)
> [ 4202.721027] 2 locks held by kworker/u8:1/5450:
> [ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
> [ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
> [ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
> [ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
> [ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
> [ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
> [ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
> [ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
> [ 4202.721718] Call Trace:
> [ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
> [ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
> [ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
> [ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
> [ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
> [ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
> [ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
> [ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
> [ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
> [ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
> [ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
> [ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
> (...)
> 
> It turns out that extent_io.c:__extent_writepage(), which ends up being called
> through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
> -ENOSPC when calling the fill_delalloc callback. In this situation, it returned
> without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
> ever being called for the respective page, which prevents the ordered extent's
> bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
> is never queued into the endio_write_workers queue. This makes the task that
> called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
> extent.
> 
> This is fairly easy to reproduce using a small filesystem and fsstress on
> a quad core vm:
> 
>     mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
>     mount /dev/sdd /mnt
> 
>     fsstress -p 6 -d /mnt -n 100000 -x \
>         "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
> 	    -f allocsp=0 \
> 	    -f bulkstat=0 \
> 	    -f bulkstat1=0 \
> 	    -f chown=0 \
> 	    -f creat=1 \
> 	    -f dread=0 \
> 	    -f dwrite=0 \
> 	    -f fallocate=1 \
> 	    -f fdatasync=0 \
> 	    -f fiemap=0 \
> 	    -f freesp=0 \
> 	    -f fsync=0 \
> 	    -f getattr=0 \
> 	    -f getdents=0 \
> 	    -f link=0 \
> 	    -f mkdir=0 \
> 	    -f mknod=0 \
> 	    -f punch=1 \
> 	    -f read=0 \
> 	    -f readlink=0 \
> 	    -f rename=0 \
> 	    -f resvsp=0 \
> 	    -f rmdir=0 \
> 	    -f setxattr=0 \
> 	    -f stat=0 \
> 	    -f symlink=0 \
> 	    -f sync=0 \
> 	    -f truncate=1 \
> 	    -f unlink=0 \
> 	    -f unresvsp=0 \
> 	    -f write=4
> 
> So just ensure that if an error happens while writing the extent page
> we call the writepage_end_io_hook callback. Also make it return the
> error code and ensure the caller (extent_write_cache_pages) processes
> all pages in the page vector even if an error happens only for some
> of them, so that ordered extents end up released.
> 
> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
> ---
> 
> V2: Grab error from `em` pointer if available, do the error check and
>     end_extent_writepage call before unlocking the page (just like
>     end_bio_extent_writepage does).
> 
> V3: Make extent_write_cache_pages not stop processing remaining pages
>     in the page vector if __extent_writepage returns an error for some
>     page, so that ordered extents end up being processed and released.
> 
>  fs/btrfs/extent_io.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 0c43896..4468afd 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3284,6 +3284,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>  				     end - cur + 1, 1);
>  		if (IS_ERR_OR_NULL(em)) {
>  			SetPageError(page);
> +			ret = PTR_ERR_OR_ZERO(em);
>  			break;
>  		}
>  
> @@ -3370,13 +3371,17 @@ done:
>  		set_page_writeback(page);
>  		end_page_writeback(page);
>  	}
> +	if (PageError(page)) {
> +		ret = ret < 0 ? ret : -EIO;
> +		end_extent_writepage(page, ret, start, page_end);
> +	}
>  	unlock_page(page);
>  
>  done_unlocked:
>  
>  	/* drop our reference on any cached states */
>  	free_extent_state(cached_state);
> -	return 0;
> +	return ret;
>  }
>  
>  static int eb_wait(void *word)
> @@ -3699,6 +3704,7 @@ static int extent_write_cache_pages(struct extent_io_tree *tree,
>  	struct inode *inode = mapping->host;
>  	int ret = 0;
>  	int done = 0;
> +	int err = 0;
>  	int nr_to_write_done = 0;
>  	struct pagevec pvec;
>  	int nr_pages;
> @@ -3785,8 +3791,8 @@ retry:
>  				unlock_page(page);
>  				ret = 0;
>  			}
> -			if (ret)
> -				done = 1;
> +			if (!err && ret < 0)
> +				err = ret;
>  
>  			/*
>  			 * the filesystem may choose to bump up nr_to_write.
> @@ -3798,7 +3804,7 @@ retry:
>  		pagevec_release(&pvec);
>  		cond_resched();
>  	}
> -	if (!scanned && !done) {
> +	if (!scanned && !done && !err) {
>  		/*
>  		 * We hit the last page and there is more work to be done: wrap
>  		 * back to the start of the file
> @@ -3808,7 +3814,7 @@ retry:
>  		goto retry;
>  	}
>  	btrfs_add_delayed_iput(inode);
> -	return ret;
> +	return err;
>  }
>  
>  static void flush_epd_write_bio(struct extent_page_data *epd)
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

     prev parent reply	other threads:[~2014-06-17 19:21 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-08 19:27 [PATCH] Btrfs: fix hang on error (such as ENOSPC) when writing extent pages Filipe David Borba Manana
2014-05-09  2:42 ` [PATCH v2] " Filipe David Borba Manana
2014-05-09 10:52   ` Liu Bo
2014-05-09 11:29     ` Filipe David Manana
2014-05-09 16:17 ` [PATCH v3] " Filipe David Borba Manana
2014-06-17 19:21   ` Marc MERLIN [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140617192143.GT19071@merlins.org \
    --to=marc@merlins.org \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.