linux-btrfs.vger.kernel.org archive mirror
From: Liu Bo <bo.li.liu@oracle.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.
Date: Mon, 27 Oct 2014 16:14:57 +0800
Message-ID: <20141027081456.GD27271@localhost.localdomain>
In-Reply-To: <544D8F44.8050706@cn.fujitsu.com>

On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
> to reduce ENOSPC caused by unbalanced data/metadata allocation.
> From: Liu Bo <bo.li.liu@oracle.com>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2014-10-24 19:06
> >On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:
> >>When btrfs allocates a chunk, it will try to alloc up to 1G for data and
> >>256M for metadata, or 10% of all the writeable space if there is enough
> >10G for data,
> >         if (type & BTRFS_BLOCK_GROUP_DATA) {
> >                 max_stripe_size = 1024 * 1024 * 1024;
> >                 max_chunk_size = 10 * max_stripe_size;
> Oh, sorry, 10G is right.
> 
> Any other comments?
> 
> Thanks,
> Qu
> 
> 
> >		...
> >
> >thanks,
> >-liubo
> >
> >>space for the stripe on device.
> >>
> >>However, when we run out of space, this allocation may cause unbalanced
> >>chunk allocation.
> >>For example, if there is only 1G of unallocated space and a request to
> >>allocate a DATA chunk is sent, all of that space will be allocated as a
> >>data chunk, so a later metadata chunk allocation request cannot be
> >>satisfied, which will cause ENOSPC.
> >>This is one of the common complaints from end users about why ENOSPC
> >>happens while there is still available space.

Okay, I don't think this is the common case. AFAIK, most ENOSPC errors are
caused by our runtime worst-case metadata reservation problem.

btrfs already tends to create a fairly large metadata chunk (1G) at mkfs
time, and a 256M metadata chunk is also a very large one.

As for your example below, yes, we don't have space left for a metadata
chunk allocation, but do we really need to allocate a new one?

Or am I missing something?
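
To make the 1G example concrete, here is a minimal userspace sketch of the
clamp that the quoted patch describes. It is not the kernel code:
clamp_max_chunk_size() and min_u64() are hypothetical names, and the sketch
follows the intent stated in the patch comment (allocate everything when
16M~32M is left). The quoted diff also carries an unconditional min() after
the if/else, which the sketch leaves out.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Hypothetical helper; the kernel would use its own min() macro. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical sketch of the proposed limit, assuming the intent in the
 * patch comment:
 *  - between 16M and 32M of unallocated space: take all of it, because
 *    halving would strand a remainder smaller than the 16M minimum chunk;
 *  - otherwise: never take more than half of the unallocated space, and
 *    never more than the per-type limit passed in as max_chunk_size.
 */
static uint64_t clamp_max_chunk_size(uint64_t total_avail_space,
				     uint64_t max_chunk_size)
{
	if (total_avail_space > 16 * MiB && total_avail_space < 32 * MiB)
		return total_avail_space;

	return min_u64(total_avail_space / 2, max_chunk_size);
}
```

With 1G unallocated and the 10G data limit, clamp_max_chunk_size(1 * GiB,
10 * GiB) gives 512M, so half of the remaining space stays unallocated
after a data chunk request.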

thanks,
-liubo

> >>
> >>This patch tries not to allocate a chunk larger than half of the
> >>unallocated space, keeping the last of the space more balanced at the
> >>small cost of more fragmented chunks in the last 1G.
> >>
> >>A simple example:
> >>Preallocate 17.5G on a 20G empty btrfs fs:
> >>[Before]
> >>  # btrfs fi show /mnt/test
> >>Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
> >>	Total devices 1 FS bytes used 17.50GiB
> >>	devid    1 size 20.00GiB used 20.00GiB path /dev/sdb
> >>All space is allocated. No space is left for later metadata allocation.
> >>
> >>[After]
> >>  # btrfs fi show /mnt/test
> >>Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
> >>	Total devices 1 FS bytes used 17.50GiB
> >>	devid    1 size 20.00GiB used 19.77GiB path /dev/sdb
> >>About 230M is still available for later metadata allocation.
> >>
> >>Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >>---
> >>  fs/btrfs/volumes.c | 18 ++++++++++++++++++
> >>  1 file changed, 18 insertions(+)
> >>
> >>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>index d47289c..fa8de79 100644
> >>--- a/fs/btrfs/volumes.c
> >>+++ b/fs/btrfs/volumes.c
> >>@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> >>  	int ret;
> >>  	u64 max_stripe_size;
> >>  	u64 max_chunk_size;
> >>+	u64 total_avail_space = 0;
> >>  	u64 stripe_size;
> >>  	u64 num_bytes;
> >>  	u64 raid_stripe_len = BTRFS_STRIPE_LEN;
> >>@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> >>  		devices_info[ndevs].max_avail = max_avail;
> >>  		devices_info[ndevs].total_avail = total_avail;
> >>  		devices_info[ndevs].dev = device;
> >>+		total_avail_space += total_avail;
> >>  		++ndevs;
> >>  	}
> >>  	/*
> >>+	 * Try not to occupy more than half of the unallocated space.
> >>+	 * When running short of space, allocating all of the space to
> >>+	 * data/metadata will cause ENOSPC to be triggered more easily.
> >>+	 *
> >>+	 * And since the minimum chunk size is 16M, the half-half will cause
> >>+	 * 16M allocated from 20M available space and the rest 4M will not be
> >>+	 * used ever. In that case (16~32M), allocate all directly.
> >>+	 */
> >>+	if (total_avail_space < 32 * 1024 * 1024 &&
> >>+	    total_avail_space > 16 * 1024 * 1024)
> >>+		max_chunk_size = total_avail_space;
> >>+	else
> >>+		max_chunk_size = min(total_avail_space / 2, max_chunk_size);
> >>+	max_chunk_size = min(total_avail_space / 2, max_chunk_size);
> >>+
> >>+	/*
> >>  	 * now sort the devices by hole size / available space
> >>  	 */
> >>  	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> >>-- 
> >>2.1.2
> >>
> 

Thread overview: 10+ messages
2014-10-23  2:37 [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation Qu Wenruo
2014-10-24 11:06 ` Liu Bo
2014-10-27  0:18   ` Qu Wenruo
2014-10-27  8:14     ` Liu Bo [this message]
2014-10-27  8:36       ` Qu Wenruo
2014-10-29 14:29         ` Liu Bo
2014-10-30  0:58           ` Qu Wenruo
2014-10-30  3:46             ` Qu Wenruo
2014-10-30  9:39               ` Liu Bo
2014-12-18  9:23                 ` Qu Wenruo
