linux-btrfs.vger.kernel.org archive mirror
From: Roman Mamedov <rm@romanrm.ru>
To: dave@jikos.cz
Cc: linux-btrfs@vger.kernel.org, sensille@gmx.net, chris.mason@fusionio.com
Subject: Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
Date: Sun, 17 Jun 2012 21:29:08 +0600	[thread overview]
Message-ID: <20120617212908.4fcfd19c@natsu> (raw)
In-Reply-To: <20120614113316.GR32402@twin.jikos.cz>

[-- Attachment #1: Type: text/plain, Size: 3925 bytes --]

On Thu, 14 Jun 2012 13:33:16 +0200
David Sterba <dave@jikos.cz> wrote:

> On Sat, Jun 09, 2012 at 01:38:22AM +0600, Roman Mamedov wrote:
> > Before the upgrade (on 3.2.18):
> > 
> > Metadata, DUP: total=9.38GB, used=5.94GB
> > 
> > After the FS has been mounted once with 3.4.1:
> > 
> > Data: total=3.44TB, used=2.67TB
> > System, DUP: total=8.00MB, used=412.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=84.38GB, used=5.94GB
> > 
> > Where did my 75 GB of free space just go?
> 
> This is caused by the patch (credits for bisecting it go to Arne)
> 
> commit cf1d72c9ceec391d34c48724da57282e97f01122
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Fri Jan 6 15:41:34 2012 -0500
> 
>     Btrfs: lower the bar for chunk allocation
> 
>     The chunk allocation code has tried to keep a pretty tight lid on creating new
>     metadata chunks.  This is partially because in the past the reservation
>     code didn't give us an accurate idea of how much space was being used.
> 
>     The new code is much more accurate, so we're able to get rid of some of these
>     checks.
> ---
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3263,27 +3263,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
>                 if (num_bytes - num_allocated < thresh)
>                         return 1;
>         }
> -
> -       /*
> -        * we have two similar checks here, one based on percentage
> -        * and once based on a hard number of 256MB.  The idea
> -        * is that if we have a good amount of free
> -        * room, don't allocate a chunk.  A good mount is
> -        * less than 80% utilized of the chunks we have allocated,
> -        * or more than 256MB free
> -        */
> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;
> -
>         thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
> 
> -       /* 256MB or 5% of the FS */
> -       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
> +       /* 256MB or 2% of the FS */
> +       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
> 
> -       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
> +       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
>                 return 0;
>         return 1;
>  }
> ---
> 
> Originally there were 2 types of check, based on +256M and on
> percentage. The former are removed which leaves only the percentage
> thresholds. If there's less than 2% of the fs of metadata actually used,
> the metadata are reserved exactly to 2%. When acutual usage goes over
> 2%, there's always at least 20% over-reservation,
> 
>    sinfo->bytes_used < div_factor(num_bytes, 8)
> 
> i.e. the threshold is 80%, which may be wasteful for a large filesystem.
> 
> So, the metadata chunks are immediately pinned to 2% of the filesystem
> after the first few writes, and this is what you observe.
> 
> Running balance will remove the unused metadata chunks, but only to the
> 2% level.
> 
> [end of analysis]
> 
> So what to do now? Simply reverting the +256M checks works and restores
> more or less the original behaviour.


Thanks.
So should I try restoring both of these, and leave the rest as is?

> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;

Or would it make more sense to try rolling back that patch completely?

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


Thread overview: 5+ messages
2012-06-08 19:38 Massive metadata size increase after upgrade from 3.2.18 to 3.4.1 Roman Mamedov
2012-06-12 17:38 ` Calvin Walton
2012-06-13 10:30   ` Anand Jain
2012-06-14 11:33 ` David Sterba
2012-06-17 15:29   ` Roman Mamedov [this message]
