Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1

From: Roman Mamedov <rm@romanrm.ru>
To: dave@jikos.cz
Cc: linux-btrfs@vger.kernel.org, sensille@gmx.net, chris.mason@fusionio.com
Subject: Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
Date: Sun, 17 Jun 2012 21:29:08 +0600
Message-ID: <20120617212908.4fcfd19c@natsu>
In-Reply-To: <20120614113316.GR32402@twin.jikos.cz>

On Thu, 14 Jun 2012 13:33:16 +0200
David Sterba <dave@jikos.cz> wrote:

> On Sat, Jun 09, 2012 at 01:38:22AM +0600, Roman Mamedov wrote:
> > Before the upgrade (on 3.2.18):
> > 
> > Metadata, DUP: total=9.38GB, used=5.94GB
> > 
> > After the FS has been mounted once with 3.4.1:
> > 
> > Data: total=3.44TB, used=2.67TB
> > System, DUP: total=8.00MB, used=412.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=84.38GB, used=5.94GB
> > 
> > Where did my 75 GB of free space just go?
> 
> This is caused by the patch (credits for bisecting it go to Arne)
> 
> commit cf1d72c9ceec391d34c48724da57282e97f01122
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Fri Jan 6 15:41:34 2012 -0500
> 
>     Btrfs: lower the bar for chunk allocation
> 
>     The chunk allocation code has tried to keep a pretty tight lid on creating new
>     metadata chunks.  This is partially because in the past the reservation
>     code didn't give us an accurate idea of how much space was being used.
> 
>     The new code is much more accurate, so we're able to get rid of some of these
>     checks.
> ---
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3263,27 +3263,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
>                 if (num_bytes - num_allocated < thresh)
>                         return 1;
>         }
> -
> -       /*
> -        * we have two similar checks here, one based on percentage
> -        * and once based on a hard number of 256MB.  The idea
> -        * is that if we have a good amount of free
> -        * room, don't allocate a chunk.  A good mount is
> -        * less than 80% utilized of the chunks we have allocated,
> -        * or more than 256MB free
> -        */
> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;
> -
>         thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
> 
> -       /* 256MB or 5% of the FS */
> -       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
> +       /* 256MB or 2% of the FS */
> +       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
> 
> -       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
> +       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
>                 return 0;
>         return 1;
>  }
> ---
> 
> Originally there were two types of check, one based on +256M and one on
> percentage. The former were removed, which leaves only the percentage
> thresholds. If the metadata actually used is less than 2% of the fs,
> the metadata reservation is set to exactly 2%. When actual usage goes over
> 2%, there's always at least 20% over-reservation,
>    sinfo->bytes_used < div_factor(num_bytes, 8)
> 
> i.e. the threshold is 80%, which may be wasteful for a large fs.
> 
> So, the metadata chunks are immediately pinned to 2% of the filesystem
> after the first few writes, and this is what you observe.
> 
> Running balance will remove the unused metadata chunks, but only to the
> 2% level.
> 
> [end of analysis]
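
A rough userspace sketch of the check that remains after the patch may make the arithmetic above easier to follow. It is only a model, not the kernel code: div_factor() and div_factor_fine() are assumed to compute n*f/10 and n*f/100, the name tail_check_after_patch() is made up for illustration, and fs_total, num_bytes and bytes_used stand in for the super block total, the allocated metadata chunk space and sinfo->bytes_used. Earlier parts of should_alloc_chunk() are not modelled.

```c
#include <stdint.h>

/* Assumed userspace stand-ins for the kernel helpers. */
static uint64_t div_factor(uint64_t num, int factor)
{
    return num * factor / 10;   /* factor is in tenths */
}

static uint64_t div_factor_fine(uint64_t num, int factor)
{
    return num * factor / 100;  /* factor is in percent */
}

/*
 * Hypothetical model of the tail of should_alloc_chunk() after the patch.
 * Returns 1 if a new chunk would be allocated, 0 otherwise.
 */
static int tail_check_after_patch(uint64_t fs_total, uint64_t num_bytes,
                                  uint64_t bytes_used)
{
    const uint64_t min_thresh = 256ULL * 1024 * 1024;
    uint64_t thresh = div_factor_fine(fs_total, 2);  /* 2% of the FS */

    if (thresh < min_thresh)                         /* max(256MB, 2%) */
        thresh = min_thresh;

    /* Skip allocation only above the 2% mark AND below 80% usage. */
    if (num_bytes > thresh && bytes_used < div_factor(num_bytes, 8))
        return 0;
    return 1;
}
```

Under these assumptions, on a hypothetical 1 TiB filesystem the 2% threshold is about 20.5 GiB. With 10 GiB of metadata chunks the function returns 1 regardless of usage; with 30 GiB allocated and 6 GiB used it returns 0; with 30 GiB allocated and 25 GiB used it returns 1 again.
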
> 
> So what to do now? Simply reverting the +256M checks works and restores
> more or less the original behaviour.


Thanks.
So should I try restoring both of these, and leave the rest as is?

> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;

Or would it make more sense to try rolling back that patch completely?
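
For reference, the two checks in question can be modelled in userspace roughly as below. This is only an illustrative sketch, not the kernel code: old_checks_block_alloc() and the MIB macro are made-up names, div_factor() is assumed to compute n*f/10, and the function returns 1 where the original code would have returned 0 from should_alloc_chunk().

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Assumed userspace stand-in for the kernel helper (factor in tenths). */
static uint64_t div_factor(uint64_t num, int factor)
{
    return num * factor / 10;
}

/*
 * Hypothetical model of the two removed checks.
 * Returns 1 if either check would have prevented a new chunk allocation,
 * 0 if control would fall through to the remaining code.
 */
static int old_checks_block_alloc(uint64_t num_bytes, uint64_t num_allocated,
                                  uint64_t alloc_bytes)
{
    /* Hard-number check: more than 256MB would still be free. */
    if (num_allocated + alloc_bytes + 256 * MIB < num_bytes)
        return 1;

    /* Percentage check: allocated chunks would stay under 80% utilized. */
    if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
        return 1;

    return 0;
}
```

With numbers loosely based on the pre-upgrade report (about 9605 MiB of metadata chunks and about 6083 MiB used, treating used as num_allocated, which is an assumption), the first check fires and no new chunk would be allocated.
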

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
