linux-btrfs.vger.kernel.org archive mirror
* Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
@ 2012-06-08 19:38 Roman Mamedov
  2012-06-12 17:38 ` Calvin Walton
  2012-06-14 11:33 ` David Sterba
  0 siblings, 2 replies; 5+ messages in thread
From: Roman Mamedov @ 2012-06-08 19:38 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

Hello,

Before the upgrade (on 3.2.18):

Metadata, DUP: total=9.38GB, used=5.94GB

After the FS has been mounted once with 3.4.1:

Data: total=3.44TB, used=2.67TB
System, DUP: total=8.00MB, used=412.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=84.38GB, used=5.94GB

Where did my 75 GB of free space just go?

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

* Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
  2012-06-08 19:38 Massive metadata size increase after upgrade from 3.2.18 to 3.4.1 Roman Mamedov
@ 2012-06-12 17:38 ` Calvin Walton
  2012-06-13 10:30   ` Anand Jain
  2012-06-14 11:33 ` David Sterba
  1 sibling, 1 reply; 5+ messages in thread
From: Calvin Walton @ 2012-06-12 17:38 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs

On Sat, 2012-06-09 at 01:38 +0600, Roman Mamedov wrote:
> Hello,
> 
> Before the upgrade (on 3.2.18):
> 
> Metadata, DUP: total=9.38GB, used=5.94GB
> 
> After the FS has been mounted once with 3.4.1:
> 
> Data: total=3.44TB, used=2.67TB
> System, DUP: total=8.00MB, used=412.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=84.38GB, used=5.94GB
> 
> Where did my 75 GB of free space just go?

Btrfs tries to keep a certain ratio of allocated data space to allocated
metadata space at all times, in order to ensure that there is always
some free metadata space available. In 3.3 (I believe, but haven't
actually checked...) this ratio was increased, since people were still
complaining about btrfs reporting out of space errors too soon.

On a filesystem containing (a relatively small number of) large files,
it probably over-allocates the metadata space, which is what you're
seeing. I'm not sure if the ratio is tunable.
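
As a purely illustrative sketch of the effect (this is not btrfs code;
the 2% figure and the ~4.2TB filesystem size are assumptions used only
to mimic the numbers quoted above):

   /*
    * Illustration only: a percentage-based floor on allocated metadata
    * reserves a huge amount of space on a large filesystem even when
    * very little metadata is actually used.
    */
   #include <stdio.h>
   #include <stdint.h>

   int main(void)
   {
           const uint64_t GB = 1024ULL * 1024 * 1024;
           uint64_t fs_size = 4200 * GB;     /* assumed ~4.2TB filesystem */
           uint64_t meta_used = 6 * GB;      /* ~6GB of metadata actually used */
           unsigned int floor_percent = 2;   /* assumed ratio, see rest of thread */

           /* the rule: keep metadata chunks allocated up to floor_percent of the fs */
           uint64_t meta_reserved = fs_size / 100 * floor_percent;

           printf("reserved %llu GB of metadata, %llu GB actually used\n",
                  (unsigned long long)(meta_reserved / GB),
                  (unsigned long long)(meta_used / GB));
           return 0;
   }

(This prints "reserved 84 GB of metadata, 6 GB actually used", which is
the kind of gap shown in the output quoted above.)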

But better to have a bit of unused metadata space than to get 'out of
space' errors once you've filled your disk and you're trying to delete
some files!

-- 
Calvin Walton <calvin.walton@kepstin.ca>


* Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
  2012-06-12 17:38 ` Calvin Walton
@ 2012-06-13 10:30   ` Anand Jain
  0 siblings, 0 replies; 5+ messages in thread
From: Anand Jain @ 2012-06-13 10:30 UTC (permalink / raw)
  To: Calvin Walton; +Cc: Roman Mamedov, linux-btrfs


  Did you try a balance? (There is also a balance option to pick
  only the least-utilized metadata chunks.)

  In the long run, once you have a good understanding of your files
  and their sizes, tuning with the metadata_ratio mount option might
  help.

  I'm still not sure how the metadata expanded to 84.38GB, though.
  Was there any major delete operation on the filesystem?
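
  Not btrfs source, just a sketch of the documented behaviour of the
  metadata_ratio=N mount option (allocate one extra metadata chunk for
  every N data chunk allocations); the struct and function names below
  are invented for the example:

   #include <stdio.h>

   struct fs_state {
           unsigned int metadata_ratio;           /* 0 = option not set */
           unsigned int data_chunk_allocations;   /* running counter */
   };

   /* returns 1 when a metadata chunk should be forced alongside this data chunk */
   static int data_chunk_allocated(struct fs_state *fs)
   {
           fs->data_chunk_allocations++;
           return fs->metadata_ratio &&
                  fs->data_chunk_allocations % fs->metadata_ratio == 0;
   }

   int main(void)
   {
           struct fs_state fs = { .metadata_ratio = 4 };  /* e.g. mounted with metadata_ratio=4 */
           int i;

           for (i = 0; i < 12; i++)
                   if (data_chunk_allocated(&fs))
                           printf("after data chunk %d: force a metadata chunk\n", i + 1);
           return 0;
   }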

thanks, Anand
   

On 13/06/12 01:38, Calvin Walton wrote:
> On Sat, 2012-06-09 at 01:38 +0600, Roman Mamedov wrote:
>> Hello,
>>
>> Before the upgrade (on 3.2.18):
>>
>> Metadata, DUP: total=9.38GB, used=5.94GB
>>
>> After the FS has been mounted once with 3.4.1:
>>
>> Data: total=3.44TB, used=2.67TB
>> System, DUP: total=8.00MB, used=412.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=84.38GB, used=5.94GB
>>
>> Where did my 75 GB of free space just go?
>
> Btrfs tries to keep a certain ratio of allocated data space to allocated
> metadata space at all times, in order to ensure that there is always
> some free metadata space available. In 3.3 (I believe, but haven't
> actually checked...) this ratio was increased, since people were still
> complaining about btrfs reporting out of space errors too soon.
>
> On a filesystem containing (a relatively small number of) large files,
> it probably over-allocates the metadata space, which is what you're
> seeing. I'm not sure if the ratio is tunable.
>
> But better to have a bit of unused metadata space than to get 'out of
> space' errors once you've filled your disk and you're trying to delete
> some files!
>

* Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
  2012-06-08 19:38 Massive metadata size increase after upgrade from 3.2.18 to 3.4.1 Roman Mamedov
  2012-06-12 17:38 ` Calvin Walton
@ 2012-06-14 11:33 ` David Sterba
  2012-06-17 15:29   ` Roman Mamedov
  1 sibling, 1 reply; 5+ messages in thread
From: David Sterba @ 2012-06-14 11:33 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-btrfs, sensille, chris.mason

On Sat, Jun 09, 2012 at 01:38:22AM +0600, Roman Mamedov wrote:
> Before the upgrade (on 3.2.18):
> 
> Metadata, DUP: total=9.38GB, used=5.94GB
> 
> After the FS has been mounted once with 3.4.1:
> 
> Data: total=3.44TB, used=2.67TB
> System, DUP: total=8.00MB, used=412.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=84.38GB, used=5.94GB
> 
> Where did my 75 GB of free space just go?

This is caused by the patch (credits for bisecting it go to Arne)

commit cf1d72c9ceec391d34c48724da57282e97f01122
Author: Chris Mason <chris.mason@oracle.com>
Date:   Fri Jan 6 15:41:34 2012 -0500

    Btrfs: lower the bar for chunk allocation

    The chunk allocation code has tried to keep a pretty tight lid on creating new
    metadata chunks.  This is partially because in the past the reservation
    code didn't give us an accurate idea of how much space was being used.

    The new code is much more accurate, so we're able to get rid of some of these
    checks.
---
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3263,27 +3263,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
                if (num_bytes - num_allocated < thresh)
                        return 1;
        }
-
-       /*
-        * we have two similar checks here, one based on percentage
-        * and once based on a hard number of 256MB.  The idea
-        * is that if we have a good amount of free
-        * room, don't allocate a chunk.  A good mount is
-        * less than 80% utilized of the chunks we have allocated,
-        * or more than 256MB free
-        */
-       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
-               return 0;
-
-       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
-               return 0;
-
        thresh = btrfs_super_total_bytes(root->fs_info->super_copy);

-       /* 256MB or 5% of the FS */
-       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
+       /* 256MB or 2% of the FS */
+       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));

-       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
+       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
                return 0;
        return 1;
 }
---

Originally there were two types of check, one based on a +256MB margin
and one based on a percentage. The former were removed, which leaves
only the percentage thresholds. If less than 2% of the filesystem is
actually used for metadata, metadata chunks are reserved up to exactly
2%. Once actual usage goes over 2%, there is always at least 20%
over-reservation,

   sinfo->bytes_used < div_factor(num_bytes, 8)

i.e. the threshold is 80%, which can be wasteful for a large filesystem.

So the metadata chunks are immediately pinned to 2% of the filesystem
size after the first few writes, and this is what you observe.

Running balance will remove the unused metadata chunks, but only down
to the 2% level.
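
To make the numbers concrete, here is a small user-space sketch of the
post-patch check (not the kernel function itself; in the kernel,
div_factor(n, x) is n*x/10 and div_factor_fine(n, x) is n*x/100, and
the ~4.2TB filesystem size below is an assumption, since the real size
wasn't posted):

   #include <stdio.h>
   #include <stdint.h>
   #include <inttypes.h>

   static uint64_t div_factor(uint64_t num, int factor)      { return num * factor / 10; }
   static uint64_t div_factor_fine(uint64_t num, int factor) { return num * factor / 100; }
   static uint64_t max_u64(uint64_t a, uint64_t b)           { return a > b ? a : b; }

   /* 1 = allocate another metadata chunk, 0 = don't (post-patch logic, simplified) */
   static int should_alloc_chunk(uint64_t fs_bytes, uint64_t meta_total, uint64_t meta_used)
   {
           /* 256MB or 2% of the FS */
           uint64_t thresh = max_u64(256ULL * 1024 * 1024, div_factor_fine(fs_bytes, 2));

           if (meta_total > thresh && meta_used < div_factor(meta_total, 8))
                   return 0;
           return 1;
   }

   int main(void)
   {
           const uint64_t GB = 1024ULL * 1024 * 1024;
           uint64_t fs_bytes = 4200 * GB;   /* assumed ~4.2TB filesystem */
           uint64_t meta_used = 6 * GB;     /* ~5.94GB actually used */
           uint64_t meta_total;

           /* keep adding 1GB metadata chunks until the check says stop */
           for (meta_total = 9 * GB;
                should_alloc_chunk(fs_bytes, meta_total, meta_used);
                meta_total += GB)
                   ;
           printf("allocation stops at %" PRIu64 " GB of metadata\n", meta_total / GB);
           return 0;
   }

With these inputs the loop only stops once the metadata total crosses
the 2% mark (it prints 85 GB here), roughly the 84.38GB reported above,
even though under 6GB of metadata is actually used.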

[end of analysis]

So what to do now? Simply reverting the +256M checks works and restores
more or less the original behaviour. I don't know the reason the patch
was added in the first place. The patch preceding the 'lower-the-bar'
one is

commit 203bf287cb01a5dc26c20bd3737cecf3aeba1d48
Author: Chris Mason <chris.mason@oracle.com>
Date:   Fri Jan 6 15:23:57 2012 -0500

    Btrfs: run chunk allocations while we do delayed refs

    Btrfs tries to batch extent allocation tree changes to improve performance
    and reduce metadata trashing.  But it doesn't allocate new metadata chunks
    while it is doing allocations for the extent allocation tree.
---

"but it doesn't allocate ... while ..." this sounds like the scenario
where over-reservation of metadata would help and avoid ENOSPC. I did
tests that are presumably metadata-hungrly, like heavy snapshoting and
then decowing and watched the metadata growth rate. At most hundred megs
per second on a fast box in the worst case. And it's not a single
operation, probably spans more transactions and doing lots of other
stuff with opportunities to grab more chunks in advance and not leading
to the hypothetical problematic situtation.

I've been working on this for some time, trying to break it, without
"success". The solution I'd propose here is to reintroduce the +256M
checks (or a similar threshold value).


david

* Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
  2012-06-14 11:33 ` David Sterba
@ 2012-06-17 15:29   ` Roman Mamedov
  0 siblings, 0 replies; 5+ messages in thread
From: Roman Mamedov @ 2012-06-17 15:29 UTC (permalink / raw)
  To: dave; +Cc: linux-btrfs, sensille, chris.mason

[-- Attachment #1: Type: text/plain, Size: 3925 bytes --]

On Thu, 14 Jun 2012 13:33:16 +0200
David Sterba <dave@jikos.cz> wrote:

> On Sat, Jun 09, 2012 at 01:38:22AM +0600, Roman Mamedov wrote:
> > Before the upgrade (on 3.2.18):
> > 
> > Metadata, DUP: total=9.38GB, used=5.94GB
> > 
> > After the FS has been mounted once with 3.4.1:
> > 
> > Data: total=3.44TB, used=2.67TB
> > System, DUP: total=8.00MB, used=412.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=84.38GB, used=5.94GB
> > 
> > Where did my 75 GB of free space just go?
> 
> This is caused by the patch (credits for bisecting it go to Arne)
> 
> commit cf1d72c9ceec391d34c48724da57282e97f01122
> Author: Chris Mason <chris.mason@oracle.com>
> Date:   Fri Jan 6 15:41:34 2012 -0500
> 
>     Btrfs: lower the bar for chunk allocation
> 
>     The chunk allocation code has tried to keep a pretty tight lid on creating new
>     metadata chunks.  This is partially because in the past the reservation
>     code didn't give us an accurate idea of how much space was being used.
> 
>     The new code is much more accurate, so we're able to get rid of some of these
>     checks.
> ---
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3263,27 +3263,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
>                 if (num_bytes - num_allocated < thresh)
>                         return 1;
>         }
> -
> -       /*
> -        * we have two similar checks here, one based on percentage
> -        * and once based on a hard number of 256MB.  The idea
> -        * is that if we have a good amount of free
> -        * room, don't allocate a chunk.  A good mount is
> -        * less than 80% utilized of the chunks we have allocated,
> -        * or more than 256MB free
> -        */
> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;
> -
>         thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
> 
> -       /* 256MB or 5% of the FS */
> -       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
> +       /* 256MB or 2% of the FS */
> +       thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
> 
> -       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
> +       if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
>                 return 0;
>         return 1;
>  }
> ---
> 
> Originally there were two types of check, one based on a +256MB margin
> and one based on a percentage. The former were removed, which leaves
> only the percentage thresholds. If less than 2% of the filesystem is
> actually used for metadata, metadata chunks are reserved up to exactly
> 2%. Once actual usage goes over 2%, there is always at least 20%
> over-reservation,
> 
>    sinfo->bytes_used < div_factor(num_bytes, 8)
> 
> i.e. the threshold is 80%, which can be wasteful for a large filesystem.
> 
> So the metadata chunks are immediately pinned to 2% of the filesystem
> size after the first few writes, and this is what you observe.
> 
> Running balance will remove the unused metadata chunks, but only down
> to the 2% level.
> 
> [end of analysis]
> 
> So what to do now? Simply reverting the +256M checks works and restores
> more or less the original behaviour.


Thanks.
So should I try restoring both of these, and leave the rest as is?

> -       if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
> -               return 0;
> -
> -       if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
> -               return 0;

Or would it make more sense to try rolling back that patch completely?

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
