From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, linux-btrfs@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Subject: Re: btrfs filesystem keeps allocating new chunks for no apparent reason
Date: Fri, 7 Apr 2017 23:25:29 +0200
Message-ID: <5b642448-951e-5b5e-1343-0299a950089c@mendix.com>
In-Reply-To: <89a684c7-364e-f409-5348-bc0077fd438c@cn.fujitsu.com>
Ok, I'm going to revive a year-old mail thread here with interesting new
info:
On 05/31/2016 03:36 AM, Qu Wenruo wrote:
>
>
> Hans van Kranenburg wrote on 2016/05/06 23:28 +0200:
>> Hi,
>>
>> I've got a mostly inactive btrfs filesystem inside a virtual machine
>> somewhere that shows interesting behaviour: while no interesting disk
>> activity is going on, btrfs keeps allocating new chunks, a GiB at a time.
>>
>> A picture, telling more than 1000 words:
>> https://syrinx.knorrie.org/~knorrie/btrfs/keep/btrfs_usage_ichiban.png
>> (when the amount of allocated/unused goes down, I did a btrfs balance)
That picture is still online, to give you the idea.
> Nice picture.
> Really better than 1000 words.
>
> AFAIK, the problem may be caused by fragments.
Free space fragmentation is a key thing here indeed.
The two major things involved here are 1) the extent allocator, which
causes the free space fragmentation, and 2) the extent allocator, which
doesn't handle the fragmentation it just caused very well.
Let's start with the pictures, instead of too many words. The following
two videos are built from png images of the 4 block groups with the
highest vaddr. Every 15 minutes a picture is created, and then they're
strung together into a video:
https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-01-19-noautodefrag-ichiban.mp4
And, with autodefrag enabled, which was the first thing I tried as a change:
https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-01-13-autodefrag-ichiban.mp4
So, this is why putting your /var/log, /var/lib/mailman and /var/spool
on btrfs is a terrible idea.
Because the allocator keeps walking forward, every file that is created
and then removed leaves a blank spot behind.
Autodefrag makes the situation only a little bit better, changing the
resulting pattern from a sky full of stars into a snowstorm. The result
of collecting a few small writes and rewriting them is, again, that
small pieces of free space are left behind.
Just a random idea... for this write pattern, always putting new writes
in the first available free spot at the beginning of the block group
would make a big difference, since the little 4 or 8 KiB holes would be
filled up again all the time, preventing the shotgun blast from
spreading all over.
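
To make that idea a bit more concrete, here is a toy model in Python. It
is not btrfs code and makes no claim about how the real allocator is
implemented: one block group is just a list of blocks, every second file
is a temporary one that gets removed right after the next write, and the
names simulate, free_runs, walk_forward and first_fit are made up for
this sketch.

```python
def simulate(policy, blocks=64, files=40):
    """Return the final block map (True = used) of one toy block group.

    policy is "walk_forward" (search from just past the previous allocation)
    or "first_fit" (always search from the start of the block group).
    """
    used = [False] * blocks
    cursor = 0            # where the walk-forward policy resumes searching
    pending_free = None   # temporary file to delete after the next write
    for i in range(files):
        start = cursor if policy == "walk_forward" else 0
        pos = next((b for b in range(start, blocks) if not used[b]), None)
        if pos is None:
            raise RuntimeError("toy block group is full")
        used[pos] = True
        cursor = pos + 1
        # The temporary file from the previous step is removed only now,
        # so its block was not yet available for this write.
        if pending_free is not None:
            used[pending_free] = False
            pending_free = None
        if i % 2 == 1:
            pending_free = pos  # odd-numbered files are short-lived
    return used


def free_runs(used):
    """Return the lengths of all contiguous runs of free blocks."""
    runs, length = [], 0
    for block_used in used:
        if block_used:
            if length:
                runs.append(length)
            length = 0
        else:
            length += 1
    if length:
        runs.append(length)
    return runs


for policy in ("walk_forward", "first_fit"):
    runs = free_runs(simulate(policy))
    print(policy, "free runs:", len(runs), "largest:", max(runs))
```

Both policies end up with the same amount of free space in this model.
Walking forward scatters it over many one-block holes, while first-fit
keeps reusing the same hole and leaves one large free run at the end.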
> And even I saw some early prototypes inside the codes to allow btrfs do
> allocation smaller extent than required.
> (E.g. caller needs 2M extent, but btrfs returns 2 1M extents)
>
> But it's still prototype and seems no one is really working on it now.
>
> So when btrfs is writing new data, for example, to write about 16M data,
> it will need to allocate a 16M continuous extent, and if it can't find
> large enough space to allocate, then create a new data chunk.
>
> [...]
That's the cluster idea, right? Combining free space fragments into a
bigger piece of space to fill with writes?
The fun thing is that this might work, but because of the pattern we end
up with, a large write (the files downloaded by the daily apt-get update
cron job) apparently fails to find space, which causes a new chunk
allocation. This is clearly visible in the videos. Directly after that,
the new chunk gets filled with the same pattern, because the extent
allocator now continues there, and the next day the same thing happens
again, etc...
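
A minimal sketch of that failure mode in Python, following the
simplified description quoted above (a write needs one contiguous free
extent, otherwise a new chunk is created). The function allocate and the
sizes used are assumptions for illustration only, not the actual btrfs
logic.

```python
def allocate(free_extents, write_size, chunk_size):
    """Place one contiguous write; sizes are in KiB.

    free_extents is a list of free extent lengths. Returns the updated
    list and a flag telling whether a new chunk had to be allocated.
    """
    for i, length in enumerate(free_extents):
        if length >= write_size:
            remaining = length - write_size
            updated = free_extents[:i] + free_extents[i + 1:]
            if remaining:
                updated.insert(i, remaining)
            return updated, False
    # No single free extent is large enough, however much space is free
    # in total, so the write goes into a brand new chunk.
    if write_size > chunk_size:
        raise ValueError("write does not fit in a single chunk")
    return free_extents + [chunk_size - write_size], True


# Plenty of free space in total, but only in 4 and 8 KiB pieces.
fragmented = [4] * 1000 + [8] * 500
after, new_chunk = allocate(fragmented, write_size=1024, chunk_size=1024 * 1024)
print("total free before:", sum(fragmented), "KiB, new chunk:", new_chunk)
```

In this model a 1 MiB write triggers a new 1 GiB chunk even though
several MiB are free, because none of that free space is contiguous.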
And voila, there's the answer to my original question.
Now, another surprise:
From the exact moment I did mount -o remount,nossd on this filesystem,
the problem vanished.
https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-04-07-ichiban-munin-nossd.png
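
To see which of the two modes a filesystem is currently in, the mount
options can be inspected. A small Python sketch, assuming a Linux system
where the active btrfs options show up in the fourth field of
/proc/mounts; the helper names and the sample line are made up for this
example.

```python
def btrfs_mount_options(mounts_text):
    """Map each btrfs mount point to its list of mount options.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "btrfs":
            result[fields[1]] = fields[3].split(",")
    return result


def uses_ssd_mode(options):
    """True if the option list contains ssd or ssd_spread."""
    return "ssd" in options or "ssd_spread" in options


def read_proc_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical sample instead of the live table, so this runs anywhere.
sample = (
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    "/dev/vda1 / btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/ 0 0\n"
)
for mountpoint, options in btrfs_mount_options(sample).items():
    print(mountpoint, "ssd mode:", uses_ssd_mode(options))
```

On a real system, pass read_proc_mounts() instead of the sample text.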
I don't have a new video yet, but I'll set up a cron tonight and post it
later.
I'm going to send another mail specifically about the nossd/ssd
behaviour and other things I found out last week, but that'll probably
be tomorrow.
--
Hans van Kranenburg