From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Georgi Georgiev <georgi-georgiev-btrfs@japannext.co.jp>,
<linux-btrfs@vger.kernel.org>
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
Date: Wed, 29 Jul 2015 14:19:17 +0800
Message-ID: <55B87065.4060703@cn.fujitsu.com>
In-Reply-To: <20150729054659.GD9039@jnext-0060.corp.japannext.co.jp>
Hi,
Georgi Georgiev wrote on 2015/07/29 14:46 +0900:
> Using BTRFS on a very large filesystem, and as we put more and more data
> on it, the time it takes to mount it grew to, presently, about 30 minutes.
> Is there something wrong with the filesystem? Is there a way to bring
> this time down?
>
> ...
>
> Here is a snippet from dmesg, showing how long it takes to mount (the
> EXT4-fs line is the filesystem mounted next in the boot sequence):
>
> $ dmesg | grep -A1 btrfs
> [ 12.215764] TECH PREVIEW: btrfs may not be fully supported.
> [ 12.215766] Please review provided documentation for limitations.
> --
> [ 12.220266] btrfs: use zlib compression
> [ 12.220815] btrfs: disk space caching is enabled
> [ 22.427258] btrfs: bdev /dev/mapper/datavg-backuplv errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
> [ 2022.397318] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
>
Quite common, especially when the filesystem grows this large.
But it would be much better to use ftrace to show which btrfs operation
takes the most time during mount.
We have some guesses, from reading the free space cache to reading chunk
info, but we don't know which one actually takes most of the time.
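For example, something like the following (just a rough sketch from me,
assuming debugfs is mounted at /sys/kernel/debug; replace /your/mount/point
with the real mount point) should capture a function graph of the mount path:

  # cd /sys/kernel/debug/tracing
  # echo function_graph > current_tracer
  # echo 'btrfs_*' > set_ftrace_filter        # example filter, btrfs functions only
  # echo 1 > tracing_on
  # mount /dev/mapper/datavg-backuplv /your/mount/point
  # echo 0 > tracing_on
  # cat trace > /tmp/btrfs-mount-trace.txt    # save the result for later analysis

Even just the slowest entries from that trace would help narrow this down.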
> The btrfs filesystem is quite large:
>
> $ sudo btrfs filesystem usage /dev/mapper/datavg-backuplv
> Overall:
> Device size: 82.58TiB
> Device allocated: 82.58TiB
> Device unallocated: 0.00B
> Device missing: 0.00B
> Used: 62.01TiB
> Free (estimated): 17.76TiB (min: 17.76TiB)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 0.00B (used: 0.00B)
>
> Data,single: Size:79.28TiB, Used:61.52TiB
> /dev/mapper/datavg-backuplv 79.28TiB
>
> Metadata,single: Size:8.00MiB, Used:0.00B
> /dev/mapper/datavg-backuplv 8.00MiB
>
> Metadata,DUP: Size:1.65TiB, Used:252.68GiB
> /dev/mapper/datavg-backuplv 3.30TiB
>
> System,single: Size:4.00MiB, Used:0.00B
> /dev/mapper/datavg-backuplv 4.00MiB
>
> System,DUP: Size:40.00MiB, Used:8.66MiB
> /dev/mapper/datavg-backuplv 80.00MiB
>
> Unallocated:
> /dev/mapper/datavg-backuplv 0.00B
Wow, nearly 100T, that's really huge now.
>
> Other info about the filesystem is that it has a rather large number of
> files, subvolumes and read-only snapshots, which started from about
> zero in March and grew to the current state of 3000 snapshots and an
> unknown number of files (filesystem usage is quite stable at the moment).
>
> I also noticed that while the machine is rebooted on a weekly basis, the
> time it takes to come up after a reboot has been growing. This is likely
> correlated to how long it takes to mount the filesystem, and maybe
> correlated to how much data there is on the filesystem.
>
> Reboot time used to be normally about 3 minutes, then it jumped to 8
> minutes on March 21 and the following weeks it went like this:
> 8 minutes, 11 minutes, 15 minutes...
> 19, 19, 19, 19, 23, 21, 22
> 32, 33, 36, 42, 46, 37, 30
>
> This is on CentOS 6.6, and while I understand that the version of btrfs
> is definitely oldish, even trying to mount the filesystem on a much more
> recent kernel (3.14.43) there is no improvement. Switching the regular
> OS kernel from the CentOS one (2.6.32-504.12.2.el6.x86_64) to something
> more recent is also feasible.
>
> I wanted to check the system for problems, so I tried an offline "btrfs
> check" using the latest btrfs-progs (version 4.1.2 freshly compiled from
> source), but "btrfs check" ran out of memory after about 30 minutes.
>
> The only output I get is this (timestamps added by me):
>
> 2015-07-28 18:14:45 $ sudo btrfs check /dev/datavg/backuplv
> 2015-07-28 18:33:05 checking extents
>
> And at 19:04:55 btrfs was killed by OOM: (abbreviated log below,
> full excerpt as an attachment).
Not surprised at all.
For the extent/chunk tree check, btrfsck reads all the chunk and extent
items, stores the needed info in memory, and then does the cross
reference check.
So the btrfsck process really takes a lot of memory, maybe 1/10 of the
used metadata space or more.
In your case the metadata is about 250GB used, so maybe around 25GB of
memory is needed just to hold that info.
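Just to make the rough math explicit (the 1/10 ratio is only my rough guess
from experience, not an exact formula):

  $ # ~252.68 GiB of metadata used, from the "btrfs filesystem usage" output above
  $ echo "252.68 / 10" | bc -l
  25.268

So roughly 25GiB as a lower bound, and your OOM log below shows it actually
went well past that.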
That's already known, but we don't have a good idea, or a developer with
time, to reduce the memory usage yet.
Maybe we can change the behavior to do chunk-by-chunk extent cross
checking to reduce the memory usage, but not for now...
Thanks,
Qu
>
> 2015-07-28T19:04:55.224855+09:00 localhost kernel: [11689.692680] htop invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
> ...
> 2015-07-28T19:04:55.225855+09:00 localhost kernel: [11689.801354] 631 total pagecache pages
> 2015-07-28T19:04:55.225857+09:00 localhost kernel: [11689.801829] 0 pages in swap cache
> 2015-07-28T19:04:55.225859+09:00 localhost kernel: [11689.802305] Swap cache stats: add 0, delete 0, find 0/0
> 2015-07-28T19:04:55.225861+09:00 localhost kernel: [11689.802781] Free swap = 0kB
> 2015-07-28T19:04:55.225863+09:00 localhost kernel: [11689.803341] Total swap = 0kB
> 2015-07-28T19:04:55.225864+09:00 localhost kernel: [11689.946223] 16777215 pages RAM
> 2015-07-28T19:04:55.225867+09:00 localhost kernel: [11689.946724] 295175 pages reserved
> 2015-07-28T19:04:55.225869+09:00 localhost kernel: [11689.947223] 5173 pages shared
> 2015-07-28T19:04:55.225871+09:00 localhost kernel: [11689.947721] 16369184 pages non-shared
> 2015-07-28T19:04:55.225874+09:00 localhost kernel: [11689.948222] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> ...
> 2015-07-28T19:04:55.225970+09:00 localhost kernel: [11689.994240] [16291] 0 16291 47166 177 18 0 0 sudo
> 2015-07-28T19:04:55.225972+09:00 localhost kernel: [11689.995232] [16292] 1000 16292 981 20 3 0 0 tai64n
> 2015-07-28T19:04:55.225974+09:00 localhost kernel: [11689.996241] [16293] 0 16293 47166 177 22 0 0 sudo
> 2015-07-28T19:04:55.225978+09:00 localhost kernel: [11689.997230] [16294] 1000 16294 1018 21 1 0 0 tai64nlocal
> 2015-07-28T19:04:55.225993+09:00 localhost kernel: [11689.998227] [16295] 0 16295 16122385 16118611 7 0 0 btrfs
> 2015-07-28T19:04:55.225995+09:00 localhost kernel: [11689.999210] [16296] 0 16296 25228 25 5 0 0 tee
> 2015-07-28T19:04:55.225997+09:00 localhost kernel: [11690.000201] [16297] 1000 16297 27133 162 1 0 0 bash
> ...
> 2015-07-28T19:04:55.226030+09:00 localhost kernel: [11690.008288] Out of memory: Kill process 16295 (btrfs) score 949 or sacrifice child
> 2015-07-28T19:04:55.226031+09:00 localhost kernel: [11690.009300] Killed process 16295, UID 0, (btrfs) total-vm:64489540kB, anon-rss:64474408kB, file-rss:36kB
>
> Thanks in advance for any advice,
>