From: "Ospan, Abylay" <aospan@amazon.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: RE: btrfs_extent_map memory consumption results in "Out of memory"
Date: Tue, 10 Oct 2023 21:23:22 +0000
Message-ID: <ddb589008e7a4419b67134be7ae90f8b@amazon.com>
In-Reply-To: <ZSVyFaWA5KZ0nTEN@debian0.Home>
Hi Filipe,
Thanks for the info!
I was just wondering about "direct IO writes", so I ran a quick test by fully removing fio's config option "direct=1" (default value is false).
Unfortunately, I'm still experiencing the same oom-kill:
[ 4843.936881] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=fio,pid=649,uid=0
[ 4843.939001] Out of memory: Killed process 649 (fio) total-vm:216868kB, anon-rss:896kB, file-rss:128kB, shmem-rss:2176kB, UID:0 pgtables:100kB oom_score_a0
[ 5306.210082] tmux: server invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
...
[ 5306.240968] Unreclaimable slab info:
[ 5306.241271] Name Used Total
[ 5306.242700] btrfs_extent_map 26093KB 26093KB
Here's my updated fio config:
[global]
name=fio-rand-write
filename=fio-rand-write
rw=randwrite
bs=4K
numjobs=1
time_based
runtime=90000
[file1]
size=3G
iodepth=1
"slabtop -s -a" output:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
206080 206080 100% 0.14K 7360 28 29440K btrfs_extent_map
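As a quick sanity check on the slabtop line above, the figures can be cross-checked against each other. This is only a sketch: the 4 KiB page size and one page per slab are assumptions that slabtop does not print, and the variable names are illustrative.

```python
# Figures copied from the slabtop line above.
objs = 206080           # OBJS column
slabs = 7360            # SLABS column
objs_per_slab = 28      # OBJ/SLAB column
cache_size_kb = 29440   # CACHE SIZE column, in KiB

# Assumption (not shown by slabtop): 4 KiB pages, one page per slab.
PAGE_KB = 4

# The columns should agree with each other under that assumption.
assert slabs * objs_per_slab == objs
assert slabs * PAGE_KB == cache_size_kb

# Effective cost per extent map, including slab overhead.
bytes_per_obj = cache_size_kb * 1024 / objs

# Share of the 140M given to the QEMU guest (-m 140M).
vm_ram_kb = 140 * 1024
share = cache_size_kb / vm_ram_kb

print(f"{bytes_per_obj:.1f} bytes per object, {share:.1%} of guest RAM")
```

If the assumptions hold, the cache alone accounts for roughly a fifth of the guest's memory at the time of the snapshot.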
I accelerated my testing by running the fio test inside a QEMU VM with a limited amount of RAM (140MB):
qemu-kvm \
-kernel bzImage.v6.6 \
-m 140M \
-drive file=rootfs.btrfs,format=raw,if=none,id=drive0
...
So it appears that this issue is not limited to direct IO writes?
Thank you!
-----Original Message-----
From: Filipe Manana <fdmanana@kernel.org>
Sent: Tuesday, October 10, 2023 11:48 AM
To: Ospan, Abylay <aospan@amazon.com>
Cc: linux-btrfs@vger.kernel.org
Subject: RE: [EXTERNAL] btrfs_extent_map memory consumption results in "Out of memory"
On Tue, Oct 10, 2023 at 03:02:21PM +0000, Ospan, Abylay wrote:
> Greetings Btrfs development team!
>
> I would like to express my gratitude for your outstanding work on Btrfs. However, I recently experienced an 'out of memory' issue as described below.
>
> Steps to reproduce:
>
> 1. Run FIO test on a btrfs partition with random write on a 300GB file:
>
> cat <<EOF >> rand.fio
> [global]
> name=fio-rand-write
> filename=fio-rand-write
> rw=randwrite
> bs=4K
> direct=1
> numjobs=16
> time_based
> runtime=90000
>
> [file1]
> size=300G
> ioengine=libaio
> iodepth=16
> EOF
>
> fio rand.fio
>
> 2. Monitor slab consumption with "slabtop -s -a"
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 25820620 23138538 89% 0.14K 922165 28 3688660K btrfs_extent_map
>
> 3. Observe oom-killer:
> [49689.294138] ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0 ...
> [49689.294425] Unreclaimable slab info:
> [49689.294426] Name Used Total
> [49689.329363] btrfs_extent_map 3207098KB 3375622KB
> ...
>
> Memory usage by btrfs_extent_map gradually increases until it reaches a critical point, causing the system to run out of memory.
>
> Test environment: Intel CPU, 8GB RAM (To expedite the reproduction of this issue, I also conducted tests within QEMU with a restricted amount of memory).
> Linux kernel tested: LTS 5.15.133, and mainline 6.6-rc5
>
> Quick review of the 'fs/btrfs/extent_map.c' code reveals no built-in limitations on memory allocation for extents mapping.
> Are there any known workarounds or alternative solutions to mitigate this issue?
No workarounds really.
So once we add an extent map to the inode's rbtree, it will stay there until:
1) The corresponding pages in the file range get released due to memory pressure or whatever reason decided by the MM side.
The release happens in the callback struct address_space_operations::release_folio, which is btrfs_release_folio for btrfs.
In your case it's direct IO writes... so there are no pages to release, and therefore extent maps don't get released by
that mechanism.
2) The inode is evicted - when evicted of course we drop all extent maps and release all memory.
If some application is keeping a file descriptor open for the inode, and writes keep happening (or reads, since they create
an extent map for the range read too), then no extent maps are released...
Databases and other software often do that: they keep file descriptors open for long periods, while reads and writes are happening
by the same process or others.
The other side effect, even if no memory exhaustion happens, is that it slows down writes, reads, and other operations, due to large red-black trees of extent maps.
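To get a feel for the scale, here is a rough upper-bound sketch. It assumes the worst case described above: every 4K block of the file ends up with its own extent map, nothing is merged, and nothing is released while the file stays open. The 144 bytes per extent map is an assumption derived from the "0.14K" object size reported by slabtop, the file sizes are read as GiB, and the function name is hypothetical.

```python
def worst_case_extent_map_bytes(file_bytes, block_bytes=4096, em_bytes=144):
    """Upper bound on extent map memory for one file.

    Assumes one extent map per block, no merging and no release.
    em_bytes=144 is an assumption based on slabtop's "0.14K" object size.
    """
    return (file_bytes // block_bytes) * em_bytes


GIB = 1024 ** 3

# File sizes taken from the two fio configs in this thread.
for label, size in (("3G", 3 * GIB), ("300G", 300 * GIB)):
    mib = worst_case_extent_map_bytes(size) / 1024 ** 2
    print(f"{label} file: up to {mib:.0f} MiB of extent maps")
```

Under these assumptions the bound for the 300G file is larger than the 8GB of RAM in the test machine, and the bound for the 3G file is most of the 140M guest. Real usage will be lower wherever adjacent extent maps get merged or released.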
I don't have a ready solution for that, but I gave it some thought about a year ago.
The simplest solution would be to not keep extent maps in memory for direct IO writes/reads,
but that may slow things down, at least in some cases.
Soon I may start some work to improve that.
Thanks.
>
> Thank you!
>
> --
> Abylay Ospan
>
>