From: "Ospan, Abylay" <aospan@amazon.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: RE: btrfs_extent_map memory consumption results in "Out of memory"
Date: Tue, 10 Oct 2023 21:23:22 +0000	[thread overview]
Message-ID: <ddb589008e7a4419b67134be7ae90f8b@amazon.com> (raw)
In-Reply-To: <ZSVyFaWA5KZ0nTEN@debian0.Home>

Hi Filipe,

Thanks for the info!

I was just wondering about the "direct IO writes" part, so I ran a quick test with fio's "direct=1" option removed entirely (the default is false, i.e. buffered IO).
Unfortunately, I'm still experiencing the same oom-kill:

[ 4843.936881] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=fio,pid=649,uid=0
[ 4843.939001] Out of memory: Killed process 649 (fio) total-vm:216868kB, anon-rss:896kB, file-rss:128kB, shmem-rss:2176kB, UID:0 pgtables:100kB oom_score_adj:0
[ 5306.210082] tmux: server invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
...
[ 5306.240968] Unreclaimable slab info:
[ 5306.241271] Name                      Used          Total
[ 5306.242700] btrfs_extent_map       26093KB      26093KB

Here's my updated fio config:
[global]
name=fio-rand-write
filename=fio-rand-write
rw=randwrite
bs=4K
numjobs=1
time_based
runtime=90000

[file1]
size=3G
iodepth=1

"slabtop -s -a" output:
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
206080 206080 100%    0.14K   7360       28     29440K btrfs_extent_map
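
For longer runs it can be handier to log the cache growth than to watch slabtop interactively. Here is a minimal C sketch that polls /proc/slabinfo every few seconds (reading it needs root, and this assumes the usual slabinfo 2.x column layout: name, active_objs, num_objs, objsize, ...):

/* Poll /proc/slabinfo for the btrfs_extent_map cache and print its
 * object counts and approximate memory usage every 5 seconds. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		FILE *f = fopen("/proc/slabinfo", "r");
		char line[512];

		if (!f) {
			perror("fopen /proc/slabinfo");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			unsigned long active, num, objsize;

			/* Matches: name active_objs num_objs objsize ... */
			if (sscanf(line, "btrfs_extent_map %lu %lu %lu",
				   &active, &num, &objsize) == 3)
				printf("btrfs_extent_map: %lu/%lu objs, ~%lu KB\n",
				       active, num, num * objsize / 1024);
		}
		fclose(f);
		sleep(5);
	}
}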

I accelerated my testing by running the fio test inside a QEMU VM with a limited amount of RAM (140 MB):

qemu-kvm \
  -kernel bzImage.v6.6 \
  -m 140M \
  -drive file=rootfs.btrfs,format=raw,if=none,id=drive0 \
...

So it appears that this issue is not limited to direct IO writes alone?

Thank you!

-----Original Message-----
From: Filipe Manana <fdmanana@kernel.org> 
Sent: Tuesday, October 10, 2023 11:48 AM
To: Ospan, Abylay <aospan@amazon.com>
Cc: linux-btrfs@vger.kernel.org
Subject: RE: btrfs_extent_map memory consumption results in "Out of memory"

On Tue, Oct 10, 2023 at 03:02:21PM +0000, Ospan, Abylay wrote:
> Greetings Btrfs development team!
>
> I would like to express my gratitude for your outstanding work on Btrfs. However, I recently experienced an 'out of memory' issue as described below.
>
> Steps to reproduce:
>
> 1. Run FIO test on a btrfs partition with random write on a 300GB file:
>
> cat <<EOF > rand.fio
> [global]
> name=fio-rand-write
> filename=fio-rand-write
> rw=randwrite
> bs=4K
> direct=1
> numjobs=16
> time_based
> runtime=90000
>
> [file1]
> size=300G
> ioengine=libaio
> iodepth=16
> EOF
>
> fio rand.fio
>
> 2. Monitor slab consumption with "slabtop -s -a"
>
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 25820620 23138538  89%    0.14K 922165       28   3688660K btrfs_extent_map
>
> 3. Observe oom-killer:
> [49689.294138] ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
> ...
> [49689.294425] Unreclaimable slab info:
> [49689.294426] Name                      Used          Total
> [49689.329363] btrfs_extent_map     3207098KB    3375622KB
> ...
>
> Memory usage by btrfs_extent_map gradually increases until it reaches a critical point, causing the system to run out of memory.
>
> Test environment: Intel CPU, 8GB RAM (To expedite the reproduction of this issue, I also conducted tests within QEMU with a restricted amount of memory).
> Linux kernel tested: LTS 5.15.133, and mainline 6.6-rc5
>
> A quick review of the 'fs/btrfs/extent_map.c' code reveals no built-in limit on the memory allocated for extent maps.
> Are there any known workarounds or alternative solutions to mitigate this issue?

No workarounds really.

So once we add an extent map to the inode's rbtree, it will stay there until:

1) The corresponding pages in the file range get released due to memory pressure or whatever reason decided by the MM side.
   The release happens in the callback struct address_space_operations::release_folio, which is btrfs_release_folio for btrfs.
   In your case it's direct IO writes... so there are no pages to release, and therefore extent maps don't get released by
   that mechanism.

2) The inode is evicted - when evicted of course we drop all extent maps and release all memory.
   If some application is keeping a file descriptor open for the inode, and writes keep happening (or reads, since they create
   an extent map for the range read too), then no extent maps are released...
   Databases and other software often do that, keep file descriptors open for long periods, while reads and writes are happening
   by the same process or others.
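
To make that pattern concrete, here is a minimal userspace C sketch of the workload described above: one long-lived file descriptor plus endless O_DIRECT 4K random writes, which together defeat both release mechanisms. The file name and 3G size mirror the fio job and are otherwise arbitrary; this only illustrates the access pattern, not btrfs internals.

/* Keep one fd open forever and issue O_DIRECT random 4K writes: the
 * inode is never evicted, and there is no page cache to release, so
 * the extent map created per write range accumulates as described. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE (3ULL << 30)	/* 3 GiB, as in the fio job */
#define BLOCK 4096

int main(void)
{
	void *buf;
	int fd = open("fio-rand-write", O_RDWR | O_CREAT | O_DIRECT, 0644);

	if (fd < 0 || posix_memalign(&buf, BLOCK, BLOCK)) {
		perror("setup");
		return 1;
	}
	memset(buf, 0xab, BLOCK);

	for (;;) {
		/* O_DIRECT needs block-aligned offset, length and buffer. */
		off_t off = (off_t)(random() % (FILE_SIZE / BLOCK)) * BLOCK;

		if (pwrite(fd, buf, BLOCK, off) != BLOCK) {
			perror("pwrite");
			return 1;
		}
	}
}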

The other side effect, even if no memory exhaustion happens, is that it slows down writes, reads, and other operations, due to the large red-black trees of extent maps.

I don't have a ready solution for that, but I gave it some thought about a year ago or so.
The simplest solution would be to not keep extent maps in memory for direct IO writes/reads...
but that may cause slowdowns in at least some cases.

Soon I may start some work to improve that.

Thanks.

>
> Thank you!
>
> --
> Abylay Ospan
>
>
