Re: Deleting large amounts of data causes system freeze due to OOM.


From: fdavidl073rnovn@tutanota.com
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, Linux Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Deleting large amounts of data causes system freeze due to OOM.
Date: Wed, 27 Sep 2023 03:46:24 +0200 (CEST)
Message-ID: <NfJJCdh--3-9@tutanota.com>
In-Reply-To: <NeKx2tK--3-9@tutanota.com>


Sep 14, 2023, 23:08 by fdavidl073rnovn@tutanota.com:

>
> Sep 14, 2023, 05:12 by quwenruo.btrfs@gmx.com:
>
>>
>>
>> On 2023/9/14 13:08, fdavidl073rnovn@tutanota.com wrote:
>>
>>> Sep 13, 2023, 05:55 by wqu@suse.com:
>>>
>>>>
>>>>
>>>> On 2023/9/13 11:58, fdavidl073rnovn@tutanota.com wrote:
>>>>
>>>>> Dear Btrfs Mailing List,
>>>>>
>>>>> Full disclosure: I reported this on kernel.org but am hoping to get more exposure on the mailing list.
>>>>>
>>>>> When I delete several terabytes of data, memory usage increases until the system becomes entirely unresponsive. This has been an issue for several kernel versions since at least 5.19 and continues to be an issue up to 6.5.2-artix1-1. This is on an older computer with several hard drives, eight gigabytes of memory, and a four-core x86_64 CPU. Slabtop output right before the system becomes unresponsive shows about four gigabytes used by khugepaged_mm_slot and three used by btrfs_extent_map. This happens over the span of a couple of minutes, and during this time btrfs-transaction is using a moderate amount of CPU time.
>>>>>
>>>>
>>>> This looks exactly like something caused by btrfs qgroup.
>>>>
>>>> Could you try to disable qgroup to see if it helps?
>>>> The amount of CPU time and IO overhead from qgroup is directly related to the number of extents being updated.
>>>>
>>>> For normal writes the IO itself takes most of the CPU/memory, so qgroup is not a big deal.
>>>> But for massive snapshot drops or file deletions the qgroup workload can be too large to be handled in just one transaction.
>>>>
>>>> For now you can disable the qgroup as a workaround.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>> I've never enabled quotas and my most recent attempt using the single profile for data was on kernel 6.4 so they would have been disabled by default. Running "btrfs qgroup show [path]" returns "ERROR: can't list qgroups: quotas not enabled".
>>>
>>
>> OK, at least we can rule out qgroup.
>>
>> Would you mind providing more info? Including:
>>
>> - How many files are involved?
>> One large file and a ton of small files are very different workloads.
>> Any values for the average file size would also help.
>>
>> - Is the fs using v1 or v2 space cache?
>> - Do the deleted files have any snapshot/reflink?
>> - Are there any other processes reading the to-be-deleted files?
>>
>> One of my concerns is the btrfs_extent_map usage; that's mostly used by
>> regular files as an in-memory cache so that they don't need to look up
>> the tree on-disk.
>>
>> I just checked the code: evicting an inode won't trigger
>> btrfs_extent_map usage; it's mostly read/write that triggers such
>> btrfs_extent_map usage.
>>
>> Thus there must be something else causing the unexpected
>> btrfs_extent_map usage.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Sincerely,
>>> David
>>>
> On my latest attempt using the single profile, there are about fifteen terabytes of space used in total, around eight hundred and fifty thousand files, over 9000 directories, and three very large files (two of two terabytes each and one of four terabytes). There are also about two terabytes of files compressed with zstd at a fifty percent ratio.
>
> The device is using space cache version two, there are no reflinks or snapshots as far as I know, and nothing else is reading from the device or otherwise running when this occurs. The system idles at about three hundred megabytes of memory used with negligible CPU activity before this happens.
>
> For some context, the device is currently mounted with compress-force=zstd:3 and noatime. The data currently on the device was transferred via send-receive version two (and was already compressed) as a snapshot, but it is the only copy of it on the disk, so I am not sure if that counts as a snapshot. I do not think the snapshot is related, because I have deleted a single four terabyte file (from the snapshot) as a test and the memory usage went from about three hundred megabytes to over a gigabyte before going back down. I assume that was the same thing but the system just did not run out of memory.
>
> Sincerely,
> David
>
>
To follow up on this, I tried creating a ten terabyte file and then deleting it. I then tried creating approximately ten terabytes of files with random sizes between one and thirty-two megabytes and deleting that folder. I tried this both at the root of the btrfs device and inside a subvolume. Each trial increased memory usage by up to one gigabyte at points but did not cause the system to run out of memory.
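
For anyone who wants to repeat the random-size trial, here is a minimal sketch of it in Python. The function names, the file naming scheme, and the use of random (incompressible) data are assumptions for illustration, not the exact commands used above. Random data is chosen so that compress-force=zstd cannot shrink the files to almost nothing.

```python
import os
import random
import shutil

MIB = 1024 * 1024


def fill_with_random_sized_files(root, total_bytes, min_size=1 * MIB,
                                 max_size=32 * MIB, seed=None, chunk=MIB):
    """Write files of random size into `root` until `total_bytes` are written.

    Returns (number_of_files, bytes_written). The last file is clamped so the
    total is exact, which means it may be smaller than `min_size`.
    """
    rng = random.Random(seed)
    os.makedirs(root, exist_ok=True)
    written = 0
    count = 0
    while written < total_bytes:
        size = min(rng.randint(min_size, max_size), total_bytes - written)
        path = os.path.join(root, f"file_{count:08d}.bin")
        with open(path, "wb") as f:
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                # Incompressible payload, written in chunks to bound memory use.
                f.write(os.urandom(n))
                remaining -= n
        written += size
        count += 1
    return count, written


def delete_tree(root):
    """Remove the whole test directory; this is the step that stresses deletion."""
    shutil.rmtree(root)
```

Calling `fill_with_random_sized_files("/mnt/test/trial", 10 * 1024**4)` followed by `delete_tree("/mnt/test/trial")` would approximate the ten terabyte trial; the mount path is hypothetical.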

I still believe the cause is that requests are being queued faster than they are completed until there is no memory left. My current thought is that this has something to do with either nested directories or my real backup being significantly more fragmented. Either of those could cause significantly more seeks on the hard drives and slow down how fast operations are completed, causing them to pile up.
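
One way to check whether memory really grows steadily during a deletion would be to sample the btrfs_extent_map slab while the delete runs. The following is a rough sketch, assuming the usual /proc/slabinfo version 2.1 column order (name, active_objs, num_objs, objsize, ...) and root access, since that file is normally readable only by root. The helper names are made up, and the byte figure is only an estimate (num_objs times objsize) that ignores per-slab overhead.

```python
import time


def parse_slabinfo(text, names):
    """Return {cache_name: approx_bytes} for the requested slab caches.

    Assumes slabinfo 2.1 layout: name, active_objs, num_objs, objsize, ...
    Header lines are skipped because their first field is not a cache name.
    """
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in names:
            continue
        num_objs = int(fields[2])
        objsize = int(fields[3])
        usage[fields[0]] = num_objs * objsize
    return usage


def read_slab_usage(names=("btrfs_extent_map",), path="/proc/slabinfo"):
    """Read the live slabinfo file (usually requires root)."""
    with open(path) as f:
        return parse_slabinfo(f.read(), set(names))


def log_slab_usage(names, interval_s=5.0, samples=10):
    """Print one approximate usage sample every `interval_s` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_slab_usage(names))
        time.sleep(interval_s)
```

Running `log_slab_usage(("btrfs_extent_map", "khugepaged_mm_slot"))` in a second terminal during the delete would show whether either cache keeps growing or levels off.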

I might try to put together something to make nested directories with lots of small files and delete that, but otherwise I am out of ideas (I cannot think of an easy way to properly replicate fragmentation). If you have any thoughts or things you think would be worthwhile to test, I would love to hear them.
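
The nested-directory test could look something like the sketch below. It is only an assumption about how such a tree might be built: the function name, the directory and file naming, and the default 4096-byte file size are all hypothetical, and it has not been run against the affected filesystem.

```python
import os


def build_nested_tree(root, depth, dirs_per_level, files_per_dir, file_size=4096):
    """Create a tree `depth` levels below `root`, with small files in every directory.

    Each directory (including `root`) gets `files_per_dir` files and, unless it
    is at the deepest level, `dirs_per_level` subdirectories.
    Returns the total number of files created.
    """
    os.makedirs(root, exist_ok=True)
    payload = os.urandom(file_size)  # incompressible, so compression cannot hide it
    total_files = 0
    for i in range(files_per_dir):
        with open(os.path.join(root, f"f{i:05d}"), "wb") as f:
            f.write(payload)
        total_files += 1
    if depth > 0:
        for d in range(dirs_per_level):
            child = os.path.join(root, f"d{d:03d}")
            total_files += build_nested_tree(
                child, depth - 1, dirs_per_level, files_per_dir, file_size
            )
    return total_files
```

The number of directories grows as 1 + dirs_per_level + dirs_per_level^2 + ... up to `depth`, so the parameters should be chosen with care; deleting the resulting tree (for example with `shutil.rmtree`) is then the actual test.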

Sincerely,
David

