From: Baokun Li <libaokun1@huawei.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: <linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 01/17] ext4: add ext4_try_lock_group() to skip busy groups
Date: Sat, 19 Jul 2025 08:29:37 +0800
Message-ID: <d87eab9a-8224-477f-ae81-d4f205ee78b6@huawei.com>
In-Reply-To: <87pldy78qc.fsf@linux.intel.com>

On 2025/7/18 6:28, Andi Kleen wrote:
> Baokun Li <libaokun1@huawei.com> writes:
>
>> When ext4 allocates blocks, we used to just go through the block groups
>> one by one to find a good one. But when there are tons of block groups
>> (like hundreds of thousands or even millions) and not many have free space
>> (meaning they're mostly full), it takes a really long time to check them
>> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
>> option (which is on by default now). It keeps track of some group lists,
>> so when we need a free block, we can just grab a likely group from the
>> right list. This saves time and makes block allocation much faster.
>>
>> But when multiple processes or containers are doing similar things, like
>> constantly allocating 8k blocks, they all try to use the same block group
>> in the same list. Even just two processes doing this can cut the IOPS in
>> half. For example, one container might do 300,000 IOPS, but if you run two
>> at the same time, the total is only 150,000.
>>
>> Since we can already look at block groups in a non-linear way, the first
>> and last groups in the same list are basically the same for finding a block
>> right now. Therefore, add an ext4_try_lock_group() helper function to skip
>> the current group when it is locked by another process, thereby avoiding
>> contention with other processes. This helps ext4 make better use of having
>> multiple block groups.
>
> It seems this makes block allocation non deterministic, but depend on
> the system load. I can see where this could cause problems when
> reproducing bugs at least, but perhaps also in other cases.
>
> Better perhaps just round robin the groups?
> Or at least add a way to turn it off.
>
> -Andi
>

As Ted mentioned, ext4 has never guaranteed deterministic allocation. We
do first try a predetermined goal in ext4_mb_find_by_goal(), and that path
has no trylock logic, so we always scan the target group once. That part
is deterministic.

However, if the goal attempt fails, the priority for the subsequent
allocation is to find suitable free space as quickly as possible, so
there's no need to contend with other processes for non-target groups.
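
To make the two phases concrete, here is a minimal userspace sketch. It is
not the kernel code: struct group, group_trylock(), alloc_from_goal(),
alloc_skip_busy() and alloc_blocks() are hypothetical names, a C11
atomic_flag stands in for the per-group lock, and skipping the goal group
in the fallback loop is a simplification of mine. It only illustrates
"wait for the goal group, trylock everything else".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block group; not the kernel structure. */
struct group {
	atomic_flag lock;         /* set while some allocator holds the group */
	unsigned int free_blocks;
};

/* Returns true if the lock was taken, false if the group is busy. */
static bool group_trylock(struct group *g)
{
	return !atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire);
}

static void group_lock(struct group *g)
{
	while (!group_trylock(g))
		;                     /* spin until the current holder releases it */
}

static void group_unlock(struct group *g)
{
	atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Take 'want' blocks from a group the caller has locked, if they fit. */
static bool take_blocks(struct group *g, unsigned int want)
{
	if (g->free_blocks < want)
		return false;
	g->free_blocks -= want;
	return true;
}

/* Goal phase: no trylock, so the goal group is always scanned once. */
static bool alloc_from_goal(struct group *g, unsigned int want)
{
	group_lock(g);
	bool ok = take_blocks(g, want);
	group_unlock(g);
	return ok;
}

/* Fallback phase: skip any group that another allocator currently holds. */
static int alloc_skip_busy(struct group *groups, size_t n, size_t goal,
			   unsigned int want)
{
	for (size_t i = 0; i < n; i++) {
		if (i == goal || !group_trylock(&groups[i]))
			continue;         /* already scanned, or busy: move on */
		bool ok = take_blocks(&groups[i], want);
		group_unlock(&groups[i]);
		if (ok)
			return (int)i;
	}
	return -1;                    /* nothing suitable among the idle groups */
}

/* Returns the index of the group that satisfied the request, or -1. */
static int alloc_blocks(struct group *groups, size_t n, size_t goal,
			unsigned int want)
{
	assert(goal < n);
	if (alloc_from_goal(&groups[goal], want))
		return (int)goal;
	return alloc_skip_busy(groups, n, goal, want);
}
```

In this sketch, the group returned by the fallback phase depends on which
groups happen to be held at that moment, while the goal phase behaves the
same regardless of load.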

Cheers,
Baokun