From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger@dilger.ca,
Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Subject: [PATCH v3 0/5] Improve group scanning in CR 0 and CR 1 passes
Date: Fri, 26 Feb 2021 11:36:07 -0800
Message-ID: <20210226193612.1199321-1-harshadshirwadkar@gmail.com>
ext4: improve cr 0 and cr 1 passes
This patch series significantly improves the cr 0 and cr 1 passes of
the allocator. Currently, at cr 0 and cr 1, we perform linear lookups
to find matching groups. That is very inefficient for large file
systems with millions of block groups. At cr 0, we only care about
groups whose largest free order is >= the request's order, and at
cr 1 we only care about groups whose average fragment size is > the
request size. So, this patch series introduces new data structures
that allow us to perform the cr 0 lookup in constant time and the
cr 1 lookup in log(number of groups) time instead of linear time.
For cr 0, we add a list for each order, and each group is enqueued on
the list that matches the largest free order in its buddy bitmap. This
allows us to look up a match at cr 0 in constant time.
For cr 1, we add a new rb tree of groups sorted by average fragment
size. This allows us to look up a match for cr 1 in log(number of
groups) time.
These optimizations can be enabled by passing the "mb_optimize_scan"
mount option.
These changes may cause allocations to be spread across the block
device. While that would not matter for some block devices (such as
flash), it may be a concern for others that benefit from storing
related content together, such as rotational disks. However, it can
be argued that in highly fragmented scenarios, especially on large
disks, it is still worth optimizing the scanning, since in such cases
we become CPU bound on group scanning instead of IO bound. Perhaps in
the future we could turn this optimization on dynamically for such
devices based on fragmentation levels.
Verified that there are no regressions in the smoke tests
(-g quick -c 4k). Also, to demonstrate the effectiveness of the patch
series, the following experiment was performed:
I created a highly fragmented 65TB disk with no contiguous 2M regions
and ran the following command 3 times consecutively:
time dd if=/dev/urandom of=file bs=2M count=10
Here are the results with and without the cr 0/1 optimizations:
|---------+------------------------------+---------------------------|
|         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
|---------+------------------------------+---------------------------|
| 1st run | 5m1.871s                     | 2m47.642s                 |
| 2nd run | 2m28.390s                    | 0m0.611s                  |
| 3rd run | 2m26.530s                    | 0m1.255s                  |
|---------+------------------------------+---------------------------|
The patch [2/5] "ext4: add mballoc stats proc file" is a modified
version of a patch originally written by Artem Blagodarenko
(artem.blagodarenko@gmail.com). With that patch applied, I ran the
following command with and without the optimizations:
dd if=/dev/zero of=/mnt/file bs=2M count=2 conv=fsync
Without optimizations:
mballoc:
    reqs: 41
    success: 1
    groups_scanned: 63
    groups_considered: 20643620
    extents_scanned: 7851
    goal_hits: 0
    2^n_hits: 1
    breaks: 39
    lost: 0
    useless_c0_loops: 3
    useless_c1_loops: 39
    useless_c2_loops: 0
    useless_c3_loops: 0
    buddies_generated: 491561/491520
    buddies_time_used: 13078539152
    preallocated: 0
    discarded: 0
With optimizations:
mballoc:
    reqs: 42
    success: 1
    groups_scanned: 62
    groups_considered: 1011
    extents_scanned: 8062
    goal_hits: 0
    2^n_hits: 0
    breaks: 40
    lost: 0
    useless_c0_loops: 0
    useless_c1_loops: 0
    useless_c2_loops: 0
    useless_c3_loops: 0
    buddies_generated: 491561/491520
    buddies_time_used: 13165943648
    preallocated: 0
    discarded: 0
This shows that the CR0 and CR1 optimizations get rid of the useless
CR0 and CR1 loops altogether, thereby reducing groups_considered from
20643620 to 1011.
Changes from V2:
----------------
- Added the mb_linear_limit sysfs tunable, which controls how many
  groups the allocator should search linearly before consulting the
  new data structures.
- Added the following optimizations:
  * Full groups are not added to either structure
  * MB_OPTIMIZE_SCAN is disabled for small file systems
- Updated documentation in the code
- Made the output of the mb_structs_summary proc file YAML compatible
- Small fixes: moved the increment of the ac_groups_considered
  variable and added the missing MB_NUM_ORDERS macro
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Harshad Shirwadkar (5):
  ext4: drop s_mb_bal_lock and convert protected fields to atomic
  ext4: add mballoc stats proc file
  ext4: add MB_NUM_ORDERS macro
  ext4: improve cr 0 / cr 1 group scanning
  ext4: add proc files to monitor new structures
 fs/ext4/ext4.h    |  24 +-
 fs/ext4/mballoc.c | 541 +++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/mballoc.h |  20 ++
 fs/ext4/super.c   |   6 +-
 fs/ext4/sysfs.c   |   6 +
 5 files changed, 564 insertions(+), 33 deletions(-)
--
2.30.1.766.gb4fecdf3b7-goog