linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v6 0/7] Block Allocator Improvements
Date: Fri, 9 Apr 2021 12:36:03 -0400	[thread overview]
Message-ID: <YHCCc5UKANOd2VbQ@mit.edu> (raw)
In-Reply-To: <20210401172129.189766-1-harshadshirwadkar@gmail.com>

Thanks, I've applied this patch series into the ext4 git tree.

	     	     	  	- Ted

On Thu, Apr 01, 2021 at 10:21:22AM -0700, Harshad Shirwadkar wrote:
> This patch series improves cr 0 and cr 1 passes of the allocator
> signficantly. Currently, at cr 0 and 1, we perform linear lookups to
> find the matching groups. That's very inefficient for large file
> systems where there are millions of block groups. At cr 0, we only
> care about the groups that have the largest free order >= the
> request's order and at cr 1 we only care about groups where average
> fragment size > the request size. so, this patchset introduces new
> data structures that allow us to perform cr 0 lookup in constant time
> and cr 1 lookup in log (number of groups) time instead of linear.
> 
> For cr 0, we add a list for each order and all the groups are enqueued
> to the appropriate list based on the largest free order in its buddy
> bitmap. This allows us to lookup a match at cr 0 in constant time.
> 
> For cr 1, we add a new rb tree of groups sorted by largest fragment
> size. This allows us to lookup a match for cr 1 in log (num groups)
> time.
> 
> These optimizations can be enabled by passing "mb_optimize_scan" mount
> option.
> 
> These changes may result in allocations to be spread across the block
> device. While that would not matter some block devices (such as flash)
> it may be a cause of concern for other block devices that benefit from
> storing related content togetther such as disk. However, it can be
> argued that in high fragmentation scenrio, especially for large disks,
> it's still worth optimizing the scanning since in such cases, we get
> cpu bound on group scanning instead of getting IO bound. Perhaps, in
> future, we could dynamically turn this new optimization on based on
> fragmentation levels for such devices.
> 
> Verified that there are no regressions in smoke tests (-g quick -c 4k).
> 
> Also, to demonstrate the effectiveness for the patch series, following
> experiment was performed:
> 
> Created a highly fragmented disk of size 65TB. The disk had no
> contiguous 2M regions. Following command was run consecutively for 3
> times:
> 
> time dd if=/dev/urandom of=file bs=2M count=10
> 
> Here are the results with and without cr 0/1 optimizations:
> 
> |---------+------------------------------+---------------------------|
> |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
> |---------+------------------------------+---------------------------|
> | 1st run | 5m1.871s                     | 2m47.642s                 |
> | 2nd run | 2m28.390s                    | 0m0.611s                  |
> | 3rd run | 2m26.530s                    | 0m1.255s                  |
> |---------+------------------------------+---------------------------|
> 
> The patch [3/6] "ext4: add mballoc stats proc file" is a modified
> version of the patch originally written by Artem Blagodarenko
> (artem.blagodarenko@gmail.com). With that patch, I ran following
> command with and without optimizations.
> 
> dd if=/dev/zero of=/mnt/file bs=2M count=2 conv=fsync
> 
> Without optimizations:
> 
> useless_c0_loops: 3
> useless_c1_loops: 39
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> With optimizations:
> 
> useless_c0_loops: 0
> useless_c1_loops: 0
> useless_c2_loops: 0
> useless_c3_loops: 0
> 
> This shows that CR0 and CR1 optimizations get rid of useless CR0 and
> CR1 loops altogether thereby significantly reducing the number of
> groups that get considered.
> 
> Changes from V5:
> ----------------
> - Turned block bitmap prefetching on by default
> - Fixed a bug where for cr >= 2, we were skipping first group without
>   searching in it
> - Renamed mb_linear_limit to mb_max_linear_groups
> 
> Harshad Shirwadkar (7):
>   ext4: drop s_mb_bal_lock and convert protected fields to atomic
>   ext4: add ability to return parsed options from parse_options
>   ext4: add mballoc stats proc file
>   ext4: add MB_NUM_ORDERS macro
>   ext4: improve cr 0 / cr 1 group scanning
>   ext4: add proc files to monitor new structures
>   ext4: make prefetch_block_bitmaps default
> 
>  fs/ext4/ext4.h    |  34 ++-
>  fs/ext4/mballoc.c | 590 +++++++++++++++++++++++++++++++++++++++++++---
>  fs/ext4/mballoc.h |  22 +-
>  fs/ext4/super.c   |  92 +++++---
>  fs/ext4/sysfs.c   |   6 +
>  5 files changed, 680 insertions(+), 64 deletions(-)
> 
> -- 
> 2.31.0.291.g576ba9dcdaf-goog
> 

      parent reply	other threads:[~2021-04-09 16:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 17:21 [PATCH v6 0/7] Block Allocator Improvements Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 1/7] ext4: drop s_mb_bal_lock and convert protected fields to atomic Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 2/7] ext4: add ability to return parsed options from parse_options Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 3/7] ext4: add mballoc stats proc file Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 4/7] ext4: add MB_NUM_ORDERS macro Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 5/7] ext4: improve cr 0 / cr 1 group scanning Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 6/7] ext4: add proc files to monitor new structures Harshad Shirwadkar
2021-04-01 17:21 ` [PATCH v6 7/7] ext4: make prefetch_block_bitmaps default Harshad Shirwadkar
2021-04-02  5:16   ` Andreas Dilger
2021-04-02 16:46     ` harshad shirwadkar
2021-04-02 17:39       ` Andreas Dilger
2021-04-09 16:36 ` Theodore Ts'o [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHCCc5UKANOd2VbQ@mit.edu \
    --to=tytso@mit.edu \
    --cc=harshadshirwadkar@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).