linux-fsdevel.vger.kernel.org archive mirror
From: "Theodore Ts'o" <tytso@mit.edu>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Andreas Dilger <adilger@dilger.ca>
Subject: Re: [RFC 08/11] ext4: Don't skip prefetching BLOCK_UNINIT groups
Date: Sat, 25 Mar 2023 23:54:02 -0400
Message-ID: <20230326035402.GA323408@mit.edu>
In-Reply-To: <ZBRHCHySeQ0KC/f7@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>

On Fri, Mar 17, 2023 at 04:25:04PM +0530, Ojaswin Mujoo wrote:
> > > This improves the accuracy of CR0/1 allocation as earlier, we could have
> > > essentially empty BLOCK_UNINIT groups being ignored by CR0/1 due to their buddy
> > > not being initialized, leading to slower CR2 allocations. With this patch CR0/1
> > > will be able to discover these groups as well, thus improving performance.
> >
> > The patch looks good. I just somewhat wonder - this change may result in
> > uninitialized groups being initialized and used earlier (previously we'd
> > rather search in other already initialized groups) which may spread
> > allocations more. But I suppose that's fine and uninit groups are not
> > really a feature meant to limit fragmentation and as the filesystem ages
> > the differences should be minimal. So feel free to add:
> 
> Another point I wanted to discuss wrt this patch series was why were the
> BLOCK_UNINIT groups not being prefetched earlier. One point I can think
> of is that this might lead to memory pressure when we have too many
> empty BGs in a very large (say terabytes) disk.

Originally the prefetch logic was simply an I/O optimization: normally,
all of the block bitmaps for a flex_bg are contiguous, so why not read
them all with a single I/O, issued all at once, instead of doing them
as separate 4k reads?

Skipping block groups that hadn't yet been prefetched was added later,
to improve the performance of the allocator on freshly mounted file
systems where the prefetch hadn't yet had a chance to pull in the block
bitmaps.  The problem was that if the block groups hadn't been
prefetched yet, the cr0 scan would fetch them itself.  On a storage
device where blocks with monotonically increasing LBA numbers aren't
necessarily stored adjacently on disk (for example, a dm-thin volume;
if one were to do an experiment on certain emulated block devices in
certain hyperscale cloud environments, one might find a similar
performance profile), a cr0 scan could end up issuing a series of 16
sequential 4k I/O's, which could be substantially worse from a
performance standpoint than a single sequential 64k I/O.
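
The skip described above can be modelled as a simple predicate: the
optimistic passes (cr0/cr1) ignore any group whose bitmap is not yet in
memory, so they never trigger a synchronous 4k read, while later passes
read the bitmap on demand. This is a hypothetical, simplified C sketch
(the enum, struct and function names are invented for illustration);
the real check in mballoc has additional conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator passes, from most to least optimistic. */
enum scan_pass { PASS_CR0, PASS_CR1, PASS_CR2, PASS_CR3 };

/* Hypothetical per-group state; not ext4's struct ext4_group_info. */
struct group_state {
	bool bitmap_loaded;  /* block bitmap and buddy already in memory */
};

/*
 * Model of "skip groups that haven't been prefetched yet": cr0/cr1 only
 * look at groups that are already loaded, leaving the rest to the
 * prefetcher or to cr2 and later, which will read them synchronously.
 */
static bool scan_considers_group(const struct group_state *g,
				 enum scan_pass pass)
{
	if (pass <= PASS_CR1 && !g->bitmap_loaded)
		return false;
	return true;
}
```

The tradeoff is visible in the model: a cold group is invisible to
cr0/cr1 even if it would have been the best fit.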

When this change was made, the focus was on *initialized* bitmaps
taking a long time to read if they were fetched with individual
sequential 4k I/O's; the fix was to skip scanning them initially, on
the hope that the prefetch would pull them in fairly quickly, and that
a few bad allocations right after the file system was mounted were an
acceptable tradeoff.

But prefetching BLOCK_UNINIT groups makes sense; that should fix the
problem that you've identified (at least for BLOCK_UNINIT groups; for
initialized block bitmaps, we'll still have less optimal allocation
patterns until we've managed to prefetch those block groups).
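
The change under discussion amounts to dropping one clause from the
"should this group be prefetched?" test. A BLOCK_UNINIT group's bitmap
can be generated in memory rather than read from disk, so including it
costs memory (the concern raised above for very large file systems)
but no bitmap I/O. The C sketch below is loosely modelled on that
condition and heavily simplified; the struct and function names are
hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group state; field names are illustrative only. */
struct group_state {
	bool needs_init;         /* buddy/bitmap not yet set up in memory */
	bool block_uninit;       /* descriptor flagged BLOCK_UNINIT */
	unsigned free_clusters;  /* free clusters recorded in descriptor */
};

/* Old rule: BLOCK_UNINIT groups were excluded from prefetch. */
static bool should_prefetch_old(const struct group_state *g)
{
	return g->needs_init && g->free_clusters > 0 && !g->block_uninit;
}

/*
 * Proposed rule: BLOCK_UNINIT groups are prefetched too, so that their
 * (typically nearly empty) buddy data becomes visible to cr0/cr1.
 */
static bool should_prefetch_new(const struct group_state *g)
{
	return g->needs_init && g->free_clusters > 0;
}
```

Only the BLOCK_UNINIT case changes; full groups and groups that are
already initialized are still skipped under both rules.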

Cheers,

					- Ted


Thread overview: 35+ messages
2023-01-27 12:37 [RFC 00/11] multiblock allocator improvements Ojaswin Mujoo
2023-01-27 12:37 ` [RFC 01/11] ext4: mballoc: Remove useless setting of ac_criteria Ojaswin Mujoo
2023-03-09 11:36   ` Jan Kara
2023-01-27 12:37 ` [RFC 02/11] ext4: Remove unused extern variables declaration Ojaswin Mujoo
2023-03-09 11:37   ` Jan Kara
2023-01-27 12:37 ` [RFC 03/11] ext4: mballoc: Fix getting the right group desc in ext4_mb_prefetch_fini Ojaswin Mujoo
2023-03-09 11:42   ` Jan Kara
2023-01-27 12:37 ` [RFC 04/11] ext4: Convert mballoc cr (criteria) to enum Ojaswin Mujoo
2023-03-09 12:11   ` Jan Kara
2023-03-17 10:26     ` Ojaswin Mujoo
2023-03-23 10:55       ` Jan Kara
2023-03-25 14:42         ` Ojaswin Mujoo
2023-04-20  6:32           ` Ojaswin Mujoo
2023-04-20 14:58             ` Jan Kara
2023-01-27 12:37 ` [RFC 05/11] ext4: Add per CR extent scanned counter Ojaswin Mujoo
2023-03-09 12:14   ` Jan Kara
2023-01-27 12:37 ` [RFC 06/11] ext4: Add counter to track successful allocation of goal length Ojaswin Mujoo
2023-03-09 12:17   ` Jan Kara
2023-01-27 12:37 ` [RFC 07/11] ext4: Avoid scanning smaller extents in BG during CR1 Ojaswin Mujoo
2023-03-09 12:20   ` Jan Kara
2023-01-27 12:37 ` [RFC 08/11] ext4: Don't skip prefetching BLOCK_UNINIT groups Ojaswin Mujoo
2023-03-09 14:14   ` Jan Kara
2023-03-17 10:55     ` Ojaswin Mujoo
2023-03-23 10:57       ` Jan Kara
2023-03-25 14:43         ` Ojaswin Mujoo
2023-03-26  3:54       ` Theodore Ts'o [this message]
2023-01-27 12:37 ` [RFC 09/11] ext4: Ensure ext4_mb_prefetch_fini() is called for all prefetched BGs Ojaswin Mujoo
2023-03-09 14:23   ` Jan Kara
2023-01-27 12:37 ` [RFC 10/11] ext4: Abstract out logic to search average fragment list Ojaswin Mujoo
2023-03-09 14:25   ` Jan Kara
2023-01-27 12:37 ` [RFC 11/11] ext4: Add allocation criteria 1.5 (CR1_5) Ojaswin Mujoo
2023-03-09 15:06   ` Jan Kara
2023-03-17 11:37     ` Ojaswin Mujoo
2023-03-23 11:05       ` Jan Kara
2023-03-25 14:46         ` Ojaswin Mujoo
