public inbox for linux-btrfs@vger.kernel.org
From: Filipe Manana <fdmanana@kernel.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 0/2] btrfs: remove the metadata readahead mechanism
Date: Tue, 7 Dec 2021 11:56:10 +0000	[thread overview]
Message-ID: <Ya9L2qSe+XKgtesq@debian9.Home> (raw)
In-Reply-To: <e019c8d6-4d59-4559-b56a-73dd2276903c@gmx.com>

On Tue, Dec 07, 2021 at 07:43:49PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/12/7 19:02, Filipe Manana wrote:
> > On Tue, Dec 07, 2021 at 03:43:58PM +0800, Qu Wenruo wrote:
> > > This is originally just my preparation for scrub refactors, but when the
> > > readahead is involved, it won't just be a small cleanup.
> > > 
> > > The metadata readahead code was introduced in 2011 (surprisingly, the
> > > commit message even contains a changelog), but now it has only one
> > > user, and even for that one user the readahead mechanism can't
> > > provide much help in practice.
> > > 
> > > Scrub needs readahead on the commit root, but the existing mechanism
> > > can only read ahead on the current root.
> > 
> > If support for the commit root is added, is there a noticeable speedup?
> > Have you tested that?
> > 
> 
> Will craft a benchmark for that.
> 
> However, I don't have any HDD available for benchmarking, so I would
> only have results from a SATA SSD.
> 
> > > 
> > > And the code sits at a very bad layer inside btrfs: all metadata
> > > live in the btrfs logical address space, but the readahead mostly
> > > works at the device layer (to manage the in-flight readahead).
> > > 
> > > Personally speaking, I don't think such an "optimization" is really
> > > even needed, since we have better ways, like setting IO priority.
> > 
> > Have you done any benchmarks?
> > How? On physical machines or VMs?
> > 
> > Please include such details in the changelogs.
> > 
> > > 
> > > I really prefer to let the professional block layer guys do whatever
> > > they are good at (and in fact, those block layer guys rock!).
> > > Premature optimization is a source of bugs, and this one has already
> > > caused several bugs recently.
> > > 
> > > Nowadays we have btrfs_path::reada to do readahead, so I doubt we
> > > really need such a facility.
> > 
> > btrfs_path::reada is important and it makes a difference.
> > I recently changed send to use it, and the benchmarks can be found in
> > the changelogs.
> 
> By "such facility" I mean the btrfs_reada_add() facility, not
> btrfs_path::reada.
> 
> > 
> > There are also other places where it makes a difference, such as when
> > reading a large chunk tree during mount or when reading a large directory.
> > 
> > It's all about reading other leaves/nodes in the background that will
> > be needed in the near future while the task is doing something else,
> > even if the nodes/leaves are not physically contiguous on disk (that's
> > the main reason the mechanism exists).
> 
> Unfortunately, not really in the background.
> 
> For scrub, it kicks off readahead and then waits for it, so it's not
> really happening in the background.
> 
> (Neither is btrfs_path::reada, though; it also happens at tree search
> time, and it's synchronous.)

No, the btrfs_path::reada mechanism is not synchronous - it does not wait
for the reads (IO) to complete.

btrfs_readahead_node_child() triggers a read for the extent buffer's
pages but does not wait for the reads to complete. I.e. we end up calling:

    read_extent_buffer_pages(eb, WAIT_NONE, 0);

So it does not wait on the read bios to complete.
Otherwise that would be pointless.
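
For reference, this is roughly how a caller opts in (a non-runnable
sketch of the in-tree usage pattern; exact signatures and context vary
across kernel versions):

```c
/*
 * Sketch only: a caller enables btrfs_path::reada before a tree search.
 * READA_FORWARD hints that subsequent searches move forward through the
 * tree, so nearby siblings are read ahead with WAIT_NONE semantics,
 * i.e. the IO is submitted but never waited on here.
 */
struct btrfs_path *path;
int ret;

path = btrfs_alloc_path();
if (!path)
	return -ENOMEM;
path->reada = READA_FORWARD;

ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
/* ... walk the tree; readahead IO completes in the background ... */
btrfs_free_path(path);
```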

> 
> Another reason why the existing btrfs_reada_add() facility is not
> suitable for scrub is that the data range covered by one tree block is
> way larger than the scrub data length.
> 
> The current scrub data length is 16 pages (64K), while even one 16K
> leaf can contain csums for at least 8M (CRC32) or 1M (SHA256) of data.
> This means most of the readahead doesn't make much sense, as scrub
> won't cross leaf boundaries that frequently.
> 
> (BTW, in this particular case btrfs_path::reada may perform better than
> the start/end based reada, as it would really do some readahead.)
> 
> Anyway, only a benchmark can prove whether I'm right or wrong.

Yep, and preferably on a spinning disk and bare metal (no VM).

Thanks.

> 
> Thanks,
> Qu
> > 
> > > 
> > > So here I propose to completely remove the old and underutilized
> > > metadata readahead system.
> > > 
> > > Qu Wenruo (2):
> > >    btrfs: remove the unnecessary path parameter for scrub_raid56_parity()
> > >    btrfs: remove reada mechanism
> > > 
> > >   fs/btrfs/Makefile      |    2 +-
> > >   fs/btrfs/ctree.h       |   25 -
> > >   fs/btrfs/dev-replace.c |    5 -
> > >   fs/btrfs/disk-io.c     |   20 +-
> > >   fs/btrfs/extent_io.c   |    3 -
> > >   fs/btrfs/reada.c       | 1086 ----------------------------------------
> > >   fs/btrfs/scrub.c       |   64 +--
> > >   fs/btrfs/super.c       |    1 -
> > >   fs/btrfs/volumes.c     |    7 -
> > >   fs/btrfs/volumes.h     |    7 -
> > >   10 files changed, 17 insertions(+), 1203 deletions(-)
> > >   delete mode 100644 fs/btrfs/reada.c
> > > 
> > > --
> > > 2.34.1
> > > 

Thread overview: 15+ messages
2021-12-07  7:43 [PATCH RFC 0/2] btrfs: remove the metadata readahead mechanism Qu Wenruo
2021-12-07  7:43 ` [PATCH RFC 1/2] btrfs: remove the unnecessary path parameter for scrub_raid56_parity() Qu Wenruo
2021-12-07  7:44 ` [PATCH RFC 2/2] btrfs: remove reada mechanism Qu Wenruo
2021-12-07 11:02 ` [PATCH RFC 0/2] btrfs: remove the metadata readahead mechanism Filipe Manana
2021-12-07 11:43   ` Qu Wenruo
2021-12-07 11:56     ` Filipe Manana [this message]
2021-12-07 12:01       ` Qu Wenruo
2021-12-07 14:53         ` David Sterba
2021-12-07 15:40           ` David Sterba
2021-12-07 15:53             ` Filipe Manana
2021-12-08  0:08               ` Qu Wenruo
2021-12-08 14:04               ` David Sterba
2021-12-09 10:25                 ` Filipe Manana
2021-12-09 13:25                   ` Qu Wenruo
2021-12-09 14:33                     ` Josef Bacik
