public inbox for linux-xfs@vger.kernel.org
From: Christoph Hellwig <hch@infradead.org>
To: "Steinar H. Gunderson" <steinar+kernel@gunderson.no>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: Slow deduplication
Date: Mon, 3 Mar 2025 06:03:07 -0800
Message-ID: <Z8W2m8U9uniM8AAc@infradead.org>
In-Reply-To: <20250302214933.dkp743wxlo624aj7@sesse.net>

On Sun, Mar 02, 2025 at 10:49:33PM +0100, Steinar H. Gunderson wrote:
> On Mon, Mar 03, 2025 at 08:35:57AM +1100, Dave Chinner wrote:
> > This does comparison one folio at a time and does no readahead.
> > Hence if the data isn't already in cache, it is doing synchronous
> > small reads and waiting for every single one of them. This really
> > should use an internal interface that is capable of issuing
> > readahead...
> 
> Yes, I noticed that if I do dummy read() of each extent first,
> it becomes _massively_ faster. I'm not sure if I trust posix_fadvise()
> to just do MADV_WILLNEED given the manpage; would it work (and give
> roughly the same readahead that read() seems to be doing)?

The right thing to do is to just issue readahead in
vfs_dedupe_file_range_compare.  The ractl structure is a bit odd, so
it'll need slightly more careful thought than just a hacked-up
one-liner, but it should still be relatively simple.  I can look into
it once I find a little time if no one beats me to it.
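
Until the kernel issues readahead itself, the user-space workaround
from the quoted text (a dummy read() of each extent before the dedupe
call) can be sketched roughly as below.  This is a minimal illustration
and not code from this thread: the helper name warm_page_cache, its
parameters, and the 1 MiB chunk size are assumptions.  The optional
posix_fadvise() path passes POSIX_FADV_WILLNEED, which is only a hint,
so whether it gives the same readahead as read() is not established
here.

```python
import os


def warm_page_cache(fd, offset, length, chunk_size=1 << 20, use_fadvise=False):
    """Pull [offset, offset + length) of fd into the page cache.

    Returns the number of bytes actually read (0 on the fadvise path,
    because that path only queues a hint and reads nothing itself).
    """
    if use_fadvise:
        # Advisory only: the kernel may read ahead less than requested.
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
        return 0

    done = 0
    while done < length:
        # pread() leaves the file position alone, so fd can be shared.
        buf = os.pread(fd, min(chunk_size, length - done), offset + done)
        if not buf:  # reached end of file before the range ended
            break
        done += len(buf)  # data is discarded; only the caching matters
    return done
```

A caller would run this over the source and destination ranges of each
extent and then issue the dedupe request as before.  The reads are
wasted work in the sense that the data is thrown away; the point is
only that sequential read() triggers readahead, whereas the compare
loop reads synchronously one folio at a time.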


Thread overview: 6+ messages
2025-03-02  8:47 Slow deduplication Steinar H. Gunderson
2025-03-02 21:35 ` Dave Chinner
2025-03-02 21:49   ` Steinar H. Gunderson
2025-03-03 14:03     ` Christoph Hellwig [this message]
2025-03-06  0:35       ` Christoph Hellwig
2025-03-06  8:17         ` Steinar H. Gunderson
