From: Dave Chinner <david@fromorbit.com>
To: "Steinar H. Gunderson" <steinar+kernel@gunderson.no>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Slow deduplication
Date: Mon, 3 Mar 2025 08:35:57 +1100
Message-ID: <Z8TPPX3g9rA5XND_@dread.disaster.area>
In-Reply-To: <20250302084710.3g5ipnj46xxhd33r@sesse.net>

On Sun, Mar 02, 2025 at 09:47:10AM +0100, Steinar H. Gunderson wrote:
> This ioctl call successfully deduplicated the data, but it took 71.52 _seconds_.
> Deduplicating the entire set is on the order of days. I don't understand why
> this would take so much time; I understand that it needs to make a read to
> verify that the file ranges are indeed the same (this is the only sane API
> design!), but it comes out to something like 2800 kB/sec from an array that
> can deliver almost 400 times that. There is no other activity on the file
> system in question, so it should not conflict with other activity (locks
> etc.), and the process does not appear to be taking significant amounts of
> CPU time. iostat shows read activity varying from maybe 300 kB/sec to
> 12000 kB/sec or so; /proc/<pid>/stack says:
>
> [<0>] folio_wait_bit_common+0x174/0x220
> [<0>] filemap_read_folio+0x64/0x8b
> [<0>] do_read_cache_folio+0x119/0x164
> [<0>] __generic_remap_file_range_prep+0x372/0x568
> [<0>] generic_remap_file_range_prep+0x7/0xd

This does the comparison one folio at a time and does no readahead.
Hence, if the data isn't already in cache, it is doing small
synchronous reads and waiting for every single one of them. This
really should use an internal interface that is capable of issuing
readahead...

-Dave.
--
Dave Chinner
david@fromorbit.com

Thread overview: 6+ messages
2025-03-02 8:47 Slow deduplication Steinar H. Gunderson
2025-03-02 21:35 ` Dave Chinner [this message]
2025-03-02 21:49 ` Steinar H. Gunderson
2025-03-03 14:03 ` Christoph Hellwig
2025-03-06 0:35 ` Christoph Hellwig
2025-03-06 8:17 ` Steinar H. Gunderson