From: Dimitrios Apostolou <jimis@gmx.net>
To: Boris Burkov <boris@bur.io>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-btrfs@vger.kernel.org, Qu Wenruo <quwenruo.btrfs@gmx.com>,
Anand Jain <anand.jain@oracle.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: Sequential read(8K) from compressed files are very slow
Date: Thu, 5 Jun 2025 19:09:07 +0200 (CEST)
Message-ID: <d934d1ea-4e3e-71ef-8b42-698ccd747799@gmx.net>
In-Reply-To: <20250604180303.GA978719@zen.localdomain>
Hi Boris, thank you for investigating! I've been chasing this for years
and kept hitting a wall; the bottleneck was not at all obvious when
looking from outside the kernel. I've started a few threads before, but
they were fruitless.
On Wed, 4 Jun 2025, Boris Burkov wrote:
>
> stats from an 8K run:
> $ sudo bpftrace readahead.bt
> Attaching 4 probes...
>
> @add_ra_delay_ms: 19450
> @add_ra_delay_ns: 19450937640
> @add_ra_delay_s: 19
>
> @ra_sz_freq[8]: 81920
> @ra_sz_hist:
> [8, 16) 81920 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
>
> stats from a 128K run:
> $ sudo bpftrace readahead.bt
> Attaching 4 probes...
>
> @add_ra_delay_ms: 15
> @add_ra_delay_ns: 15333301
> @add_ra_delay_s: 0
>
> @ra_sz_freq[512]: 1
> @ra_sz_freq[256]: 1
> @ra_sz_freq[128]: 2
> @ra_sz_freq[1024]: 2559
> @ra_sz_hist:
> [128, 256)             2 |                                                    |
> [256, 512)             1 |                                                    |
> [512, 1K)              1 |                                                    |
> [1K, 2K)            2559 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
>
> so we are spending 19 seconds (vs 0) in add_ra_bio_pages and calling
> btrfs_readahead() 81920 times with 8 pages vs 2559 times with 1024
> pages.
I particularly like the bpftrace tool you are using: it opens up new
possibilities without custom kernel builds, and I want to experiment with
it. Could you please share the script you used for this histogram?
>
> The total time difference is ~30s on my setup, so there are still ~10
> seconds unaccounted for in my analysis here, though.
This is outstanding. I expect such an improvement to give a *huge* boost to
PostgreSQL workloads on compressed filesystems; by huge I mean 5-10x for
sequential table scans.
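For anyone wanting to reproduce the comparison from userspace, the access pattern in question (front-to-back read(2) calls of 8K, PostgreSQL's default block size, versus 128K) can be timed with a small Python sketch. It is illustrative only: `sequential_read` is a made-up helper, the demo uses a temporary stand-in file, and meaningful numbers need a file on a compressed btrfs mount with the page cache dropped (e.g. `echo 3 > /proc/sys/vm/drop_caches` as root) before each run.

```python
import os
import tempfile
import time


def sequential_read(path: str, block_size: int) -> tuple[int, float]:
    """Read `path` front to back with read(2) calls of `block_size` bytes.

    Returns (total bytes read, elapsed seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:  # EOF
                break
            total += len(chunk)
        return total, time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical stand-in file; for a real test, point `path` at a file on
    # a compressed btrfs mount and drop the page cache between runs.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * (4 * 1024 * 1024))
        path = f.name
    try:
        for bs in (8 * 1024, 128 * 1024):
            nbytes, secs = sequential_read(path, bs)
            print(f"bs={bs // 1024}K read {nbytes} bytes in {secs:.3f}s")
    finally:
        os.unlink(path)
```

Only the block size differs between the two runs, so any large gap in elapsed time points at the per-read overhead rather than at the amount of data transferred.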
I'm also wondering about readahead tuning. In the past I tried tweaking
/sys/block/sdX/queue/read_ahead_kb to see if it made any difference, but
couldn't see any substantial change. Does it affect your results with your
patch applied? Or does btrfs follow different code paths and ignore that
setting completely?
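To experiment with read_ahead_kb systematically, a tiny Python helper can read and set the knob named above, so a sequential-read benchmark can be re-run for several values. This is a sketch under assumptions: the helper names are hypothetical, `sdX` stands for whichever device backs the filesystem, writing requires root, and the `sysfs_root` parameter exists only so the helpers can be pointed at a scratch directory.

```python
from pathlib import Path


def read_ahead_kb_path(dev: str, sysfs_root: str = "/sys") -> Path:
    """Path of the queue read-ahead knob for block device `dev` (e.g. "sda")."""
    return Path(sysfs_root) / "block" / dev / "queue" / "read_ahead_kb"


def get_read_ahead_kb(dev: str, sysfs_root: str = "/sys") -> int:
    """Current read-ahead size for `dev`, in KiB."""
    return int(read_ahead_kb_path(dev, sysfs_root).read_text().strip())


def set_read_ahead_kb(dev: str, kb: int, sysfs_root: str = "/sys") -> None:
    """Set the read-ahead size for `dev` to `kb` KiB (needs root under /sys)."""
    read_ahead_kb_path(dev, sysfs_root).write_text(f"{kb}\n")
```

For example, `set_read_ahead_kb("sdX", 4096)` followed by another timed 8K scan would show whether the setting has any effect on the compressed-file case.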
>
>>> I removed all the extent locking as an experiment, as it is not really
>>> needed for safety in this single threaded test and did see an
>>> improvement but not full parity between 8k and 128k for the compressed
>>> file. I'll keep poking at the other sources of overhead in the builtin
>>> readahead logic and in calling btrfs_readahead more looping inside it.
Since your findings suggest that the issue is probably lock contention,
you might want to try /proc/lock_stat. It requires a kernel built with
CONFIG_LOCK_STAT, which is what is blocking me at the moment, but it may be
easier for you if you already build your own kernels for btrfs
development. Docs at:
https://docs.kernel.org/locking/lockstat.html
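For anyone trying the lock_stat route, the usage steps from that documentation (enable collection via /proc/sys/kernel/lock_stat, reset by writing 0 to /proc/lock_stat, run the workload, then read /proc/lock_stat) can be wrapped in a small Python helper. This is a sketch only: the function names are made up for illustration, and it assumes root and a CONFIG_LOCK_STAT kernel.

```python
from pathlib import Path
from typing import Callable

# Knobs described in the lockstat documentation (CONFIG_LOCK_STAT kernels):
# writing 1/0 to /proc/sys/kernel/lock_stat enables/disables collection,
# writing 0 to /proc/lock_stat clears the counters, reading it dumps them.
ENABLE = Path("/proc/sys/kernel/lock_stat")
STATS = Path("/proc/lock_stat")


def capture_lock_stat(workload: Callable[[], None],
                      enable: Path = ENABLE, stats: Path = STATS) -> str:
    """Clear lock statistics, run `workload`, and return the stats snapshot."""
    enable.write_text("1\n")   # start collecting
    stats.write_text("0\n")    # reset counters so only `workload` is measured
    try:
        workload()
    finally:
        enable.write_text("0\n")  # stop collecting even if workload fails
    return stats.read_text()


def lines_mentioning(snapshot: str, needle: str) -> list[str]:
    """Filter the snapshot, e.g. to lock classes whose name mentions "extent"."""
    return [line for line in snapshot.splitlines() if needle in line]
```

Passing the 8K sequential scan as `workload` and diffing the snapshot against one taken with a 128K scan would show which lock classes account for the extra wait time.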
Thank you,
Dimitris
Thread overview: 8+ messages
2025-06-03 19:56 Sequential read(8K) from compressed files are very slow Dimitrios Apostolou
2025-06-04 1:36 ` Boris Burkov
2025-06-04 6:22 ` Christoph Hellwig
2025-06-04 18:03 ` Boris Burkov
2025-06-04 21:49 ` Boris Burkov
2025-06-05 4:35 ` Christoph Hellwig
2025-06-05 17:09 ` Dimitrios Apostolou [this message]
2025-06-07 0:37 ` Boris Burkov