From: Dimitrios Apostolou <jimis@gmx.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs sequential 8K read()s from compressed files are not merging
Date: Mon, 17 Jul 2023 16:11:09 +0200 (CEST)
Message-ID: <4b16bd02-a446-8000-b10e-4b24aaede854@gmx.net>
In-Reply-To: <0db91235-810e-1c6e-7192-48f698c55c59@gmx.net>

Ping, any feedback on this issue?

Sorry if I was not clear. The problem is that the filesystem is very slow
(10-20 MB/s on the device) for sequential reads from compressed files when
the read size is 8K.

It looks like a bug to me (read requests are not merging, i.e. no
read-ahead is happening). Any opinions?

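For anyone who wants to reproduce the access pattern without PostgreSQL or dd, here is a minimal sketch in Python. It is only an illustration of sequential blocking pread() calls with a configurable block size; the function name `sequential_read_rate` is made up for this example, and the timing includes Python overhead, so absolute numbers will differ from dd.

```python
import os
import time


def sequential_read_rate(path, block_size=8192):
    """Read `path` front to back with blocking pread() calls.

    Returns (bytes_read, elapsed_seconds). `block_size` is the size of
    each pread() request, 8192 being the PostgreSQL default page size.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        start = time.perf_counter()
        while True:
            chunk = os.pread(fd, block_size, offset)
            if not chunk:  # an empty result means end of file
                break
            offset += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return offset, elapsed
```

Calling it once with block_size=8192 and once with block_size=32768 on a compressed file should show whether the gap follows the read size. The page cache has to be cold for the comparison to mean anything, for example by reading two different files or by dropping caches between runs.
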
On Mon, 10 Jul 2023, Dimitrios Apostolou wrote:
> Hello list,
>
> I discovered this issue because of very slow sequential read speed in
> PostgreSQL, which performs all reads using blocking pread() calls of 8192
> bytes (the default PostgreSQL page size). I verified that reads are
> similarly slow when I read files using dd bs=8k. Here are my measurements:
>
> Reading a 1GB postgres file using dd (which uses read() internally) in 8K and
> 32K chunks:
>
> # dd if=4156889.4 of=/dev/null bs=8k
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s
>
> # dd if=4156889.4 of=/dev/null bs=8k # 2nd run, data is cached
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s
>
> # dd if=4156889.8 of=/dev/null bs=32k
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s
>
> # dd if=4156889.8 of=/dev/null bs=32k # 2nd run, data is cached
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s
>
> Notice that the read rate (after transparent decompression) with bs=8k is
> 174MB/s (I see ~20MB/s on the device), which is slow and similar to what
> PostgreSQL achieves. With bs=32k the rate increases to 1GB/s (I see ~80MB/s
> on the device, but the run is too short for it to register properly). The
> device limit is 1GB/s; of course I'm not expecting to reach this while
> decompressing. The cached reads are fast in both cases; I'm guessing the
> kernel page cache contains the decompressed blocks.
>
> The above results have been verified with multiple runs. The kernel is
> Ubuntu LTS 5.15 and the block device is an LVM logical volume on a
> high-performance DAS system, but I verified the same behaviour on a separate
> system with kernel 6.3.9 and btrfs directly on a local spinning disk. The
> btrfs filesystem is mounted with compress=zstd:3 and the files were
> defragmented prior to running the commands.
>
> Focusing on the cold-cache cases, iostat gives interesting insight: for both
> a PostgreSQL sequential scan and dd with bs=8k, the kernel block layer does
> not appear to merge the I/O requests. `iostat -x` shows an average read
> request size (rareq-sz) of about 16 KiB, almost no merged requests, and a
> very high number of reads per second.
>
> The dd commands with bs=32k show fewer IOPS in `iostat -x`, higher speed, a
> larger average request size and a high number of merged requests. To me it
> appears that btrfs is doing read-ahead only when the read size is large.
>
> Example output for some random second out of dd bs=8k:
>
> Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
> sdc           1313.00     20.93     2.00   0.15    0.53    16.32
>
> with dd bs=32k:
>
> Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
> sdc            290.00     76.44  4528.00  93.98    1.71   269.92
>
> *On the same filesystem, doing dd bs=8k reads from a file that has not been
> compressed by the filesystem I get 1GB/s throughput, which is the limit of my
> device. This is what makes me believe it's an issue with btrfs compression.*
>
> Is this a bug or known behaviour?
>
> Thanks in advance,
> Dimitris
>
>
>
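
A quick arithmetic check on the two iostat samples quoted above may help settle the unit of the average request size. The sketch below multiplies r/s by rareq-sz under two assumed units (512-byte sectors and KiB) and compares the result with the reported rMB/s; it also recomputes the merge percentage as rrqm/s divided by (rrqm/s + r/s). The dictionary layout and function names are only for illustration, and the numbers are copied from the quoted output.

```python
# Values copied from the two `iostat -x` samples in the quoted message.
SAMPLES = {
    "bs=8k": {"r_s": 1313.00, "rMB_s": 20.93, "rrqm_s": 2.00,
              "rrqm_pct": 0.15, "rareq_sz": 16.32},
    "bs=32k": {"r_s": 290.00, "rMB_s": 76.44, "rrqm_s": 4528.00,
               "rrqm_pct": 93.98, "rareq_sz": 269.92},
}


def implied_mb_per_s(r_s, rareq_sz, unit_bytes):
    """Throughput implied by r/s and rareq-sz if rareq-sz is in `unit_bytes`."""
    return r_s * rareq_sz * unit_bytes / (1024 * 1024)


def merge_percentage(r_s, rrqm_s):
    """Share of read requests that were merged before being issued."""
    return 100.0 * rrqm_s / (rrqm_s + r_s)


for name, s in SAMPLES.items():
    as_sectors = implied_mb_per_s(s["r_s"], s["rareq_sz"], 512)
    as_kib = implied_mb_per_s(s["r_s"], s["rareq_sz"], 1024)
    merged = merge_percentage(s["r_s"], s["rrqm_s"])
    print(f"{name}: reported {s['rMB_s']:.2f} MB/s, "
          f"if sectors {as_sectors:.2f}, if KiB {as_kib:.2f}, "
          f"merged {merged:.2f}% (reported {s['rrqm_pct']:.2f}%)")
```

Under the KiB assumption the implied throughput matches the reported rMB/s to within rounding for both samples, while the sector assumption gives about half of it. That is consistent with roughly 16 KiB requests going to the device unmerged in the bs=8k case.
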