From: Dimitrios Apostolou <jimis@gmx.net>
To: linux-btrfs@vger.kernel.org
Subject: (PING) btrfs sequential 8K read()s from compressed files are not merging
Date: Wed, 26 Jul 2023 12:59:48 +0200 (CEST) [thread overview]
Message-ID: <fd0bbbc3-4a42-3472-dc6e-5a1cb51df10e@gmx.net> (raw)
In-Reply-To: <4b16bd02-a446-8000-b10e-4b24aaede854@gmx.net>
Any feedback? Is this a bug? I verified that others see the same slow read
speeds from compressed files when the block size is small.
P.S. Is there a bug tracker for reporting btrfs bugs? My understanding is
that neither the kernel's Bugzilla nor GitHub issues are endorsed.
On Mon, 17 Jul 2023, Dimitrios Apostolou wrote:
> Ping, any feedback on this issue?
>
> Sorry if I was not clear: the problem here is that the filesystem is very
> slow (10-20 MB/s on the device) in sequential reads from compressed files
> when the block size is 8K.
>
> It looks like a bug to me (read requests are not merging, i.e. no read-ahead
> is happening). Any opinions?
>
>
> On Mon, 10 Jul 2023, Dimitrios Apostolou wrote:
>
>> Hello list,
>>
>> I discovered this issue because of very slow sequential read speed in
>> PostgreSQL, which performs all reads using blocking pread() calls of 8192
>> bytes (Postgres' default page size). I verified that reads are similarly
>> slow when I read the files with dd bs=8k. Here are my measurements:
>>
>> Reading a 1 GB Postgres file using dd (which uses read() internally) in
>> 8K and 32K chunks:
>>
>> # dd if=4156889.4 of=/dev/null bs=8k
>> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s
>>
>> # dd if=4156889.4 of=/dev/null bs=8k # 2nd run, data is cached
>> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s
>>
>> # dd if=4156889.8 of=/dev/null bs=32k
>> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s
>>
>> # dd if=4156889.8 of=/dev/null bs=32k # 2nd run, data is cached
>> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s
>>
>> Notice that the read rate (after transparent decompression) with bs=8k is
>> 174 MB/s (I see ~20 MB/s on the device), slow and similar to what
>> PostgreSQL achieves. With bs=32k the rate increases to 1 GB/s (I see
>> ~80 MB/s on the device, but the run is too short to register properly).
>> The device limit is 1 GB/s; of course I don't expect to reach it while
>> decompressing. The cached reads are fast in both cases; I'm guessing the
>> kernel page cache holds the decompressed blocks.
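
For anyone who wants to reproduce the access pattern without a Postgres
installation, below is a minimal C sketch of the same workload: sequential,
blocking 8 KiB pread() calls, the same pattern dd bs=8k produces. The file
name on the command line is just whatever large test file you have.

/* pread8k.c -- sequential, blocking 8 KiB reads, the way Postgres reads
 * its heap files.  Build with: cc -O2 -o pread8k pread8k.c */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BLKSZ 8192                      /* Postgres' default page size */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[BLKSZ];
    off_t off = 0;
    ssize_t n;
    long long total = 0;

    /* One synchronous 8 KiB read after another, no overlap, no hints --
     * the same pattern as dd bs=8k. */
    while ((n = pread(fd, buf, BLKSZ, off)) > 0) {
        total += n;
        off += n;
    }
    if (n < 0)
        perror("pread");

    printf("read %lld bytes in %d-byte chunks\n", total, BLKSZ);
    close(fd);
    return 0;
}

Running it against a compressed and an uncompressed file with a cold cache
(echo 3 > /proc/sys/vm/drop_caches first) should show the same difference
as the dd numbers above.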
>>
>> The above results have been verified with multiple runs. The kernel is
>> Ubuntu's 5.15 LTS and the block device is an LVM logical volume on a
>> high-performance DAS system, but I verified the same behaviour on a
>> separate system with kernel 6.3.9 and btrfs directly on a local spinning
>> disk. The btrfs filesystem is mounted with compress=zstd:3, and the files
>> were defragmented prior to running the commands.
>>
>> Focusing on the cold-cache cases, iostat gives interesting insight: for
>> both Postgres doing a sequential scan and dd with bs=8k, the kernel
>> block layer does not appear to merge the I/O requests. `iostat -x` shows
>> an average read request size (rareq-sz) of only ~16 KB, practically no
>> merged requests, and a very high reads/s (IOPS) figure.
>>
>> The dd runs with bs=32k show fewer IOPS in `iostat -x`, higher
>> throughput, a larger average request size and a high number of merged
>> requests. To me it appears as if btrfs does read-ahead only when the
>> read block size is large.
>>
>> Example output for some random second out of dd bs=8k:
>>
>> Device              r/s     rMB/s    rrqm/s   %rrqm  r_await  rareq-sz
>> sdc             1313.00     20.93      2.00    0.15     0.53     16.32
>>
>> with dd bs=32k:
>>
>> Device              r/s     rMB/s    rrqm/s   %rrqm  r_await  rareq-sz
>> sdc              290.00     76.44   4528.00   93.98     1.71    269.92
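
One experiment that might narrow this down (I have not verified whether it
changes anything for compressed extents, so treat it purely as a
suggestion): give the kernel an explicit sequential-access hint before the
read loop and watch whether the request sizes in iostat change. A sketch of
the hinting, using posix_fadvise(2) and the Linux-specific readahead(2):

/* Hypothetical variation of the read loop: open the file and ask the
 * kernel for aggressive sequential readahead up front.  Whether btrfs
 * honours these hints for compressed extents is exactly the open
 * question here, so this is an experiment, not a fix. */
#define _GNU_SOURCE                     /* for readahead() */
#include <fcntl.h>
#include <unistd.h>

static int open_with_readahead_hint(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Declare the access pattern as sequential; this is a best-effort
     * hint and may be ignored by the filesystem. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    /* Optionally prefetch the first chunk synchronously; 16 MiB is an
     * arbitrary amount for the experiment. */
    readahead(fd, 0, 16 * 1024 * 1024);

    return fd;
}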
>>
>> *On the same filesystem, doing the same dd bs=8k read from a file that
>> has not been compressed by the filesystem, I get 1 GB/s throughput, which
>> is the limit of my device. This is what makes me believe it's an issue
>> with btrfs compression.*
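
For completeness, one can also double-check from a program that the file
under test really consists of compressed extents: btrfs marks them with
FIEMAP_EXTENT_ENCODED in the FIEMAP ioctl output, the same flag that
filefrag -v prints as "encoded". A small sketch (the 256-extent cap is an
arbitrary choice, but plenty for a defragmented file):

/* fiemap_check.c -- count how many of a file's extents carry the
 * FIEMAP_EXTENT_ENCODED flag (compressed or otherwise encoded data). */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    size_t sz = sizeof(struct fiemap) + 256 * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (!fm) {
        perror("calloc");
        return 1;
    }

    fm->fm_start = 0;
    fm->fm_length = ~0ULL;              /* map the whole file */
    fm->fm_extent_count = 256;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        perror("FS_IOC_FIEMAP");
        return 1;
    }

    unsigned int encoded = 0;
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_ENCODED)
            encoded++;

    printf("%u of %u mapped extents are encoded\n",
           encoded, fm->fm_mapped_extents);

    free(fm);
    close(fd);
    return 0;
}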
>>
>> Is this a bug or known behaviour?
>>
>> Thanks in advance,
>> Dimitris
>>
>>
>>
>
>
Thread overview: 9+ messages
2023-07-10 18:56 btrfs sequential 8K read()s from compressed files are not merging Dimitrios Apostolou
2023-07-17 14:11 ` Dimitrios Apostolou
2023-07-26 10:59 ` Dimitrios Apostolou [this message]
2023-07-26 12:54 ` (PING) " Christoph Hellwig
2023-07-26 13:44 ` Dimitrios Apostolou
2023-08-29 13:02 ` Dimitrios Apostolou
2023-08-30 11:54 ` Qu Wenruo
2023-08-30 18:18 ` Dimitrios Apostolou
2023-08-31 0:22 ` Anand Jain