linux-btrfs.vger.kernel.org archive mirror
From: Dimitrios Apostolou <jimis@gmx.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs sequential 8K read()s from compressed files are not merging
Date: Mon, 17 Jul 2023 16:11:09 +0200 (CEST)
Message-ID: <4b16bd02-a446-8000-b10e-4b24aaede854@gmx.net>
In-Reply-To: <0db91235-810e-1c6e-7192-48f698c55c59@gmx.net>

Ping, any feedback on this issue?

Sorry if I was not clear: the problem is that sequential reads from
compressed files are very slow (10-20 MB/s on the device) when the read
block size is 8K.

It looks like a bug to me (read requests are not merging, i.e. no
read-ahead seems to be happening). Any opinions?


On Mon, 10 Jul 2023, Dimitrios Apostolou wrote:

> Hello list,
>
> I discovered this issue because of very slow sequential read speed in
> PostgreSQL, which performs all reads using blocking 8192-byte pread() calls
> (PostgreSQL's default page size). I verified that reads are similarly slow
> when I read the files using dd bs=8k. Here are my measurements:
>
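The access pattern described above (strictly sequential, one blocking 8 KiB pread() per page) can be reproduced outside PostgreSQL with a short script. This is a minimal sketch, not PostgreSQL's code: the function name `sequential_pread` and the optional POSIX_FADV_SEQUENTIAL hint are illustrative assumptions, and whether that hint changes btrfs read-ahead on compressed extents is untested here. For cold-cache numbers comparable to the dd runs, the page cache has to be dropped between runs (for example with `echo 3 > /proc/sys/vm/drop_caches` as root).

```python
import os
import sys
import time

PG_BLOCK_SIZE = 8192  # PostgreSQL's default page size


def sequential_pread(path, block_size=PG_BLOCK_SIZE, advise_sequential=False):
    """Read `path` front to back with blocking pread() calls of `block_size` bytes.

    Returns (bytes_read, data_returning_calls, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if advise_sequential:
            # Assumption: an optional hint to the kernel; its effect on
            # btrfs compressed extents is not verified here.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        calls = 0
        start = time.perf_counter()
        while True:
            chunk = os.pread(fd, block_size, offset)
            if not chunk:  # a zero-length read means end of file
                break
            offset += len(chunk)
            calls += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return offset, calls, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 seqread.py FILE [BLOCK_SIZE]
    bs = int(sys.argv[2]) if len(sys.argv) > 2 else PG_BLOCK_SIZE
    nbytes, ncalls, secs = sequential_pread(sys.argv[1], bs)
    print(f"bs={bs}: {nbytes} bytes in {ncalls} reads, "
          f"{nbytes / secs / 1e6:.1f} MB/s")
```

Running it once with the default block size and once with 32768 on freshly dropped caches should mirror the dd bs=8k and bs=32k cases below.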
> Reading 1GB PostgreSQL data files using dd (which uses read() internally) in
> 8K and 32K chunks:
>
>     # dd if=4156889.4 of=/dev/null bs=8k
>     1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s
>
>     # dd if=4156889.4 of=/dev/null bs=8k    # 2nd run, data is cached
>     1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s
>
>     # dd if=4156889.8 of=/dev/null bs=32k
>     1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s
>
>     # dd if=4156889.8 of=/dev/null bs=32k    # 2nd run, data is cached
>     1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s
>
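The dd figures follow directly from the byte count and the elapsed time, and the block size also fixes how many read() calls dd makes: bs=8k needs four times as many as bs=32k for the same 1 GiB. A small sketch of that arithmetic, using only the numbers above (GNU dd reports rates in decimal units, 1 MB = 10^6 bytes); the helper names are illustrative:

```python
FILE_SIZE = 1073741824  # bytes copied in each dd run (1 GiB)


def dd_rate_mb_s(nbytes, seconds):
    """Throughput in decimal MB/s, the unit GNU dd prints."""
    return nbytes / seconds / 1e6


def read_calls(nbytes, block_size):
    """Number of data-returning read() calls for a whole file (ceiling division)."""
    return -(-nbytes // block_size)


runs = [
    ("bs=8k, cold", 8192, 6.18829),
    ("bs=8k, cached", 8192, 0.287623),
    ("bs=32k, cold", 32768, 1.02688),
    ("bs=32k, cached", 32768, 0.264049),
]
for label, bs, secs in runs:
    print(f"{label}: {read_calls(FILE_SIZE, bs)} reads, "
          f"{dd_rate_mb_s(FILE_SIZE, secs):.0f} MB/s")
```

The cached bs=8k run completes all 131072 calls in under 0.3 s, which suggests that per-call overhead alone does not explain the roughly sixfold gap between the two cold runs.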
> Notice that the read rate (after transparent decompression) with bs=8k is
> 174MB/s (I see ~20MB/s on the device): slow, and similar to what PostgreSQL
> achieves. With bs=32k the rate increases to 1GB/s (I see ~80MB/s on the
> device, but the run is too short to register properly). The device limit is
> 1GB/s; of course I'm not expecting to reach it while decompressing. The
> cached reads are fast in both cases; I'm guessing the kernel page cache holds
> the decompressed blocks.
>
> The above results have been verified with multiple runs. The kernel is
> Ubuntu's 5.15 LTS and the block device is an LVM logical volume on a high
> performance DAS system, but I verified the same behaviour on a separate
> system with kernel 6.3.9 and btrfs directly on a local spinning disk. The
> btrfs filesystem is mounted with compress=zstd:3, and the files were
> defragmented before running the commands.
>
> Focusing on the cold cache cases, iostat gives interesting insight: for both
> postgres doing a sequential scan and dd with bs=8k, the kernel block layer
> does not appear to merge the I/O requests. `iostat -x` shows an average read
> request size (rareq-sz) of about 16 KiB, almost no merged requests, and a
> very high number of reads per second.
>
> The dd runs with bs=32k show fewer IOPS in `iostat -x`, higher throughput, a
> larger average request size and a high number of merged requests. To me it
> appears that btrfs does read-ahead only when the read size is large.
>
> Example output for an arbitrary one-second interval during dd bs=8k:
>
>     Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>     sdc           1313.00     20.93     2.00   0.15    0.53    16.32
>
> with dd bs=32k:
>
>     Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>     sdc            290.00     76.44  4528.00  93.98    1.71   269.92
>
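The iostat columns above are related: dividing rMB/s by r/s gives the average request size, and %rrqm is the share of read requests merged before being sent to the device. The sketch below, assuming sysstat's binary units (1 MB = 1024 kB), reproduces the printed rareq-sz and %rrqm values, which indicates that rareq-sz is in KiB: about 16 KiB per request for bs=8k versus about 270 KiB for bs=32k. The function names are illustrative.

```python
def avg_request_kib(reads_per_s, read_mb_per_s):
    """Average read request size in KiB from iostat's r/s and rMB/s.

    Assumes iostat's binary units (1 MB = 1024 kB), which matches the
    rareq-sz values printed above.
    """
    return read_mb_per_s * 1024 / reads_per_s


def pct_merged(reads_per_s, merged_per_s):
    """Percentage of read requests merged before being sent to the device (%rrqm)."""
    return 100 * merged_per_s / (reads_per_s + merged_per_s)


# r/s, rMB/s, rrqm/s from the two sample lines above
samples = {"bs=8k": (1313.00, 20.93, 2.00), "bs=32k": (290.00, 76.44, 4528.00)}
for label, (r_s, rmb_s, rrqm_s) in samples.items():
    print(f"{label}: rareq-sz ~ {avg_request_kib(r_s, rmb_s):.2f} KiB, "
          f"%rrqm ~ {pct_merged(r_s, rrqm_s):.2f}")
```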
> *On the same filesystem, dd bs=8k reads from a file that has not been
> compressed by the filesystem reach 1GB/s, which is the limit of my device.
> This is what makes me believe it's an issue with btrfs compression.*
>
> Is this a bug or known behaviour?
>
> Thanks in advance,
> Dimitris


Thread overview: 9+ messages
2023-07-10 18:56 btrfs sequential 8K read()s from compressed files are not merging Dimitrios Apostolou
2023-07-17 14:11 ` Dimitrios Apostolou [this message]
2023-07-26 10:59   ` (PING) " Dimitrios Apostolou
2023-07-26 12:54     ` Christoph Hellwig
2023-07-26 13:44       ` Dimitrios Apostolou
2023-08-29 13:02       ` Dimitrios Apostolou
2023-08-30 11:54         ` Qu Wenruo
2023-08-30 18:18           ` Dimitrios Apostolou
2023-08-31  0:22             ` Anand Jain
