From: Dimitrios Apostolou <jimis@gmx.net>
To: linux-btrfs@vger.kernel.org
Subject: (PING) btrfs sequential 8K read()s from compressed files are not merging
Date: Wed, 26 Jul 2023 12:59:48 +0200 (CEST)	[thread overview]
Message-ID: <fd0bbbc3-4a42-3472-dc6e-5a1cb51df10e@gmx.net> (raw)
In-Reply-To: <4b16bd02-a446-8000-b10e-4b24aaede854@gmx.net>

Any feedback? Is this a bug? I verified that others see the same slow
read speeds from compressed files when the read block size is small.

P.S. Is there a bug tracker for reporting btrfs bugs? My understanding
      is that neither the kernel's bugzilla nor GitHub issues are endorsed.

On Mon, 17 Jul 2023, Dimitrios Apostolou wrote:

> Ping, any feedback on this issue?
>
> Sorry if I was not clear: the problem is that sequential reads from
> compressed files are very slow (10-20 MB/s at the device level) when the
> read block size is 8K.
>
> It looks like a bug to me (read requests are not being merged, which
> suggests no read-ahead is happening). Any opinions?
>
>
> On Mon, 10 Jul 2023, Dimitrios Apostolou wrote:
>
>>  Hello list,
>>
>>  I discovered this issue because of very slow sequential read speed in
>>  PostgreSQL, which performs all reads as blocking pread() calls of 8192
>>  bytes (postgres' default page size). I verified that reads are similarly
>>  slow when reading the files with dd bs=8k. Here are my measurements:
>>
>>  Reading a 1GB postgres file using dd (which uses read() internally) in 8K
>>  and 32K chunks:
>>
>>      # dd if=4156889.4 of=/dev/null bs=8k
>>      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s
>>
>>      # dd if=4156889.4 of=/dev/null bs=8k    # 2nd run, data is cached
>>      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s
>>
>>      # dd if=4156889.8 of=/dev/null bs=32k
>>      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s
>>
>>      # dd if=4156889.8 of=/dev/null bs=32k    # 2nd run, data is cached
>>      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s
>>
>>  Notice that the read rate (after transparent decompression) with bs=8k
>>  is 174 MB/s (I see ~20 MB/s on the device), slow and similar to what
>>  PostgreSQL achieves. With bs=32k the rate increases to 1 GB/s (I see
>>  ~80 MB/s on the device, but the run is too short for the device rate to
>>  register accurately). The device limit is 1 GB/s; of course I don't
>>  expect to reach it while decompressing. The cached reads are fast in
>>  both cases; I'm guessing the kernel page cache holds the decompressed
>>  blocks.
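>>
>>  For reference, here is a minimal C sketch of the access pattern described
>>  above (one blocking 8 KiB pread() per block, strictly sequential); the
>>  program name and error handling are mine, not actual postgres code:
>>
>>      /* seqread.c -- sequential 8 KiB pread() loop, mimicking postgres'
>>       * page-sized reads.  Build: cc -O2 -o seqread seqread.c
>>       */
>>      #include <fcntl.h>
>>      #include <stdio.h>
>>      #include <unistd.h>
>>
>>      #define BLKSZ 8192  /* postgres' default page size */
>>
>>      int main(int argc, char **argv)
>>      {
>>          if (argc != 2) {
>>              fprintf(stderr, "usage: %s <file>\n", argv[0]);
>>              return 1;
>>          }
>>          int fd = open(argv[1], O_RDONLY);
>>          if (fd < 0) { perror("open"); return 1; }
>>
>>          static char buf[BLKSZ];
>>          off_t off = 0;
>>          ssize_t n;
>>          /* One blocking pread() per 8 KiB block, like postgres does. */
>>          while ((n = pread(fd, buf, BLKSZ, off)) > 0)
>>              off += n;
>>          if (n < 0) perror("pread");
>>          close(fd);
>>          return n < 0 ? 1 : 0;
>>      }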
>>
>>  The above results have been verified over multiple runs. The kernel is
>>  Ubuntu's 5.15 LTS and the block device is an LVM logical volume on a
>>  high-performance DAS system, but I verified the same behaviour on a
>>  separate system with kernel 6.3.9 and btrfs directly on a local spinning
>>  disk. The btrfs filesystem is mounted with compress=zstd:3 and the files
>>  were defragmented before running the commands.
>>
>>  Focusing on the cold-cache cases, iostat gives an interesting insight:
>>  for both postgres doing a sequential scan and dd with bs=8k, the kernel
>>  block layer does not appear to merge the I/O requests. `iostat -x` shows
>>  an average read request size of ~16 KiB (rareq-sz is reported in KiB),
>>  zero merged requests, and a very high reads/s figure. The numbers are
>>  consistent: in the sample below, 1313 r/s x 16.32 KiB is ~21 MB/s, which
>>  matches the observed device throughput.
>>
>>  The dd commands with bs=32k show fewer IOPS in `iostat -x`, higher
>>  speed, a larger average request size and a high number of merged
>>  requests. It appears that btrfs performs read-ahead only when the read
>>  block size is large.
>>
>>  Example iostat output for a random one-second interval during dd bs=8k:
>>
>>      Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>>      sdc           1313.00     20.93     2.00   0.15    0.53    16.32
>>
>>  with dd bs=32k:
>>
>>      Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>>      sdc            290.00     76.44  4528.00  93.98    1.71   269.92
>>
>>  *On the same filesystem, doing dd bs=8k reads from a file that the
>>  filesystem has not compressed, I get 1 GB/s throughput, which is the
>>  limit of my device. This is what makes me believe the issue is specific
>>  to btrfs compression.*
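>>
>>  As an experiment (not a fix), I wonder whether an explicit sequential
>>  access hint changes anything. A sketch: the same loop as above, preceded
>>  by posix_fadvise(2); whether btrfs honours the hint on compressed
>>  extents is exactly what is in question here:
>>
>>      /* fadvread.c -- hint sequential access, then do 8 KiB pread()s. */
>>      #include <fcntl.h>
>>      #include <stdio.h>
>>      #include <string.h>
>>      #include <unistd.h>
>>
>>      #define BLKSZ 8192
>>
>>      int main(int argc, char **argv)
>>      {
>>          if (argc != 2) {
>>              fprintf(stderr, "usage: %s <file>\n", argv[0]);
>>              return 1;
>>          }
>>          int fd = open(argv[1], O_RDONLY);
>>          if (fd < 0) { perror("open"); return 1; }
>>
>>          /* Ask the kernel to ramp up readahead for the whole file.
>>           * Note: posix_fadvise() returns the error number directly. */
>>          int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>>          if (rc != 0)
>>              fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
>>
>>          static char buf[BLKSZ];
>>          off_t off = 0;
>>          ssize_t n;
>>          while ((n = pread(fd, buf, BLKSZ, off)) > 0)
>>              off += n;
>>          close(fd);
>>          return 0;
>>      }
>>
>>  (Caches should be dropped between runs, e.g. with
>>  `echo 3 > /proc/sys/vm/drop_caches`, so a repeat run doesn't just read
>>  decompressed data back from the page cache.)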
>>
>>  Is this a bug or known behaviour?
>>
>>  Thanks in advance,
>>  Dimitris
>>
>>
>>
>
>

Thread overview: 9+ messages
2023-07-10 18:56 btrfs sequential 8K read()s from compressed files are not merging Dimitrios Apostolou
2023-07-17 14:11 ` Dimitrios Apostolou
2023-07-26 10:59   ` Dimitrios Apostolou [this message]
2023-07-26 12:54     ` (PING) " Christoph Hellwig
2023-07-26 13:44       ` Dimitrios Apostolou
2023-08-29 13:02       ` Dimitrios Apostolou
2023-08-30 11:54         ` Qu Wenruo
2023-08-30 18:18           ` Dimitrios Apostolou
2023-08-31  0:22             ` Anand Jain
