* btrfs sequential 8K read()s from compressed files are not merging
@ 2023-07-10 18:56 Dimitrios Apostolou
From: Dimitrios Apostolou @ 2023-07-10 18:56 UTC (permalink / raw)
To: linux-btrfs
Hello list,
I discovered this issue because of very slow sequential read speed in
PostgreSQL, which performs all reads using blocking pread() calls of 8192
bytes (postgres' default page size). I verified that reads are similarly
slow when I read the files using dd bs=8k. Here are my measurements:
Reading a 1GB postgres file using dd (which uses read() internally) in 8K
and 32K chunks:
# dd if=4156889.4 of=/dev/null bs=8k
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s
# dd if=4156889.4 of=/dev/null bs=8k # 2nd run, data is cached
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s
# dd if=4156889.8 of=/dev/null bs=32k
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s
# dd if=4156889.8 of=/dev/null bs=32k # 2nd run, data is cached
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s
Notice that the read rate (after transparent decompression) with bs=8k is
174 MB/s (I see ~20 MB/s on the device), slow and similar to what
PostgreSQL achieves. With bs=32k the rate increases to 1 GB/s (I see
~80 MB/s on the device, but the run is too short to register properly).
The device limit is 1 GB/s; of course I'm not expecting to reach that while
decompressing. The cached reads are fast in both cases; I'm guessing the
kernel buffer cache contains the decompressed blocks.
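
For what it's worth, between cold-cache runs the page cache can be dropped
explicitly; this is an assumption about the methodology, not something
stated in the measurements above:

# sync
# echo 3 > /proc/sys/vm/drop_caches    # drops page cache plus dentries/inodes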

The above results have been verified with multiple runs. The kernel is 5.15
(Ubuntu LTS) and the block device is an LVM logical volume on a
high-performance DAS system, but I verified the same behaviour on a separate
system with kernel 6.3.9 and btrfs directly on a local spinning disk. The
btrfs filesystem is mounted with compress=zstd:3 and the files had been
defragmented prior to running the commands.
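
For completeness, a setup along these lines would match the description
above; the device, mount point and path prefix are placeholders, and only
the compress=zstd:3 option and the fact that the files were defragmented
are actually stated:

# mount -o compress=zstd:3 /dev/mapper/vg0-data /mnt/pgdata
# btrfs filesystem defragment -v -czstd /mnt/pgdata/base/16384/4156889.4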

Focusing on the cold-cache cases, iostat gives interesting insight: for
both postgres doing a sequential scan and dd with bs=8k, the kernel block
layer does not appear to merge the I/O requests. `iostat -x` shows an
average read request size of 16 (sectors?), 0 merged requests, and a very
high reads/s (IOPS) number.

The dd commands with bs=32k show fewer IOPS in `iostat -x`, higher
throughput, a larger average request size and a high number of merged
requests. To me it appears as if btrfs is doing read-ahead only when the
read block is large.
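
For reference, per-second device samples like the ones below can be
collected with something along these lines (sdc is the device that shows up
in the output; the exact invocation is an assumption, not stated in the
original report):

# iostat -x sdc 1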
Example output for some random second out of dd bs=8k:
Device            r/s     rMB/s   rrqm/s  %rrqm  r_await  rareq-sz
sdc           1313.00     20.93     2.00   0.15     0.53     16.32
with dd bs=32k:
Device            r/s     rMB/s   rrqm/s  %rrqm  r_await  rareq-sz
sdc            290.00     76.44  4528.00  93.98     1.71    269.92

*On the same filesystem, doing dd bs=8k reads from a file that has not been
compressed by the filesystem, I get 1 GB/s throughput, which is the limit of
my device. This is what makes me believe it's an issue with btrfs
compression.*
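
A quick way to confirm whether a given file is actually stored compressed is
the compsize tool (used further down in this thread; the file name is just
the example from the dd runs above):

# compsize 4156889.4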
Is this a bug or known behaviour?
Thanks in advance,
Dimitris
* Re: btrfs sequential 8K read()s from compressed files are not merging
From: Dimitrios Apostolou @ 2023-07-17 14:11 UTC (permalink / raw)
To: linux-btrfs

Ping, any feedback on this issue?

Sorry if I was not clear, the problem here is that the filesystem is very
slow (10-20 MB/s on the device) in sequential reads from compressed files,
when the block size is 8K.

It looks like a bug to me (read requests are not merging, i.e. no
read-ahead is happening). Any opinions?
* (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Dimitrios Apostolou @ 2023-07-26 10:59 UTC (permalink / raw)
To: linux-btrfs

Any feedback? Is this a bug? I verified that others see the same slow read
speeds from compressed files when the block size is small.

P.S. Is there a bug tracker to report btrfs bugs? My understanding is that
neither the kernel's bugzilla nor GitHub issues are endorsed.
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Christoph Hellwig @ 2023-07-26 12:54 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: linux-btrfs

FYI, I can reproduce similar findings to yours. I'm somewhere between
dealing with regressions and travel and don't actually have time to fully
root cause it.

The most likely scenario is probably some interaction between the read-ahead
window, which is based around the actual I/O size, and the btrfs compressed
extent design that always compresses a fixed-size chunk of data.
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Dimitrios Apostolou @ 2023-07-26 13:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-btrfs

Thanks for responding while travelling! :-)

On Wed, 26 Jul 2023, Christoph Hellwig wrote:

> The most likely scenario is probably some interaction between the read-ahead
> window, which is based around the actual I/O size, and the btrfs compressed
> extent design that always compresses a fixed-size chunk of data.

AFAIK the compressed extents are of size 128KB. I would expect btrfs to
decompress each one as a whole, so no clever read-ahead would be needed:
btrfs should read 128KB chunks from disk, not the 8KB blocks that the
application requests. But the data shows otherwise. Any idea how btrfs
reads and decompresses the 128KB extents?

Also, do you know whether btrfs keeps the full decompressed chunk cached,
or whether it re-decompresses it every time the application reads 8KB?

Dimitris
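
One way to double-check the on-disk extent layout of such a file from
userspace could be filefrag from e2fsprogs; on btrfs, compressed extents
show up as many small extents carrying the "encoded" flag. The file name is
a placeholder, and this is a suggestion rather than something used in the
thread:

# filefrag -v /path/to/compressed-file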
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Dimitrios Apostolou @ 2023-08-29 13:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-btrfs

On Wed, 26 Jul 2023, Christoph Hellwig wrote:

> The most likely scenario is probably some interaction between the read-ahead
> window, which is based around the actual I/O size, and the btrfs compressed
> extent design that always compresses a fixed-size chunk of data.

So the issue is still an issue (btrfs being unreasonably slow when reading
a compressed file sequentially in 8K blocks) and I'm trying to figure out
the reasons.

I'm wondering: when an application read()s an 8K block from a big
btrfs-compressed file, apparently the full 128KB compressed chunk has to be
decompressed. But what does btrfs store in the kernel buffer cache?

a. Does it store only the specific 8K block of decompressed data that was
   requested?

b. Does it store the full compressed block (128KB AFAIK), to be
   re-decompressed upon every read() from any application?

c. Or does it store the full decompressed block, which might even be 1MB
   in size?

I guess it's doing [a], because of the performance issue I'm facing. Both
[b] and [c] would work as some kind of automatic read-ahead. But any kind
of verification would be helpful to nail down the problem, as I can't see
this level of detail exposed in any way from a userspace point of view.

Thanks,
Dimitris
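
As a side note, one userspace way to see how much of a file currently
resides in the page cache is util-linux's fincore; it reports the resident
pages per file, although it cannot tell whether the cached data came from a
compressed or an uncompressed extent. The file name is a placeholder:

# fincore /path/to/compressed-file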
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Qu Wenruo @ 2023-08-30 11:54 UTC (permalink / raw)
To: Dimitrios Apostolou, Christoph Hellwig; +Cc: linux-btrfs

On 2023/8/29 21:02, Dimitrios Apostolou wrote:
> I'm wondering: when an application read()s an 8K block from a big
> btrfs-compressed file, apparently the full 128KB compressed chunk has to be
> decompressed. But what does btrfs store in the kernel buffer cache?

The kernel page cache is attached to the inode, i.e. it holds the
decompressed data. As long as you're doing cached reads, the decompressed
data will be cached.

But there is another catch: if the file extent only points to a very small
part of the decompressed range, we still need to read the full compressed
extent, do the decompression, and only copy the small range into the page
cache.

> a. Does it store only the specific 8K block of decompressed data that was
>    requested?

If it's a buffered read, the read can be merged with other blocks, and we
also have readahead; in that case we can still submit a much larger read.

But mostly it's case a), as dd would wait for each read to finish.

Meanwhile, if it's direct IO, there would be no merging, nor any caching.
(That's expected though.)

> b. Does it store the full compressed block (128KB AFAIK), to be
>    re-decompressed upon every read() from any application?
>
> c. Or does it store the full decompressed block, which might even be 1MB
>    in size?
>
> I guess it's doing [a], because of the performance issue I'm facing. Both
> [b] and [c] would work as some kind of automatic read-ahead.

Although there are other factors which can be involved, like fragmentation
(especially damaging to performance for compressed extents).

One thing I want to verify: could you create a big file with all compressed
extents (dd writes; the block size doesn't matter that much since by default
it's a buffered write), other than the postgres databases?

Then do the same read with 32K and 512K and see if there is still the same
slow performance.

(The compressed extent size limit is 128K, thus 512K would cover 4 file
extents, and hopefully increase the performance.)

I'm afraid the postgres data may be fragmented due to the database workload,
and that this contributes to the slowdown.

Thanks,
Qu
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Dimitrios Apostolou @ 2023-08-30 18:18 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs

Thanks for the feedback!

On Wed, 30 Aug 2023, Qu Wenruo wrote:

> If it's a buffered read, the read can be merged with other blocks, and we
> also have readahead; in that case we can still submit a much larger read.
>
> But mostly it's case a), as dd would wait for each read to finish.

This is definitely not the case on other filesystems, where I see blocking
8K buffered reads going much faster. But I understand better now, and I
think I have expressed the problem wrongly in the subject: the problem is
not that the I/Os are not *merging*, but that no read-ahead/pre-fetch is
happening. That sounds more accurate to me.

But this raises the question: shouldn't read-ahead/pre-fetch happen at the
block layer? I remember having seen some configurable knobs at the elevator
level, or even at the device driver level. Is btrfs circumventing those?

For the sake of completeness, all of the read()s in my (previous and
current) measurements are buffered and blocking, and so are the ones from
postgres.

> One thing I want to verify: could you create a big file with all compressed
> extents (dd writes; the block size doesn't matter that much since by default
> it's a buffered write), other than the postgres databases?
>
> Then do the same read with 32K and 512K and see if there is still the same
> slow performance.

I assume you also want to see 8KB reads here, which is the main problem I
reported.

> (The compressed extent size limit is 128K, thus 512K would cover 4 file
> extents, and hopefully increase the performance.)
>
> I'm afraid the postgres data may be fragmented due to the database
> workload, and that this contributes to the slowdown.

==== Measurements

I created a zero-filled file with the size of the host's RAM to avoid
caching issues. I did many re-runs of every dd command and verified there
is no variation. I should also mention that the filesystem is 85% free, so
there shouldn't be any fragmentation issues.

# dd if=/dev/zero of=blah bs=1G count=16
16+0 records in
16+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 14.2627 s, 1.2 GB/s

I verified the file is well compressed:

# compsize blah
Processed 1 file, 131073 regular extents (131073 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL        3%      512M          16G          16G
zstd         3%      512M          16G          16G

I'm surprised that such a file needed 128K extents and required 512MB of
disk space (the filesystem is mounted with compress=zstd:3), but it is what
it is.
On to reading the file:

# dd if=blah of=/dev/null bs=512k
32768+0 records in
32768+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 7.40493 s, 2.3 GB/s
### iostat showed 30MB/s to 100MB/s device read speed

# dd if=blah of=/dev/null bs=32k
524288+0 records in
524288+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 8.34762 s, 2.1 GB/s
### iostat showed 30MB/s to 90MB/s device read speed

# dd if=blah of=/dev/null bs=8k
2097152+0 records in
2097152+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 18.7143 s, 918 MB/s
### iostat showed very variable 8MB/s to 60MB/s device read speed
### average maybe around 40MB/s

Also worth noting is the I/O request size that iostat is reporting. For
bs=8k it reports a request size of about 4 (KB?), while it's orders of
magnitude higher for all the other measurements in this email.

==== Same test with an incompressible file

I performed the same experiments with a urandom-filled file. I assume here
that btrfs detects the file can't be compressed, so it treats it
differently. This is what the measurements show: the device speed limit is
reached in all cases (this host has an HDD with a limit of 200MB/s).

# dd if=/dev/urandom of=blah-random bs=1G count=16
16+0 records in
16+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 84.0045 s, 205 MB/s

# compsize blah-random
Processed 1 file, 133 regular extents (133 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL      100%       15G          15G          15G
none       100%       15G          15G          15G

# dd if=blah-random of=/dev/null bs=512k
32768+0 records in
32768+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 87.82 s, 196 MB/s
### iostat showed 180-205MB/s device read speed

# dd if=blah-random of=/dev/null bs=32k
524288+0 records in
524288+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 88.3785 s, 194 MB/s
### iostat showed 180-205MB/s device read speed

# dd if=blah-random of=/dev/null bs=8k
2097152+0 records in
2097152+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 88.7887 s, 193 MB/s
### iostat showed 180-205MB/s device read speed

Thanks,
Dimitris
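
Regarding the block-layer read-ahead knobs mentioned above, these are the
usual places to look; sdc is the device name taken from the iostat output
earlier in the thread, and these commands are a suggestion rather than
something that was actually run:

# cat /sys/block/sdc/queue/read_ahead_kb    # per-device readahead window, in KiB
# blockdev --getra /dev/sdc                 # same setting, reported in 512-byte sectors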
* Re: (PING) btrfs sequential 8K read()s from compressed files are not merging
From: Anand Jain @ 2023-08-31 0:22 UTC (permalink / raw)
To: Dimitrios Apostolou, Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs

> # dd if=blah of=/dev/null bs=8k
> 2097152+0 records in
> 2097152+0 records out
> 17179869184 bytes (17 GB, 16 GiB) copied, 18.7143 s, 918 MB/s
> ### iostat showed very variable 8MB/s to 60MB/s device read speed
> ### average maybe around 40MB/s
>
> Also worth noting is the I/O request size that iostat is reporting. For
> bs=8k it reports a request size of about 4 (KB?), while it's orders of
> magnitude higher for all the other measurements in this email.

The sector size is 4k, and the compression block size is 128k. There will
be a lot more read IO, which may not be mergeable, for reads with lower
block sizes.

> ==== Same test with an incompressible file
>
> I performed the same experiments with a urandom-filled file. I assume here
> that btrfs detects the file can't be compressed, so it treats it
> differently. This is what the measurements show: the device speed limit is
> reached in all cases (this host has an HDD with a limit of 200MB/s).
>
> # compsize blah-random
> Processed 1 file, 133 regular extents (133 refs), 0 inline.
> Type       Perc     Disk Usage   Uncompressed Referenced
> TOTAL      100%       15G          15G          15G
> none       100%       15G          15G          15G

The heuristic will disable compression on the file if the data is
incompressible, such as that from /dev/urandom. Generally, to test
compression in fstests, we use a 'dd' command as below:

  od /dev/urandom | dd iflag=fullblock of=.. bs=.. count=..

Thanks,
Anand
Thread overview: 9+ messages
2023-07-10 18:56 btrfs sequential 8K read()s from compressed files are not merging Dimitrios Apostolou
2023-07-17 14:11 ` Dimitrios Apostolou
2023-07-26 10:59 ` (PING) " Dimitrios Apostolou
2023-07-26 12:54 ` Christoph Hellwig
2023-07-26 13:44 ` Dimitrios Apostolou
2023-08-29 13:02 ` Dimitrios Apostolou
2023-08-30 11:54 ` Qu Wenruo
2023-08-30 18:18 ` Dimitrios Apostolou
2023-08-31  0:22 ` Anand Jain