* Transparent compression with ext4 - especially with zstd
@ 2025-01-19 14:37 Gerhard Wiesinger
  2025-01-21  4:01 ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Gerhard Wiesinger @ 2025-01-19 14:37 UTC (permalink / raw)
To: linux-ext4

Hello,

Are there any plans to include transparent compression with ext4
(especially with zstd)?

Thnx.

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-19 14:37 Transparent compression with ext4 - especially with zstd Gerhard Wiesinger
@ 2025-01-21  4:01 ` Theodore Ts'o
  2025-01-21  9:42 ` Artem Blagodarenko
  2025-01-21 18:47 ` Gerhard Wiesinger
  0 siblings, 2 replies; 13+ messages in thread
From: Theodore Ts'o @ 2025-01-21 4:01 UTC (permalink / raw)
To: Gerhard Wiesinger; +Cc: linux-ext4

On Sun, Jan 19, 2025 at 03:37:27PM +0100, Gerhard Wiesinger wrote:
>
> Are there any plans to include transparent compression with ext4 (especially
> with zstd)?

I'm not aware of anyone in the ext4 development community working on
something like this.  Fully transparent compression is challenging,
since supporting random writes into a compressed file is tricky.
There are solutions (for example, the Stac patent, which resulted in
Microsoft paying $120 million), but even ignoring the intellectual
property issues, they tend to compromise the efficiency of the
compression.

More to the point, given how cheap byte storage tends to be (dollars
per IOPS tend to be far more of a constraint than dollars per GB),
it's unclear what the business case would be for any company to fund
development work in this area, when the cost of a slightly larger HDD
or SSD is going to be far cheaper than the necessary software
engineering investment, even for a hyperscaler cloud company (and
even there, it's unclear that transparent compression is really
needed).

What is the business and/or technical problem which you are trying to
solve?

Cheers,

- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21  4:01 ` Theodore Ts'o
@ 2025-01-21  9:42 ` Artem Blagodarenko
  2025-01-21 18:47 ` Gerhard Wiesinger
  1 sibling, 0 replies; 13+ messages in thread
From: Artem Blagodarenko @ 2025-01-21 9:42 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Gerhard Wiesinger, linux-ext4

Hi Gerhard, Theodore,

> even for a hyperscaler cloud company
> (and even there, it's unclear that transparent compression is really
> needed).

Regarding exascale storage: Lustre FS (which uses EXT4 (LDISKFS) as a
backend) has a "Client-side data compression" project (LU-10026) which
adds transparent compression with an extendable set of algorithms. The
initial release includes the gzip, lz4, lz4hc, lzo, zstd and zstdfast
algorithms with levels. More details are in the LUG and LAD
presentations from 2023 and 2024.

Best regards,
Artem Blagodarenko

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21  4:01 ` Theodore Ts'o
  2025-01-21  9:42 ` Artem Blagodarenko
@ 2025-01-21 18:47 ` Gerhard Wiesinger
  2025-01-21 19:33 ` Theodore Ts'o
  2025-01-21 21:26 ` Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Gerhard Wiesinger @ 2025-01-21 18:47 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

On 21.01.2025 05:01, Theodore Ts'o wrote:
> On Sun, Jan 19, 2025 at 03:37:27PM +0100, Gerhard Wiesinger wrote:
>> Are there any plans to include transparent compression with ext4
>> (especially with zstd)?
> I'm not aware of anyone in the ext4 development community working on
> something like this.  Fully transparent compression is challenging,
> since supporting random writes into a compressed file is tricky.
> There are solutions (for example, the Stac patent, which resulted in
> Microsoft paying $120 million), but even ignoring the intellectual
> property issues, they tend to compromise the efficiency of the
> compression.
>
> More to the point, given how cheap byte storage tends to be (dollars
> per IOPS tend to be far more of a constraint than dollars per GB),
> it's unclear what the business case would be for any company to fund
> development work in this area, when the cost of a slightly larger HDD
> or SSD is going to be far cheaper than the necessary software
> engineering investment, even for a hyperscaler cloud company (and
> even there, it's unclear that transparent compression is really
> needed).
>
> What is the business and/or technical problem which you are trying to
> solve?

Regarding necessity:
In some scenarios we are talking about whole factors of disk space.
E.g. in my database scenario with PostgreSQL around 85% of disk space
can be saved (i.e. around a factor of 7).

In cloud usage scenarios you can easily reduce the amount of allocated
disk space by around a factor of 7 and therefore reduce cost.

You might also get a performance boost by using caching mechanisms
more efficiently (e.g. using less RAM).

Also with precompressed files (e.g. photos, videos) you can save
around 5-10% overall disk space, which sounds small, but in the area
of several hundred gigabytes or even some petabytes this is a lot of
storage. On an evenly distributed data store you can save even more.

The technical topic is that IMHO no stable and practically usable
Linux filesystem with transparent compression is included in the
default kernel:
- ZFS works but is not included in the default kernel
- BTRFS has stability and repair issues (see mailing lists) and bugs
  with compression (does not compress on the fly in some scenarios)
- bcachefs is experimental

Regarding patents: IMHO at least the STAC patents are no longer valid.

Thnx.

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21 18:47 ` Gerhard Wiesinger
@ 2025-01-21 19:33 ` Theodore Ts'o
  2025-01-22  0:19 ` Kiselev, Oleg
  2025-01-22  7:29 ` Gerhard Wiesinger
  2025-01-21 21:26 ` Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Theodore Ts'o @ 2025-01-21 19:33 UTC (permalink / raw)
To: Gerhard Wiesinger; +Cc: linux-ext4

On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote:
> We are talking in some scenarios about whole factors of disk space.
> E.g. in my database scenario with PostgreSQL around 85% of disk space
> can be saved (i.e. around a factor of 7).

So the problem with using compression with databases is that they need
to be able to do random writes into the middle of a file.  So that
means you need to use tricks such as writing into clusters, typically
32k or 64k.  What this means is that a single 4k random write gets
amplified into a 32k or 64k write.

> In cloud usage scenarios you can easily reduce the amount of allocated
> disk space by around a factor of 7 and therefore reduce cost.

If you are running this on a cloud platform, where you are limited (on
GCE) or charged (on AWS) by IOPS and throughput, this can be a
performance bottleneck (or cost you extra).  At the minimum the extra
I/O throughput will very likely show up on various performance
benchmarks.

Worse, using transparent compression breaks the ACID properties of
the database.  If you crash or have a power failure while rewriting
the 64k compression cluster, all or part of that 64k compression
cluster can be corrupted.  And if your customers care about (their)
data integrity, the fact that you cheaped out on disk space might not
be something that would impress them terribly.

The short version is that transparent compression is not free, even if
you ignore the SWE development costs of implementing such a feature,
and then getting that feature to be fit for use in an enterprise use
case.

No matter what file system you might want to use, I *strongly* suggest
that you get a power fail rack, try putting the whole stack on said
power fail rack, and try dropping power while running a stress test
--- over, and over, and over again.  What you might find would
surprise you.

> The technical topic is that IMHO no stable and practically usable
> Linux filesystem which is included in the default kernel exists.
> - ZFS works but is not included in the default kernel
> - BTRFS has stability and repair issues (see mailing lists) and bugs
>   with compression (does not compress on the fly in some scenarios)
> - bcachefs is experimental

When I started work at Google 15 years ago to deploy ext4 into
production, we did precisely this, as well as deploying to a small
percentage of Google's test fleet to do A:B comparisons before we
deployed to the entire production fleet.  Whether or not it is
"practical" and "usable" depends on your definition, I guess, but from
my perspective "stable" and "not losing users' data" is job #1.

But hey, if it's worth so much to you, I suggest you cost out what it
would take to actually implement the features that you want so much,
or how much it would cost to make the more complex file systems stable
for production use.  You might decide that paying the extra storage
costs is way cheaper than the software engineering investment
involved.  At Google, and when I was at IBM before that, we were
always super disciplined about trying to figure out the ROI of some
particular project and not just doing it because it was "cool".
There's a famous story about how the engineers working on ZFS didn't
ask for management's permission or input from the sales team before
they started.  Sounds great, and there was some cool technology in
ZFS --- but note that Sun had to put the company up for sale because
they were losing money...

Cheers,

- Ted

P.S.  Note: using a compression cluster is the only real way to
support transparent compression if you are using an update-in-place
file system like ext4 or xfs.  (And that is what was covered by the
Stac patents that I mentioned.)  If you are using a log-structured
file system, such as ZFS, then you can simply rewrite the compression
cluster *and* update the file system metadata to point at the new
compression cluster --- but then the garbage collection costs, and
the file system metadata update costs for each database commit are
*huge*, and the I/O throughput hit is even higher.  So much so that
ZFS recommends that you turn off the log-structured write and do
update-in-place if you want to use a database on ZFS.  But I'm pretty
sure that this disables transparent compression if you are using
update-in-place.  TNSTAAFL.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21 19:33 ` Theodore Ts'o
@ 2025-01-22  0:19 ` Kiselev, Oleg
  2025-01-22  6:10 ` Gerhard Wiesinger
  2025-01-22  7:29 ` Gerhard Wiesinger
  1 sibling, 1 reply; 13+ messages in thread
From: Kiselev, Oleg @ 2025-01-22 0:19 UTC (permalink / raw)
To: Theodore Ts'o, Gerhard Wiesinger; +Cc: linux-ext4@vger.kernel.org

MySQL, MariaDB and PostgreSQL do their own, schema- and page-size-aware
compression. Why not let the databases do this? They are in a better
position to do it and trade off the costs where and when it matters to
them.

--
Oleg Kiselev

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-22  0:19 ` Kiselev, Oleg
@ 2025-01-22  6:10 ` Gerhard Wiesinger
  0 siblings, 0 replies; 13+ messages in thread
From: Gerhard Wiesinger @ 2025-01-22 6:10 UTC (permalink / raw)
To: Kiselev, Oleg, Theodore Ts'o; +Cc: linux-ext4@vger.kernel.org

On 22.01.2025 01:19, Kiselev, Oleg wrote:
> MySQL, MariaDB and PostgreSQL do their own, schema- and page-size-aware
> compression. Why not let the databases do this? They are in a better
> position to do it and trade off the costs where and when it matters to
> them.

Hello Oleg,

Thnx for the input.

For PostgreSQL: AFAIK compression (via TOAST) only kicks in for larger
values (e.g. >2kB), and it looks like it doesn't work for my use case.
But I will have a deeper look into it.

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21 19:33 ` Theodore Ts'o
  2025-01-22  0:19 ` Kiselev, Oleg
@ 2025-01-22  7:29 ` Gerhard Wiesinger
  2025-01-22  7:37 ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Gerhard Wiesinger @ 2025-01-22 7:29 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

On 21.01.2025 20:33, Theodore Ts'o wrote:
> On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote:
>> We are talking in some scenarios about whole factors of disk space.
>> E.g. in my database scenario with PostgreSQL around 85% of disk space
>> can be saved (i.e. around a factor of 7).
> Worse, using transparent compression breaks the ACID properties of
> the database.  If you crash or have a power failure while rewriting
> the 64k compression cluster, all or part of that 64k compression
> cluster can be corrupted.  And if your customers care about (their)
> data integrity, the fact that you cheaped out on disk space might not
> be something that would impress them terribly.

BTW: Why does it break the ACID properties?

Typically the transaction log will be (and has to be) flushed/synced
to disk (fsync). If that succeeds, everything is fine and all DB
transactions can be rolled forward if necessary. If it fails, the
last transaction is simply not recorded.

I also don't see anything compression-related here: that can also
happen without compression.

Any clarification?

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-22  7:29 ` Gerhard Wiesinger
@ 2025-01-22  7:37 ` Christoph Hellwig
  2025-01-22 13:19 ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-22 7:37 UTC (permalink / raw)
To: Gerhard Wiesinger; +Cc: Theodore Ts'o, linux-ext4

On Wed, Jan 22, 2025 at 08:29:09AM +0100, Gerhard Wiesinger wrote:
> BTW: Why does it break the ACID properties?

It doesn't if implemented properly, which of course means out of place
writes.

The only sane way to implement compression in XFS would be using out
of place writes, which we support for reflinks and which is heavily
used by the new zoned mode.  For the latter retrofitting compression
would be relatively easy, but it first needs to get merged, then
stabilize and mature, and then we'll need to see if we have enough
use cases.  So don't plan for it.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-22  7:37 ` Christoph Hellwig
@ 2025-01-22 13:19 ` Theodore Ts'o
  2025-01-22 14:11 ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2025-01-22 13:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Gerhard Wiesinger, linux-ext4

On Tue, Jan 21, 2025 at 11:37:38PM -0800, Christoph Hellwig wrote:
> On Wed, Jan 22, 2025 at 08:29:09AM +0100, Gerhard Wiesinger wrote:
> > BTW: Why does it break the ACID properties?
>
> It doesn't if implemented properly, which of course means out of place
> writes.
>
> The only sane way to implement compression in XFS would be using out
> of place writes, which we support for reflinks and which is heavily
> used by the new zoned mode.  For the latter retrofitting compression
> would be relatively easy, but it first needs to get merged, then
> stabilize and mature, and then we'll need to see if we have enough
> use cases.  So don't plan for it.

... but out of place writes mean that every single fdatasync() called
by the database now requires a file system level transaction commit.
So now every single fdatasync(2) results in the data blocks getting
written out to a new location on disk (this is what out of place
writes mean), followed by a CACHE FLUSH, followed by the metadata
updates to point at the new location on the disk, first written to the
file system transaction log, followed by the fs commit block, followed
by a *second* CACHE FLUSH command.

So now let's look at a sample scenario where the database needs to
update 3 different 4k blocks (for example, where you are crediting
$100 to an income account, followed by a $100 debit to an expense
account, followed by the database commit).

Without transparent compression (assuming the database is properly
using fdatasync so it's not asking the file system to update the
ctime/mtime of the database file), a commit looks like this:

1) random write A (4k write)
2) random write B (4k write)
3) random write C (4k write)
4) CACHE FLUSH

With transparent compression:

1) random write A
2) random write B
3) random write C
4) CACHE FLUSH
5) update the location of compression cluster A written to the fs journal
6) update the location of compression cluster B written to the fs journal
7) update the location of compression cluster C written to the fs journal
8) write the commit block to the fs journal
9) CACHE FLUSH

This kills performance, and as I mentioned, in general, IOPS are
expensive and write bandwidth is often far more expensive than byte
storage.  This is true both for the raw storage used by the cloud
provider, the extra network bandwidth between the host and the cluster
file system storing the emulated cloud block device, and the amount of
money charged to the cloud customer, because it does cost the cloud
provider more money.

If you try to do transparent compression using update-in-place (for
example, via the technique in the Stac patent) then you don't need to
update the location on disk, but given that you are replacing a 64k
compression cluster every time you update a 4k block, if you crash in
the middle of the 64k compression cluster update, that cluster could
get corrupted --- at which point you break the database's ACID
properties.

Finally, note that both Amazon and Google have first party cloud
products (RDS and CloudSQL, respectively) that provide to the customer
the full MySQL and Postgres feature set.  So if you want to enable
database level compression, I believe you *can* do it.  Compression is
not free, and not magic, but if it works for you, you *can* enable it
if you are using MySQL or Postgres.

Now, if you are using a database that doesn't support database-level
compression, then why not ask the vendor providing the database to add
compression as a feature?  Of course, they might ask you as the
customer to pay $$$, but the development cost to add new features,
whether in the database or the file system, is also not free.

Cheers,

- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-22 13:19 ` Theodore Ts'o
@ 2025-01-22 14:11 ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2025-01-22 14:11 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Christoph Hellwig, Gerhard Wiesinger, linux-ext4

On Wed, Jan 22, 2025 at 08:19:12AM -0500, Theodore Ts'o wrote:
> ... but out of place writes mean that every single fdatasync() called
> by the database now requires a file system level transaction commit.

Yes.

> So now every single fdatasync(2) results in the data blocks getting
> written out to a new location on disk (this is what out of place
> writes mean), followed by a CACHE FLUSH, followed by the metadata
> updates to point at the new location on the disk, first written to the
> file system transaction log, followed by the fs commit block, followed
> by a *second* CACHE FLUSH command.

Or you put the compressed data in the log and have a single FUA write.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd
  2025-01-21 18:47 ` Gerhard Wiesinger
  2025-01-21 19:33 ` Theodore Ts'o
@ 2025-01-21 21:26 ` Dave Chinner
  2025-01-22  6:47 ` Gerhard Wiesinger
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2025-01-21 21:26 UTC (permalink / raw)
To: Gerhard Wiesinger; +Cc: Theodore Ts'o, linux-ext4

On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote:
> On 21.01.2025 05:01, Theodore Ts'o wrote:
> > On Sun, Jan 19, 2025 at 03:37:27PM +0100, Gerhard Wiesinger wrote:
> > > Are there any plans to include transparent compression with ext4
> > > (especially with zstd)?
> > I'm not aware of anyone in the ext4 development community working on
> > something like this.  Fully transparent compression is challenging,
> > since supporting random writes into a compressed file is tricky.
> > There are solutions (for example, the Stac patent, which resulted in
> > Microsoft paying $120 million), but even ignoring the intellectual
> > property issues, they tend to compromise the efficiency of the
> > compression.
> >
> > More to the point, given how cheap byte storage tends to be (dollars
> > per IOPS tend to be far more of a constraint than dollars per GB),
> > it's unclear what the business case would be for any company to fund
> > development work in this area, when the cost of a slightly larger
> > HDD or SSD is going to be far cheaper than the necessary software
> > engineering investment, even for a hyperscaler cloud company (and
> > even there, it's unclear that transparent compression is really
> > needed).
> >
> > What is the business and/or technical problem which you are trying
> > to solve?
> >
> Regarding necessity:
> In some scenarios we are talking about whole factors of disk space.
> E.g. in my database scenario with PostgreSQL around 85% of disk space
> can be saved (i.e. around a factor of 7).

So use a database that has built-in data compression capabilities.

e.g. MySQL has transparent table compression functionality.  This
requires sparse files and FALLOC_FL_PUNCH_HOLE support in the
filesystem, but there is no need for any special filesystem side
support for data compression to get space gains of up to 75% on
compressible data sets with the default database (16kB record size)
and filesystem configs (4kB block size).

The argument that "application level compression is hard, so we want
the filesystem to do it for us" ignores the fact that it is -much
harder- to do efficient compression in the filesystem than at the
application level.

The OS and filesystem don't have the freedom to control application
level data access patterns nor tailor the compression algorithms to
match how the application manages data, so everything the filesystem
implements is a compromise.  It will never be optimal for any given
workload, because we have to make sure that it is not complete garbage
for any given workload...

> In cloud usage scenarios you can easily reduce the amount of allocated
> disk space by around a factor of 7 and therefore reduce cost.

Same argument: cloud applications should be managing their data sets
appropriately and efficiently, not relying on the cloud storage
infrastructure to magically do stuff to "reduce costs" for them.

Remember: there's a massive conflict of interest on the vendor side
here - the less efficient the application (be it in CPU, RAM or
storage capacity), the more money the cloud vendor makes from users
running that application.  Hence they have little motivation to
provide infrastructure or application functionality that costs them
money to implement and has the impact of reducing their overall
revenue stream...

> You might also get a performance boost by using caching mechanisms
> more efficiently (e.g. using less RAM).

Not true.  Linux caches uncompressed data in the page cache - caching
compressed data will significantly increase the memory footprint and
CPU consumption, as it has to be constantly uncompressed and
recompressed as the data changes.  This is not a viable caching
strategy for a general purpose OS.

> Also with precompressed files (e.g. photos, videos) you can save
> around 5-10%

Video and photos do not compress sufficiently to be a viable runtime
compression target for filesystem based compression.  It's a massive
waste of resources to attempt compression of internally compressed
data formats for anything but cold data storage.  And even then, if
it's cold storage then the data should be compressed and checksummed
by the cold storage application before it is written to the
filesystem.

> The technical topic is that IMHO no stable and practically usable
> Linux filesystem which is included in the default kernel exists.
> - ZFS works but is not included in the default kernel
> - BTRFS has stability and repair issues (see mailing lists) and bugs
>   with compression (does not compress on the fly in some scenarios)

I hear this sort of generic "btrfs is not stable/has bugs" complaint
as a reason for not using btrfs all the time.  I hear just as many, if
not more, generic "XFS is unstable and loses data" claims as a reason
for not using XFS, too.

Anecdotal claims are not proof of fact, and I don't see any real
evidence that btrfs is unstable.  e.g. Fedora has been using btrfs as
the root filesystem (and has for quite a while now) and there has been
no noticeable increase in bug reports (either for fs functionality or
data loss) compared to when ext4 or XFS was used as the default
filesystem type...

IOWs, I redirect generic "btrfs is unstable" complaints to /dev/null
these days, just like I do with generic "XFS is unstable" complaints.

-Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Transparent compression with ext4 - especially with zstd 2025-01-21 21:26 ` Dave Chinner @ 2025-01-22 6:47 ` Gerhard Wiesinger 0 siblings, 0 replies; 13+ messages in thread From: Gerhard Wiesinger @ 2025-01-22 6:47 UTC (permalink / raw) To: Dave Chinner; +Cc: Theodore Ts'o, linux-ext4 On 21.01.2025 22:26, Dave Chinner wrote: > On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote: >> On 21.01.2025 05:01, Theodore Ts'o wrote: >>> On Sun, Jan 19, 2025 at 03:37:27PM +0100, Gerhard Wiesinger wrote: >>>> Are there any plans to include transparent compression with ext4 (especially >>>> with zstd)? >>> I'm not aware of anyone in the ext4 deveopment commuity working on >>> something like this. Fully transparent compression is challenging, >>> since supporting random writes into a compressed file is tricky. >>> There are solutions (for example, the Stac patent which resulted in >>> Microsoft to pay $120 million dollars), but even ignoring the >>> intellectual property issues, they tend to compromise the efficiency >>> of the compression. >>> >>> More to the point, given how cheap byte storage tends to be (dollars >>> per IOPS tend to be far more of a constraint than dollars per GB), >>> it's unclear what the business case would be for any company to fund >>> development work in this area, when the cost of a slightly large HDD >>> or SSD is going to be far cheaper than the necessary software >>> engineering investrment needed, even for a hyperscaler cloud company >>> (and even there, it's unclear that transparent compression is really >>> needed). >>> >>> What is the business and/or technical problem which you are trying to >>> solve? >>> >> Regarding necessity: >> We are talking in some scenarios about some factors of diskspace. E.g. in my >> database scenario with PostgreSQL around 85% of disk space can be saved >> (e.g. around factor 7). > So use a database that has built-in data compression capabilities. > > e.g. 
> Mysql has transparent table compression functionality.
>
> This requires sparse files and FALLOC_FL_PUNCH_HOLE support in the
> filesystem, but there is no need for any special filesystem side
> support for data compression to get space gains of up to 75% on
> compressible data sets with the default database (16kB record size)
> and filesystem configs (4kB block size).
>
> The argument that "application level compression is hard, so we want
> the filesystem to do it for us" ignores the fact that it is -much
> harder- to do efficient compression in the filesystem than at the
> application level.
>
> The OS and filesystem doesn't have the freedom to control
> application level data access patterns nor tailor the compression
> algorithms to match how the application manages data, so everything
> the filesystem implements is a compromise. It will never be optimal
> for any given workload, because we have to make sure that it is
> not complete garbage for any given workload...

MySQL/MariaDB isn't an option for me. But I will look into this.

>> In cloud usage scenarios you can easily reduce that amount of allocated
>> diskspace by around a factor 7 and reduce cost therefore.
> Same argument: cloud applications should be managing their data
> sets appropriately and efficiently, not relying on the cloud storage
> infrastructure to magically do stuff to "reduce costs" for them.
>
> Remember: there's a massive conflict of interest on the vendor side
> here - the less efficient the application (be it CPU, RAM or storage
> capacity), the more money the cloud vendor makes from users running
> that application. Hence they have little motivation to provide
> infrastructure or application functionality that costs them money to
> implement and has the impact of reducing their overall revenue
> stream...

Right, therefore we want to make the storage usage as small as
possible, either on application level or filesystem level.
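[Editorial sketch: the FALLOC_FL_PUNCH_HOLE mechanism Dave refers to can be exercised directly from userspace. The snippet below is Linux-specific and calls fallocate(2) via ctypes, since the Python os module does not expose the punch-hole flags. It writes a 1 MiB file, deallocates the first half, and observes that the logical size is unchanged while the allocated block count drops on filesystems that support hole punching (ext4, XFS, btrfs, tmpfs). This is only a minimal illustration of the primitive, not a description of how MySQL's page compression actually uses it.]

```python
import ctypes
import ctypes.util
import os
import tempfile

# Flag values from <linux/falloc.h> (stable Linux ABI).
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_longlong, ctypes.c_longlong]

def punch_hole(fd: int, offset: int, length: int) -> None:
    """Deallocate a byte range; subsequent reads of the hole return zeros."""
    if libc.fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

with tempfile.NamedTemporaryFile() as f:
    f.write(b"\xaa" * (1 << 20))            # 1 MiB of non-zero data
    f.flush()
    os.fsync(f.fileno())                    # force block allocation
    before = os.fstat(f.fileno()).st_blocks
    try:
        punch_hole(f.fileno(), 0, 1 << 19)  # deallocate the first 512 KiB
    except OSError:
        pass  # e.g. EOPNOTSUPP on a filesystem without punch-hole support
    st = os.fstat(f.fileno())
    # Logical size is unchanged (KEEP_SIZE); allocated blocks can only shrink.
    print(st.st_size, before, st.st_blocks)
```

[This is the filesystem-side primitive a database needs: it compresses a page itself, writes the short result, and punches out the now-unused tail of the block range, so no filesystem-level compression code is involved at all.]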
>> You might also get a performance boost by using caching mechanisms
>> more efficiently (e.g. using less RAM).
> Not true. Linux caches uncompressed data in the page cache - caching
> compressed data will significantly increase the memory footprint and
> CPU consumption as it has to be constantly uncompressed and
> recompressed as the data changes. This is not a viable caching
> strategy for a general purpose OS.

AFAIK ZFS caches compressed data in the ARC cache. zstd really has a
very low overhead on decompression with a very good compression ratio
(even better than gz and bz2).

>> Also with precompressed files (e.g. photo, videos) you can save around 5-10%
> Video and photos do not compress sufficiently to be a viable runtime
> compression target for filesystem based compression. It's a massive
> waste of resources to attempt compression of internally compressed
> data formats for anything but cold data storage. And even then, if
> it's cold storage then the data should be compressed and checksummed
> by the cold storage application before it is written to the
> filesystem.

With zstd, ZFS uses the lz4 "early abort" feature, which detects with
very low CPU overhead that compression is not worthwhile, aborts the
compression, and stores the data uncompressed. If lz4 doesn't abort
early, zstd compression is used. So there are solutions for low
resource usage.

Regarding ratios: in my case 3%:

zfs list -o name,compressratio,compression big/shares/fotovideo
NAME                  RATIO  COMPRESS
big/shares/fotovideo  1.03x  zstd-3

>> The technical topic is that IMHO no stable and practical usable Linux
>> filesystem which is included in the default kernel exists.
>> - ZFS works but is not included in the default kernel
>> - BTRFS has stability and repair issues (see mailing lists) and bugs with
>> compression (does not compress on the fly in some scenarios)
> I hear this sort of generic "btrfs is not stable/has bugs" complaint
> as a reason for not using btrfs all the time.
That's my practical experience. I tried BTRFS several times and failed
in testing and production. I had a storage incident where several
thousand 4k blocks were damaged, with several VMs running on top. All
other filesystems (XFS, ext4, ZFS, UFS2, ...) except BTRFS and
bcachefs (which is experimental) were repairable to a consistent state
(of course with some blocks lost). You can repair BTRFS "forever"
without getting it into a consistent state. A friend of mine also had
the experience that it was not mountable and crashed immediately after
a reboot ...

Find the details here on the mailing list:
https://marc.info/?l=linux-btrfs&m=172519149923874&w=2

> I hear just as many, if not more, generic "XFS is unstable and loses
> data" claims as a reason for not using XFS, too.

I'm not having that experience. But I try to use ext4 primarily as it
is best for "repair" scenarios.

> Anecdotal claims are not proof of fact, and I don't see any real
> evidence that btrfs is unstable. e.g. Fedora has been using btrfs
> as the root filesystem (and has for quite a while now) and there has
> been no noticeable increase in bug reports (either for fs
> functionality or data loss) compared to when ext4 or XFS was used as
> the default filesystem type...

These are not anecdotal claims; it's my practical experience that
BTRFS is not stable and not repairable to a consistent state. It is
reproducible - you can try it for yourself. I have been using Fedora
since Fedora FC1 for all production systems.

> IOWs, I redirect generic "btrfs is unstable" complaints to /dev/null
> these days, just like I do with generic "XFS is unstable"
> complaints.

Try it and you will see that it is not repairable. You can find
details and a test case (a simulation of what I had: overwriting
random blocks) in the link. As with Fedora, I'm using the latest
"fresh" stable kernel versions as well as filesystem utilities.
I'm still having that "unrepairable" original BTRFS filesystem and
will try to repair it to a consistent state from time to time. Until
now without success.

Find the details here on the mailing list:
https://marc.info/?l=linux-btrfs&m=172519149923874&w=2

So you shouldn't redirect the complaints to /dev/null, to get BTRFS
better :-)

Thnx.

Ciao,
Gerhard

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-01-22 14:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-19 14:37 Transparent compression with ext4 - especially with zstd Gerhard Wiesinger
2025-01-21  4:01 ` Theodore Ts'o
2025-01-21  9:42   ` Artem Blagodarenko
2025-01-21 18:47   ` Gerhard Wiesinger
2025-01-21 19:33     ` Theodore Ts'o
2025-01-22  0:19       ` Kiselev, Oleg
2025-01-22  6:10         ` Gerhard Wiesinger
2025-01-22  7:29       ` Gerhard Wiesinger
2025-01-22  7:37         ` Christoph Hellwig
2025-01-22 13:19           ` Theodore Ts'o
2025-01-22 14:11             ` Christoph Hellwig
2025-01-21 21:26   ` Dave Chinner
2025-01-22  6:47     ` Gerhard Wiesinger