From: Ariel Miculas <amiculas@cisco.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Benno Lossin <benno.lossin@proton.me>,
	rust-for-linux@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Gary Guo <gary@garyguo.net>,
	linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org
Subject: Re: [RFC PATCH 03/24] erofs: add Errno in Rust
Date: Thu, 26 Sep 2024 12:51:40 +0300
Message-ID: <20240926095140.fej2mys5dee4aar2@amiculas-l-PF3FCGJH>
In-Reply-To: <54bf7cc6-a62a-44e9-9ff0-ca2e334d364f@linux.alibaba.com>

On 24/09/26 04:25, Gao Xiang wrote:
> 
> 
> On 2024/9/26 16:10, Ariel Miculas wrote:
> > On 24/09/26 09:04, Gao Xiang wrote:
> > > 
> 
> 
> ...
> 
> > 
> > And here [4] you can see the space savings achieved by PuzzleFS. In
> > short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> > up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> > is before applying any compression, the space savings are only due to
> > the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> > seekable compression), which is a fairer comparison (considering that
> > the OCI image uses gzip compression), then we get down to 53 MB for
> > storing all 10 Ubuntu Jammy versions using PuzzleFS.
> > 
> > Here's a summary:
> > # Steps
> > 
> > * I’ve downloaded 10 versions of Jammy from hub.docker.com
> > * These images only have one layer which is in tar.gz format
> > * I’ve built 10 equivalent puzzlefs images
> > * Compute the tarball_total_size by summing the sizes of every Jammy
> >    tarball (uncompressed) => 766 MB (use this as baseline)
> > * Sum the sizes of every oci/puzzlefs image => total_size
> > * Compute the total size as if all the versions were stored in a single
> >    oci/puzzlefs repository => total_unified_size
> > * Saved space = tarball_total_size - total_unified_size
> > 
> > # Results
> > (See [5] if you prefer the video format)
> > 
> > | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> > | --- | --- | --- | --- | --- |
> > | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> > | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> > | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> > | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> > 
> > Here's the script I used to download the Ubuntu Jammy versions and
> > generate the PuzzleFS images [6] to get an idea about how I got to these
> > results.
> > 
> > Can we achieve these results with the current erofs features?  I'm
> > referring specifically to this comment: "EROFS already supports
> > variable-sized chunks + CDC" [7].
> 
> Please see
> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html

Great, I see you've used the same example as I did. Though I must admit
I'm a little surprised there's no mention of PuzzleFS in your document.

> 
> 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
> Compressed OCI (tar.gz)	282.5	28.3	63%
> Uncompressed OCI (tar)	766.1	76.6	0%
> Uncompressed EROFS	109.5	11.0	86%
> EROFS (DEFLATE,9,32k)	46.4	4.6	94%
> EROFS (LZ4HC,12,64k)	54.2	5.4	93%
> 
> I don't know which compression algorithm you are using (maybe Zstd?),
> but the result is
>   EROFS (LZ4HC,12,64k)  54.2
>   PuzzleFS compressed   53?
>   EROFS (DEFLATE,9,32k) 46.4
> 
> I could rerun with EROFS + Zstd, but it should be smaller. This feature
> has been supported since Linux 6.1, thanks.

The average layer size is very impressive for EROFS, great work.
However, if we multiply the average layer size by 10, we get the total
size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
the average layer size is 30 MiB (for the compressed case), the unified
size is only 53 MiB. So this tells me there's blob sharing between the
different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
with EROFS (what I'm talking about is deduplication across the multiple
versions of Ubuntu Jammy and not within one single version).
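
To make the arithmetic explicit, here is a minimal sketch in Python. The
function name `sharing_factor` is mine and purely illustrative; the
inputs are the figures from the two tables above, and the calculation
assumes that "average layer size" means the size of one image stored on
its own.

```python
def sharing_factor(average_layer_size, image_count, stored_size):
    """Ratio of the sum of standalone image sizes to the size actually stored.

    A value near 1 means the images share nothing with each other;
    a larger value means blobs are shared across images.
    """
    return (average_layer_size * image_count) / stored_size


# Figures taken from the tables quoted in this thread.
puzzlefs = sharing_factor(30, 10, 53)     # PuzzleFS (compressed): 30 avg, 53 unified
erofs = sharing_factor(5.4, 10, 54.2)     # EROFS (LZ4HC,12,64k): 5.4 avg, 54.2 total

print(f"PuzzleFS (compressed): {puzzlefs:.2f}x")
print(f"EROFS (LZ4HC,12,64k):  {erofs:.2f}x")
```

A factor well above 1 for PuzzleFS and a factor of roughly 1 for EROFS
is what leads me to the conclusion above.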

Of course, with only 10 images, the space savings don't seem that
impressive for PuzzleFS compared to EROFS, but imagine we are storing
hundreds/thousands of Ubuntu versions. Then we're also building OCI
images on top of these versions. So if the user already has all the
blobs for an Ubuntu version, then we only need to ship the chunks that
have changed / have been added as a result of the specific application
that we've built on top of an existing Ubuntu version.
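
As a rough illustration of the idea (not the PuzzleFS implementation),
the sketch below uses a content-addressed store keyed by SHA-256. It
splits data into fixed-size chunks for simplicity, whereas PuzzleFS uses
a content-defined chunking algorithm; the helper names `chunk_digests`
and `chunks_to_ship` and the toy byte strings are hypothetical.

```python
import hashlib


def chunk_digests(data, chunk_size=4):
    """Split data into fixed-size chunks and key each chunk by its SHA-256 digest."""
    chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks[hashlib.sha256(chunk).hexdigest()] = chunk
    return chunks


def chunks_to_ship(image, local_store):
    """Return only the chunks of `image` that the user does not already have."""
    return {
        digest: chunk
        for digest, chunk in chunk_digests(image).items()
        if digest not in local_store
    }


# Toy stand-ins: a base image and an application image built on top of it.
base_image = b"AAAABBBBCCCCDDDD"
app_image = b"AAAABBBBCCCCDDDDEEEE"

local_store = chunk_digests(base_image)          # user already has the base image
missing = chunks_to_ship(app_image, local_store)

print(len(local_store), "chunks already present,", len(missing), "chunk(s) to ship")
```

Only the chunk added by the application needs to be transferred; the
chunks shared with the base image are already in the local store.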

One more thing: the "Unified size" column is the key to understanding
the space savings offered by PuzzleFS, and I see that you've left this
column out of your table.

Regards,
Ariel

> 
> Thanks,
> Gao Xiang
