public inbox for linux-kernel@vger.kernel.org
From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Ariel Miculas <amiculas@cisco.com>
Cc: Benno Lossin <benno.lossin@proton.me>,
	rust-for-linux@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Gary Guo <gary@garyguo.net>,
	linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org
Subject: Re: [RFC PATCH 03/24] erofs: add Errno in Rust
Date: Thu, 26 Sep 2024 19:23:26 +0800
Message-ID: <ec17a30e-c63a-4615-8784-69aef2bb2bae@linux.alibaba.com>
In-Reply-To: <20240926110151.52cuuidfpjtgwnjd@amiculas-l-PF3FCGJH>



On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 06:46, Gao Xiang wrote:

...

>>
>>>
>>>>
>>>>                           Total size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
>>>> Compressed OCI (tar.gz)   282.5             28.3                      63%
>>>> Uncompressed OCI (tar)    766.1             76.6                      0%
>>>> Uncompressed EROFS        109.5             11.0                      86%
>>>> EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
>>>> EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
>>>>
>>>> I don't know which compression algorithm you are using (maybe Zstd?),
>>>> but the results are
>>>>     EROFS (LZ4HC,12,64k)  54.2
>>>>     PuzzleFS compressed   53?
>>>>     EROFS (DEFLATE,9,32k) 46.4
>>>>
>>>> I could rerun with EROFS + Zstd, but the result should be smaller. This
>>>> feature has been supported since Linux 6.1, thanks.
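
As a sanity check on the "Saved" column of the table quoted above, here is
a minimal sketch that recomputes it from the totals, assuming the column
means 1 - total / 766.1 rounded to a whole percent. The function and
variable names are only illustrative.

```python
# Totals in MiB, copied from the quoted table.
BASELINE_MIB = 766.1  # uncompressed OCI (tar)

totals_mib = {
    "Compressed OCI (tar.gz)": 282.5,
    "Uncompressed OCI (tar)": 766.1,
    "Uncompressed EROFS": 109.5,
    "EROFS (DEFLATE,9,32k)": 46.4,
    "EROFS (LZ4HC,12,64k)": 54.2,
}


def saved_percent(total_mib, baseline_mib=BASELINE_MIB):
    """Space saved relative to the baseline, as a whole percent.

    Assumption: this is how the "Saved" column was derived.
    """
    return round(100 * (1 - total_mib / baseline_mib))


for name, total in totals_mib.items():
    print(f"{name:26s} {total:6.1f} MiB  {saved_percent(total)}% saved")
```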
>>>
>>> The average layer size is very impressive for EROFS, great work.
>>> However, if we multiply the average layer size by 10, we get the total
>>> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
>>> the average layer size is 30 MiB (for the compressed case), the unified
>>> size is only 53 MiB. So this tells me there's blob sharing between the
>>> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
>>> with EROFS (what I'm talking about is deduplication across the multiple
>>> versions of Ubuntu Jammy and not within one single version).
>>
>> Don't get me wrong, but I don't think you got the point.
>>
>> First, what you asked was `I'm referring specifically to this
>> comment: "EROFS already supports variable-sized chunks + CDC"`,
>> so I clearly answered with the result of compressed data global
>> deduplication with CDC.
>>
>> Here both EROFS and SquashFS compress 10 Ubuntu images into
>> one image for a fair comparison to show the benefit of CDC, so
> 
> It might be a fair comparison, but that's not how container images are
> distributed. You're trying to argue that I should just use EROFS and I'm

First, OCI layers are distributed just as I described.

For example, I could introduce some common blobs that hold
chunks as a chunk dictionary.  Each image would then be
just an index, and all data would be deduplicated.  That is
also how Nydus works.
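
To illustrate the idea only (this is not the EROFS or Nydus on-disk
format), here is a minimal sketch of a shared chunk dictionary where each
image is reduced to an index of chunk digests.  The names
build_image_index and read_image, and the use of SHA-256 as the chunk key,
are assumptions made for the example.

```python
import hashlib


def build_image_index(image_chunks, chunk_dictionary):
    """Store each chunk once in the shared dictionary and return the
    image's index (an ordered list of chunk digests)."""
    index = []
    for chunk in image_chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk already present in the dictionary is not stored again.
        chunk_dictionary.setdefault(digest, chunk)
        index.append(digest)
    return index


def read_image(index, chunk_dictionary):
    """Reassemble an image's data from its index and the shared dictionary."""
    return b"".join(chunk_dictionary[digest] for digest in index)


shared = {}
image_a = [b"base-libs", b"app-v1"]
image_b = [b"base-libs", b"app-v2"]
index_a = build_image_index(image_a, shared)
index_b = build_image_index(image_b, shared)

# Four chunk references across two images, but only three stored chunks.
print(len(index_a) + len(index_b), len(shared))
```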

> showing you that EROFS doesn't currently support the functionality
> provided by PuzzleFS: the deduplication across multiple images.

No, EROFS also supports external devices/blobs that hold a lot
of chunks (as a dictionary to share data among images), but
clearly it has an upper limit.

But PuzzleFS just treats each individual chunk as a separate
file, which unavoidably means "opening an arbitrary number of
files on reading, even in page fault context".
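
A rough sketch of why the number of files is unbounded: if every chunk
lives in its own file, one read has to open as many files as it overlaps
chunks.  The chunk layout and sizes below are purely illustrative
assumptions, not measured PuzzleFS values.

```python
from bisect import bisect_right


def chunks_touched(chunk_starts, offset, length):
    """Count the chunks overlapping the byte range [offset, offset + length).

    chunk_starts is the sorted list of chunk start offsets; the first
    entry must be 0.
    """
    if length <= 0:
        return 0
    first = bisect_right(chunk_starts, offset) - 1
    last = bisect_right(chunk_starts, offset + length - 1) - 1
    return last - first + 1


# Hypothetical layout: 8 KiB chunks covering a 4 MiB file.
starts = list(range(0, 4 << 20, 8 << 10))

# Filling one hypothetical 2 MiB large folio touches this many chunks,
# i.e. this many separate files when each chunk is its own file.
print(chunks_touched(starts, 0, 2 << 20))
```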

> 
>> I believe they are basically equivalent to your `Unified size`s, so
>> the result is
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason why EROFS and SquashFS don't have `Total Size`s
>> is simply that we cannot store every individual chunk in a
>> separate file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I have seen no reason to open arbitrary kernel files
>> (maybe hundreds at once due to the large folio feature) in the page
>> fault context.  If I modified the `mkfs.erofs` tool, I could give
>> some similar numbers, but I don't want to spend time on that now
>> because of `open arbitrary kernel files in the page fault context`.
>>
>> As I said, if PuzzleFS eventually upstreams some work to open kernel
>> files in page fault context, I will definitely work out the same
>> feature for EROFS soon, but currently I don't do that just
>> because it's very controversial and no in-tree kernel filesystem
>> does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest that you first ask the FS/MM folks about this ("open
kernel files when reading in the page fault").

If they say "no", I suggest you don't spend any more time on this.

Thanks,
Gao Xiang
