From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Viacheslav A.Dubeyko" <viacheslav.dubeyko@bytedance.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Luka Perkov <luka.perkov@sartura.hr>,
	bruno.banelli@sartura.hr
Subject: Re: [External] [RFC PATCH 00/76] SSDFS: flash-friendly LFS file system for ZNS SSD
Date: Tue, 28 Feb 2023 08:59:01 -0500
Message-ID: <Y/4Ipfn7YkPoTjo2@fedora>
In-Reply-To: <0237BC64-C920-4A63-B676-B2E972A5AF49@bytedance.com>

On Mon, Feb 27, 2023 at 02:59:08PM -0800, Viacheslav A.Dubeyko wrote:
> > On Feb 27, 2023, at 5:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > These comparisons include file systems that don't support zoned devices
> > natively; maybe that's why IOPS comparisons cannot be made?
> > 
> 
> Performance comparison can be made for conventional SSD devices.
> Of course, ZNS SSD has some peculiarities (limited number of open/active
> zones, zone size, write pointer, strict append-only mode), so a fair
> comparison requires care, because these peculiarities/restrictions can help
> as well as make life more difficult. However, even if we compare file systems
> on the same type of storage device, various configuration options
> (logical block size, erase block size, segment size, and so on) or a particular
> workload can significantly change a file system's behavior. It is never
> easy to state that one file system is faster than another.

I incorrectly assumed ssdfs was only for zoned devices.

> 
> >> (3) decrease the write amplification factor compared with:
> >>    1.3x - 116x (ext4),
> >>    14x - 42x (xfs),
> >>    6x - 9x (btrfs),
> >>    1.5x - 50x (f2fs),
> >>    1.2x - 20x (nilfs2);
> >> (4) prolong SSD lifetime compared with:
> > 
> > Is this measuring how many times blocks are erased? I guess this
> > measurement includes the background I/O from ssdfs migration and moving?
> > 
> 
> So, first of all, I need to explain the testing methodology. Testing included:
> (1) create file (empty, 64 bytes, 16K, 100K), (2) update file, (3) delete file.
> Every test case is executed as a sequence of multiple mount/unmount
> operations. For example, the total number of file creation operations was 1000 or
> 10000, but one mount cycle included 10, 100, or 1000 file creation, file update,
> or file delete operations. Finally, the file system must flush all dirty metadata and
> user data during the unmount operation.
> 
> The blktrace tool registers the LBA and size of every I/O request. These data are
> the basis for estimating how many erase blocks have been involved in
> operations. SSDFS volumes have been created by using 128KB, 512KB, and
> 8MB erase block sizes. So, I used these erase block sizes for the estimation.
> Generally speaking, we can estimate the total number of erase blocks that
> were involved in file system operations for a particular use case by
> summing the bytes of all I/O requests and dividing by the erase block size.
> If a file system uses in-place updates, then it is possible to estimate how many times
> the same erase block (we know the LBA numbers) has been completely rewritten.
> For example, if an erase block (starting from LBA #32) received 1310720 bytes of
> write I/O requests, then an erase block of 128KB in size has been rewritten 10 times.
> So, it means that the FTL needs to store all these data into 10 x 128KB erase blocks
> in the background, or execute around 9 erase operations to keep the actual state
> of the data in one 128KB erase block. So, this is the estimation of the FTL GC responsibility.
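
The per-erase-block estimate described above could be sketched roughly as
follows. This is only an illustration, not the author's tooling: the
(lba, size) tuples stand in for parsed blktrace output, the function names are
hypothetical, and the 4096-byte logical block size is an assumption chosen so
that LBA #32 falls on a 128KB erase block boundary.

```python
from collections import defaultdict

ERASE_BLOCK_SIZE = 128 * 1024   # one of the erase block sizes mentioned above
LOGICAL_BLOCK_SIZE = 4096       # assumption: bytes addressed by one LBA


def rewrite_counts(write_requests,
                   erase_block_size=ERASE_BLOCK_SIZE,
                   logical_block_size=LOGICAL_BLOCK_SIZE):
    """Estimate how many times each erase block was completely rewritten.

    write_requests is an iterable of (lba, size_in_bytes) tuples.
    Returns {erase_block_index: bytes_written / erase_block_size}.
    """
    bytes_per_block = defaultdict(int)
    for lba, size in write_requests:
        offset = lba * logical_block_size
        remaining = size
        # Split a request that crosses an erase block boundary.
        while remaining > 0:
            block = offset // erase_block_size
            room = erase_block_size - offset % erase_block_size
            chunk = min(room, remaining)
            bytes_per_block[block] += chunk
            offset += chunk
            remaining -= chunk
    return {block: written / erase_block_size
            for block, written in bytes_per_block.items()}


def total_erase_blocks(write_requests, erase_block_size=ERASE_BLOCK_SIZE):
    """Total bytes of all write requests divided by the erase block size."""
    return sum(size for _, size in write_requests) / erase_block_size


# The example from the text: 1310720 bytes written to the erase block
# that starts at LBA #32.
requests = [(32, 131072)] * 10
print(rewrite_counts(requests))
print(total_erase_blocks(requests))
```

With in-place updates, a rewrite count of 10 for one erase block corresponds
to the "around 9 erase operations" attributed to FTL GC in the text.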
> 
> However, if we would like to estimate the total number of erase operation, then
> we need to take into account:
> 
> E total = E(FTL GC) + E(TRIM) + E(FS GC) + E(read disturbance) + E(retention)
> 
> The estimation of erase operations caused by the retention issue is tricky, and
> it shows a negligibly small number for such short testing. So, we can ignore it.
> However, the retention issue is an important factor in decreasing SSD lifetime.
> I estimated this factor and made a comparison for various
> file systems. But this factor depends heavily on time, workload, and
> payload size. So, it’s really hard to share any stable and reasonable numbers
> for this factor. In particular, it heavily depends on the FTL implementation.
> 
> It is possible to estimate read disturbance but, again, it heavily
> depends on NAND flash type, organization, and FTL algorithms. Also, this
> estimation shows really small numbers that can be ignored for short testing.
> I’ve made this estimation and I can see that, currently, SSDFS has a read-intensive
> nature because of the offset translation table distribution policy. I am testing a fix
> and I hope to remove this issue.
> 
> SSDFS has an efficient TRIM/erase policy. So, I can see TRIM/erase operations
> even for such “short” test cases. As far as I can see, no other file system issues
> discard operations for the same test cases. I included TRIM/erase operations
> in the calculation of the total number of erase operations.
> 
> Estimation of GC operations on the FS side (F2FS, NILFS2) is the most speculative one.
> I’ve estimated the number of erase operations that FS GC can generate.
> However, as far as I can see, even without taking into account the FS GC erase
> operations, SSDFS looks better compared with F2FS and NILFS2.
> I need to add here that SSDFS uses a migration scheme and doesn’t need
> classical GC. But even for such “short” test cases the migration scheme shows
> a really efficient TRIM/erase policy.
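
As a rough illustration of the E total model above, the components could be
collected per file system and summed. The class name, field names, and
the ratio helper are hypothetical, and the numbers in the usage lines are
placeholders, not measurements from this thread.

```python
from dataclasses import dataclass


@dataclass
class EraseEstimate:
    """Estimated erase operations per source, following the E total formula."""
    ftl_gc: float = 0.0
    trim: float = 0.0
    fs_gc: float = 0.0
    read_disturbance: float = 0.0   # negligible for short tests, per the text
    retention: float = 0.0          # negligible for short tests, per the text

    def total(self):
        return (self.ftl_gc + self.trim + self.fs_gc
                + self.read_disturbance + self.retention)


def erase_ratio(other, baseline):
    """Assumption: compare two file systems by the ratio of total erases."""
    return other.total() / baseline.total()


# Placeholder numbers only, to show the arithmetic.
in_place_fs = EraseEstimate(ftl_gc=20)
log_structured_fs = EraseEstimate(ftl_gc=9, trim=1)
print(erase_ratio(in_place_fs, log_structured_fs))
```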
> 
> So, the write amplification factor was estimated on the basis of a comparison of
> write I/O requests. And SSD lifetime prolongation has been estimated and compared
> by using the model that I explained above. I hope I explained it clearly enough.
> Feel free to ask additional questions if I missed something.
> 
> The measurement includes all operations (foreground and background) that
> the file system initiates, because of the mount/unmount model. However, the migration
> scheme requires additional explanation. Generally speaking, the migration scheme
> doesn’t generate additional I/O requests. On the contrary, it decreases the
> number of I/O requests. It could be tricky to follow. SSDFS uses compression,
> delta-encoding, a compaction scheme, and migration stimulation. It means that
> regular file system update operations are the main vehicle of the migration scheme.
> Let’s imagine that an application updates a 4KB logical block. It means that SSDFS
> tries to compress (or delta-encode) this piece of data. Let’s say compression gives us
> a 1KB compressed piece of data (4KB uncompressed size). It means that we can
> place 1KB into a 4KB memory page and we have 3KB of free space. So, the migration
> logic checks whether the exhausted (completely full) old erase block that received the update
> operation has other valid block(s). If we have such valid logical blocks, then
> we can compress these logical blocks and store them in the free space of the 4KB memory page.
> So, we can finally store 4 compressed logical blocks (1KB in size each), for example,
> in one 4KB memory page. It means that SSDFS issues one I/O request for 4 logical
> blocks instead of four. I simplify the explanation, but the idea remains the same.
> I hope I clarified the point. Feel free to ask additional questions if I missed something.
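
The packing idea in the simplified example above might look like the sketch
below. It is not SSDFS code: zlib stands in for whatever compression or
delta-encoding the file system actually uses, pack_page is a hypothetical
name, and a real on-disk format would also need metadata describing each
payload.

```python
import zlib

PAGE_SIZE = 4096  # the 4KB memory page from the example


def pack_page(updated_block, other_valid_blocks, page_size=PAGE_SIZE):
    """Pack the updated block plus other valid blocks into one page.

    Returns the list of payloads that fit into a single page, so they
    can be written with one I/O request.
    """
    compressed = zlib.compress(updated_block)
    # Fall back to the raw block if compression does not help.
    first = compressed if len(compressed) < len(updated_block) else updated_block
    payloads = [first]
    used = len(first)
    # Use the remaining free space for other valid blocks of the old
    # erase block, which migrates them as a side effect of the update.
    for block in other_valid_blocks:
        candidate = zlib.compress(block)
        if used + len(candidate) > page_size:
            break
        payloads.append(candidate)
        used += len(candidate)
    return payloads


updated = b"a" * 4096
neighbours = [b"b" * 4096, b"c" * 4096, b"d" * 4096]
page = pack_page(updated, neighbours)
print(len(page), sum(len(p) for p in page))
```

In this sketch four 4KB logical blocks end up in one page, which matches the
"one I/O request for 4 logical blocks" case in the text, although real
compression ratios depend on the data.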

Thanks for these explanations, that clarifies things!

Stefan
