public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system
@ 2026-03-16  2:17 Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 01/79] ssdfs: introduce SSDFS on-disk layout Viacheslav Dubeyko
                   ` (32 more replies)
  0 siblings, 33 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Hello,

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

[PROBLEM DECLARATION]

SSD is a sophisticated device capable of managing in-place
updates. However, in-place updates generate significant FTL GC
responsibilities that increase the write amplification factor, require
substantial NAND flash overprovisioning, decrease SSD lifetime,
and introduce performance spikes. The Log-structured File System (LFS)
approach can introduce a more flash-friendly Copy-On-Write (COW) model.
However, F2FS and NILFS2 issue in-place updates anyway, even though
they use the COW policy for the main volume area. Also, GC is an
inevitable subsystem of any LFS file system, and it introduces write
amplification, retention issues, excessive copy operations, and
performance degradation on aged volumes. Generally speaking, available
file system technologies have side effects: (1) write amplification,
(2) significant FTL GC responsibilities, (3) inevitable FS GC overhead,
(4) read disturbance, (5) retention issues. As a result, reduced SSD
lifetime, performance degradation, early SSD failure, and increased
TCO are the reality of data infrastructure.

[WHY YET ANOTHER FS?]

QLC NAND flash imposes really tough requirements on file systems
to be truly flash friendly. ZNS SSD and FDP SSD technologies try
to help manage these strict requirements. But, anyway, file
systems need to play properly to exploit the benefits of ZNS/FDP SSDs.
Ideally, a file system needs to use only a Copy-On-Write (COW) or
append-only policy, to be a Log-structured File System (LFS),
and to be capable of working without a GC subsystem. However, the
F2FS and NILFS2 file systems heavily rely on a GC subsystem that
inevitably increases write amplification, and they do not use
the COW policy for the whole volume.

Generally speaking, it would be good to see an LFS file system
architecture that is capable of:
(1) eliminating FS GC overhead,
(2) decreasing/eliminating FTL GC responsibilities,
(3) decreasing the write amplification factor,
(4) introducing native architectural support of ZNS SSD + SMR HDD,
(5) increasing the compression ratio by using delta-encoding and
    deduplication,
(6) introducing smart management of "cold" data and an efficient TRIM
    policy,
(7) employing the parallelism of multiple NAND dies/channels,
(8) prolonging SSD lifetime and decreasing TCO,
(9) guaranteeing strong reliability and the capability to reconstruct
    a heavily corrupted file system volume,
(10) guaranteeing stable performance.

SSDFS is an open-source, kernel-space LFS file system designed to:
(1) eliminate GC overhead, (2) prolong SSD lifetime, (3) natively
support a strict append-only mode (ZNS SSD + SMR HDD compatible),
(4) guarantee strong reliability, and (5) guarantee stable performance.

[SSDFS ARCHITECTURE]

One of the key goals of SSDFS is to decrease the write amplification
factor. The logical extent concept is the fundamental technique to
achieve this goal. A logical extent describes any volume extent on the
basis of {segment ID, logical block ID, length}. A segment is a portion
of the file system volume that has to be aligned on the erase block
size and is always located at the same offset. It is the basic unit for
allocating and managing the free space of the file system volume. Every
segment can include one or several Logical Erase Blocks (LEBs). A LEB
can be mapped into a "Physical" Erase Block (PEB). Generally speaking,
a PEB is a fixed-sized container that includes a number of logical
blocks (physical sectors or NAND flash pages). SSDFS is a pure
Log-structured File System (LFS). It means that any write operation
into an erase block is the creation of a log. The content of every
erase block is a sequence of logs. A PEB has a block bitmap with the
goal of tracking the state (free, pre-allocated, allocated, invalid) of
logical blocks and of accounting the physical space used for storing
the log's metadata (segment header, partial log header, footer). Also,
a log contains an offset translation table that converts a logical
block ID into a particular offset inside the log's payload. The log
concept implements support of compression, delta-encoding, and a
compaction scheme. As a result, it provides a way to: (1) decrease
write amplification, (2) decrease FTL GC responsibilities, (3) improve
the compression ratio and decrease payload size. Finally, SSD lifetime
can be longer and write I/O performance can be improved.
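The logical extent triple described above can be sketched as a plain C
structure. This is a hypothetical user-space illustration, not the actual
SSDFS on-disk layout; the field names are invented for clarity:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the logical extent triple: data is addressed
 * by {segment ID, logical block ID, length}, so the extent stays valid
 * even when a LEB is remapped to another PEB behind the scenes. */
struct logical_extent {
	uint64_t seg_id;	/* segment that holds the data */
	uint32_t logical_blk;	/* first logical block in the segment */
	uint32_t len;		/* number of logical blocks */
};

/* Two extents describe contiguous data if they live in the same
 * segment and the second starts right after the first one ends. */
static inline int extents_adjacent(const struct logical_extent *a,
				   const struct logical_extent *b)
{
	return a->seg_id == b->seg_id &&
	       a->logical_blk + a->len == b->logical_blk;
}
```

Because the extent never records a physical erase block, remapping a LEB
to a fresh PEB leaves every recorded extent intact, which is what avoids
metadata updates during migration.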

The SSDFS file system is based on the concept of a logical segment
that is the aggregation of Logical Erase Blocks (LEBs). Moreover,
initially, a LEB has no association with a particular "Physical" Erase
Block (PEB). It means that a segment could have associations for only
some of its LEBs or even have no association with any PEB at all (for
example, in the case of a clean segment). Generally speaking, the SSDFS
file system needs a special metadata structure (the PEB mapping table)
that is capable of associating any LEB with any PEB. The PEB mapping
table is the crucial metadata structure that has several goals:
(1) mapping LEBs to PEBs, (2) implementation of the logical extent
concept, (3) implementation of the concept of PEB migration,
(4) implementation of the delayed erase operation by a specialized
thread.
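A minimal sketch of the LEB-to-PEB lookup described above, including the
"not mapped yet" case of a clean segment. The structure and names here are
hypothetical illustrations, not the real ssdfs_peb_mapping_table layout:

```c
#include <assert.h>
#include <stdint.h>

#define PEB_UNMAPPED UINT64_MAX	/* LEB has no associated PEB yet */
#define FRAG_CAPACITY 8

/* Hypothetical in-memory fragment of a LEB -> PEB mapping table.
 * A LEB in a clean segment may have no PEB associated with it. */
struct leb2peb_fragment {
	uint64_t start_leb;		/* first LEB covered here */
	uint64_t peb_ids[FRAG_CAPACITY];
};

/* Resolve a LEB to its PEB, or report that no mapping exists. */
static uint64_t leb_to_peb(const struct leb2peb_fragment *frag,
			   uint64_t leb)
{
	uint64_t index = leb - frag->start_leb;

	if (leb < frag->start_leb || index >= FRAG_CAPACITY)
		return PEB_UNMAPPED;
	return frag->peb_ids[index];
}
```

The indirection is the point: because every access goes through this table,
the file system can re-associate a LEB with a different PEB (migration,
delayed erase) without touching any logical extent that references it.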

SSDFS implements a migration scheme. The migration scheme is a
fundamental technique of GC overhead management. The key responsibility
of the migration scheme is to guarantee the presence of data in the
same segment for any update operations. Generally speaking, the
migration scheme's model is implemented on the basis of associating an
exhausted "Physical" Erase Block (PEB) with a clean one. The goal of
such an association of two PEBs is to implement the gradual migration
of data by means of the update operations in the initial (exhausted)
PEB. As a result, the old, exhausted PEB becomes invalidated after
complete data migration, and it is then possible to apply the erase
operation to convert it to a clean state. The migration scheme is
capable of decreasing GC activity significantly by excluding the
necessity to update metadata and by means of self-migration of data
between PEBs that is triggered by regular update operations. Finally,
the migration scheme can: (1) eliminate GC overhead, (2) implement an
efficient TRIM policy, (3) prolong SSD lifetime, (4) guarantee stable
performance.
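The migration pair can be modeled in a few lines of C. This is a toy
model of the idea only (the counters and function names are invented),
showing how regular updates alone drain an exhausted PEB until it is
erasable, with no dedicated GC copy pass:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a migration pair: an exhausted source PEB is
 * associated with a clean destination PEB. Every regular update writes
 * the new version of a block into the destination and invalidates the
 * old copy in the source. */
struct migration_pair {
	uint32_t src_valid;	/* valid blocks left in exhausted PEB */
	uint32_t dst_used;	/* blocks written into the clean PEB */
};

/* A regular update operation moves one block out of the source. */
static void migrate_on_update(struct migration_pair *pair)
{
	if (pair->src_valid > 0) {
		pair->src_valid--;
		pair->dst_used++;
	}
}

/* The source PEB can be erased once no valid data remains in it. */
static int src_peb_erasable(const struct migration_pair *pair)
{
	return pair->src_valid == 0;
}
```

Once `src_peb_erasable()` holds, the erase can be issued lazily (in SSDFS,
by a specialized thread), converting the old PEB back to a clean state.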

Generally speaking, SSDFS doesn't need the classical model of garbage
collection that is used in NILFS2 or F2FS. However, SSDFS has several
global GC threads (for the dirty, pre-dirty, used, and using segment
states) and a segment bitmap. The main responsibilities of the global
GC threads are to: (1) find a segment in a particular state, (2) check
that the segment object is constructed and initialized by the file
system driver logic, (3) check the necessity to stimulate or finish the
migration (if the segment is under update operations or has had update
operations recently, then migration stimulation is not necessary),
(4) define the valid blocks that require migration, (5) add a
recommended migration request to the PEB update queue, (6) destroy the
in-core segment object if no migration is necessary and no
create/update requests have been received by the segment object
recently. Global GC threads are used to recommend migration stimulation
for particular PEBs and to destroy in-core segment objects that have no
requests for processing. The segment bitmap is the critical metadata
structure of the SSDFS file system that implements several goals:
(1) searching for a candidate for a current segment capable of storing
new data, (2) searching by the GC subsystem for the most optimal
segment (in the dirty state, for example) with the goal of preparing
the segment in the background for storing new data (converting it into
a clean state).
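The segment bitmap's search goal can be sketched as follows. Real code
would scan a packed on-disk bitmap; this hypothetical linear scan over a
byte-per-segment array only illustrates the "find a segment in state X"
operation that both the GC threads and current-segment selection rely on:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment states mirroring the description above. */
enum seg_state {
	SEG_CLEAN,
	SEG_USING,
	SEG_USED,
	SEG_PRE_DIRTY,
	SEG_DIRTY,
};

/* Find the first segment in the requested state; -1 if none exists.
 * A clean hit serves new-data allocation; a dirty hit gives the GC
 * subsystem a candidate to prepare (erase) in the background. */
static long find_seg_by_state(const uint8_t *states, size_t count,
			      enum seg_state state)
{
	for (size_t i = 0; i < count; i++) {
		if (states[i] == state)
			return (long)i;
	}
	return -1;
}
```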

The SSDFS file system uses a b-tree architecture for metadata
representation (for example, the inodes tree, extents tree, dentries
tree, and xattr tree) because it provides a compact way of reserving
metadata space without the necessity of excessive overprovisioning for
metadata reservation (as in the case of a plain table or array).
SSDFS uses a hybrid b-tree architecture with the goal of eliminating
the index nodes' side effect. The hybrid b-tree operates with three
node types: (1) index node, (2) hybrid node, (3) leaf node. Generally
speaking, the peculiarity of the hybrid node is the mixture of both
index and data records in one node. A hybrid b-tree starts with a root
node that is capable of keeping two index records or two data records
inline (if the size of a data record is equal to or less than the size
of an index record). If the b-tree needs to contain more than two
items, then the first hybrid node has to be added into the b-tree.
The root level of the b-tree is able to contain only two nodes because
the root node is capable of storing only two index records. Generally
speaking, the initial goal of the hybrid node is to store data records
in the presence of a reserved index area. The b-tree implements a
compact and flexible metadata structure that can decrease payload size
and isolate hot, warm, and cold metadata types in different erase
blocks.
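The three node types and the two-record root capacity rule above can be
captured in a short sketch. Names and fields here are hypothetical
simplifications, not the actual ssdfs_btree_node layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the three hybrid b-tree node types. A hybrid
 * node carries both an index area and a data-records area, which lets
 * a small tree avoid dedicating whole nodes to indexes only. */
enum btree_node_type {
	NODE_INDEX,	/* index records only */
	NODE_HYBRID,	/* index area + data records in one node */
	NODE_LEAF,	/* data records only */
};

struct btree_node_sketch {
	enum btree_node_type type;
	uint16_t index_count;
	uint16_t data_count;
};

/* The root node stores at most two index records (or two inline data
 * records), so the root level references at most two child nodes. */
static int root_can_add_index(const struct btree_node_sketch *root)
{
	return root->index_count < 2;
}
```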

The migration scheme is sufficient for the case of conventional SSDs,
for both metadata and user data. But a ZNS SSD has a huge zone size and
a limited number of active/open zones. As a result, a moving scheme for
user data has to be introduced in the ZNS SSD case. Finally, the
migration scheme works for metadata and the moving scheme works for
user data (in the ZNS SSD case). Initially, user data can be stored
into the current user data segment/zone. And user data can be updated
in the same zone until exhaustion. Next, the moving scheme starts to
work. Updated user data is moved into the current user data zone for
updates. As a result, it is necessary to update the extents tree and to
store the invalidated extents of the old zone into the invalidated
extents tree. The invalidated extents tree is needed to track the
moment when the old zone is completely invalidated and ready to be
erased.
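The zone-invalidation tracking described above can be sketched as a
simple counter model. This is only an illustration of the bookkeeping
goal (the structure and names are invented), not the invalidated extents
b-tree itself:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-zone bookkeeping for the moving scheme: as updated
 * user data moves to the current zone, invalidated-extent lengths for
 * the old zone accumulate; when they cover everything that was written
 * there, the zone is ready to be erased (zone reset). */
struct zone_sketch {
	uint64_t written_blks;	/* blocks written before exhaustion */
	uint64_t invalid_blks;	/* blocks invalidated by the moving scheme */
};

/* Record an invalidated extent of the old zone. */
static void invalidate_extent(struct zone_sketch *zone, uint64_t len)
{
	zone->invalid_blks += len;
}

/* The old zone is erasable once every written block is invalidated. */
static int zone_erasable(const struct zone_sketch *zone)
{
	return zone->invalid_blks >= zone->written_blks;
}
```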

[BENCHMARKING]

Benchmarking results show that SSDFS is capable of:
(1) generating a smaller amount of write I/O requests compared with:
    1.4x - 116x (ext4),
    14x - 42x (xfs),
    6.2x - 9.8x (btrfs),
    1.5x - 41x (f2fs),
    0.6x - 22x (nilfs2);
(2) creating a smaller payload compared with:
    0.3x - 300x (ext4),
    0.3x - 190x (xfs),
    0.7x - 400x (btrfs),
    1.2x - 400x (f2fs),
    0.9x - 190x (nilfs2);
(3) decreasing the write amplification factor compared with:
    1.3x - 116x (ext4),
    14x - 42x (xfs),
    6x - 9x (btrfs),
    1.5x - 50x (f2fs),
    1.2x - 20x (nilfs2);
(4) prolonging SSD lifetime compared with:
    1.4x - 7.8x (ext4),
    15x - 60x (xfs),
    6x - 12x (btrfs),
    1.5x - 7x (f2fs),
    1x - 4.6x (nilfs2).

v2
(*) File system code has been completely switched to memory folios.
(*) PEB-based deduplication model has been introduced.
(*) 8K, 16K, 32K logical block size support has been stabilized.
(*) PEB inflation model has been implemented.
(*) Shared dictionary and b-tree subsystem has been reworked significantly.

[CURRENT ISSUES]
(*) FSCK tool is not fully implemented.
(*) Multiple issues during xfstests run.
(*) ZNS SSD + SMR HDD support is not stable.
(*) Multiple erase blocks in segment model functionality is not stable.
(*) Collaboration of PEB inflation model, migration scheme, and
    moving scheme has issues.

[TODO]
(*) Multi-drive support.

[REFERENCES]
[1] SSDFS tools: https://github.com/dubeyko/ssdfs-tools.git
[2] SSDFS driver: https://github.com/dubeyko/ssdfs-driver.git
[3] Linux kernel with SSDFS support: https://github.com/dubeyko/linux.git
[4] SSDFS (paper): https://arxiv.org/abs/1907.11825
[5] Embedded Linux 2022: https://www.youtube.com/watch?v=x5gklnkvi_Q
[6] Linux Plumbers 2022: https://www.youtube.com/watch?v=sBGddJBHsIo
[7] Why do you need SSDFS?: https://www.youtube.com/watch?v=7b_vrtRvsGM
[8] Linux Plumbers 2024: https://www.youtube.com/watch?v=0_f1kD7fGnE

Viacheslav Dubeyko (79):
  ssdfs: introduce SSDFS on-disk layout
  ssdfs: add key file system declarations
  ssdfs: add key file system's function declarations
  ssdfs: implement raw device operations
  ssdfs: implement basic read/write primitives
  ssdfs: implement super operations
  ssdfs: implement commit superblock logic
  ssdfs: segment header + log footer operations
  ssdfs: add declaration of functions for superblock search
  ssdfs: basic mount logic implementation
  ssdfs: implement folio vector
  ssdfs: implement dynamic array
  ssdfs: implement sequence array
  ssdfs: implement folio array
  ssdfs: introduce PEB's block bitmap
  ssdfs: implement PEB's block bitmap functionality
  ssdfs: implement support of migration scheme in PEB bitmap
  ssdfs: implement functionality of migration scheme in PEB bitmap
  ssdfs: introduce segment block bitmap
  ssdfs: implement functionality of segment block bitmap
  ssdfs: introduce segment request queue
  ssdfs: introduce offset translation table
  ssdfs: implement offsets translation table functionality
  ssdfs: introduce PEB object
  ssdfs: implement PEB object functionality
  ssdfs: implement compression logic support
  ssdfs: introduce PEB container
  ssdfs: implement PEB container functionality
  ssdfs: implement migration scheme
  ssdfs: PEB read thread logic
  ssdfs: PEB flush thread's finite state machine
  ssdfs: auxiliary GC threads logic
  ssdfs: introduce segment object
  ssdfs: implement segment object's functionality
  ssdfs: implement current segment functionality
  ssdfs: implement segment tree functionality
  ssdfs: introduce PEB mapping queue
  ssdfs: introduce PEB mapping table
  ssdfs: implement PEB mapping table functionality
  ssdfs: introduce PEB mapping table cache
  ssdfs: implement PEB mapping table cache logic
  ssdfs: introduce segment bitmap
  ssdfs: implement segment bitmap's functionality
  ssdfs: introduce b-tree object
  ssdfs: implement b-tree object's functionality
  ssdfs: introduce b-tree node object
  ssdfs: implement b-tree node's functionality
  ssdfs: introduce b-tree hierarchy object
  ssdfs: implement b-tree hierarchy logic
  ssdfs: introduce inodes b-tree
  ssdfs: implement inodes b-tree functionality
  ssdfs: introduce dentries b-tree
  ssdfs: implement dentries b-tree functionality
  ssdfs: introduce extents queue object
  ssdfs: introduce extents b-tree
  ssdfs: implement extents b-tree functionality
  ssdfs: introduce invalidated extents b-tree
  ssdfs: implement invalidated extents b-tree functionality
  ssdfs: introduce shared extents b-tree
  ssdfs: implement shared extents b-tree functionality
  ssdfs: introduce PEB-based deduplication technique
  ssdfs: introduce shared dictionary b-tree
  ssdfs: implement shared dictionary b-tree functionality
  ssdfs: implement snapshot requests queue functionality
  ssdfs: introduce snapshots b-tree
  ssdfs: implement snapshots b-tree functionality
  ssdfs: implement extended attributes support
  ssdfs: implement extended attributes b-tree functionality
  ssdfs: introduce Diff-On-Write approach
  ssdfs: implement sysfs support
  ssdfs: implement IOCTL operations
  ssdfs: introduce online FSCK stub logic
  ssdfs: introduce application-based unit-tests
  ssdfs: introduce Kunit-based unit-tests
  ssdfs: implement inode operations support
  ssdfs: implement directory operations support
  ssdfs: implement file operations support
  ssdfs: implement initial support of tunefs operations
  Introduce SSDFS file system

 fs/Kconfig                            |     1 +
 fs/Makefile                           |     1 +
 fs/ssdfs/.kunitconfig                 |     9 +
 fs/ssdfs/Kconfig                      |   408 +
 fs/ssdfs/Makefile                     |    63 +
 fs/ssdfs/acl.c                        |   260 +
 fs/ssdfs/acl.h                        |    54 +
 fs/ssdfs/block_bitmap.c               |  6948 +++++++
 fs/ssdfs/block_bitmap.h               |   393 +
 fs/ssdfs/block_bitmap_tables.c        |   311 +
 fs/ssdfs/block_bitmap_test.c          |  2380 +++
 fs/ssdfs/btree.c                      |  8506 +++++++++
 fs/ssdfs/btree.h                      |   219 +
 fs/ssdfs/btree_hierarchy.c            | 11632 ++++++++++++
 fs/ssdfs/btree_hierarchy.h            |   336 +
 fs/ssdfs/btree_node.c                 | 18780 ++++++++++++++++++
 fs/ssdfs/btree_node.h                 |   891 +
 fs/ssdfs/btree_search.c               |  1114 ++
 fs/ssdfs/btree_search.h               |   424 +
 fs/ssdfs/common_bitmap.h              |   230 +
 fs/ssdfs/compr_lzo.c                  |   268 +
 fs/ssdfs/compr_lzo_test.c             |   570 +
 fs/ssdfs/compr_zlib.c                 |   374 +
 fs/ssdfs/compr_zlib_test.c            |   401 +
 fs/ssdfs/compression.c                |   569 +
 fs/ssdfs/compression.h                |   108 +
 fs/ssdfs/compression_test.c           |   310 +
 fs/ssdfs/current_segment.c            |   949 +
 fs/ssdfs/current_segment.h            |   116 +
 fs/ssdfs/dentries_tree.c              | 10485 ++++++++++
 fs/ssdfs/dentries_tree.h              |   158 +
 fs/ssdfs/dev_bdev.c                   |  1065 ++
 fs/ssdfs/dev_mtd.c                    |   650 +
 fs/ssdfs/dev_zns.c                    |  1344 ++
 fs/ssdfs/diff_on_write.c              |   158 +
 fs/ssdfs/diff_on_write.h              |   106 +
 fs/ssdfs/diff_on_write_metadata.c     |  2969 +++
 fs/ssdfs/diff_on_write_user_data.c    |   851 +
 fs/ssdfs/dir.c                        |  2197 +++
 fs/ssdfs/dynamic_array.c              |  1594 ++
 fs/ssdfs/dynamic_array.h              |   103 +
 fs/ssdfs/dynamic_array_test.c         |   660 +
 fs/ssdfs/extents_queue.c              |  2013 ++
 fs/ssdfs/extents_queue.h              |   110 +
 fs/ssdfs/extents_tree.c               | 15349 +++++++++++++++
 fs/ssdfs/extents_tree.h               |   188 +
 fs/ssdfs/file.c                       |  4341 +++++
 fs/ssdfs/fingerprint.h                |   261 +
 fs/ssdfs/fingerprint_array.c          |   795 +
 fs/ssdfs/fingerprint_array.h          |    82 +
 fs/ssdfs/folio_array.c                |  1781 ++
 fs/ssdfs/folio_array.h                |   146 +
 fs/ssdfs/folio_array_test.c           |  1107 ++
 fs/ssdfs/folio_vector.c               |   523 +
 fs/ssdfs/folio_vector.h               |    70 +
 fs/ssdfs/folio_vector_test.c          |   495 +
 fs/ssdfs/fs_error.c                   |   265 +
 fs/ssdfs/global_fsck.c                |   598 +
 fs/ssdfs/inode.c                      |  1262 ++
 fs/ssdfs/inodes_tree.c                |  6261 ++++++
 fs/ssdfs/inodes_tree.h                |   181 +
 fs/ssdfs/invalidated_extents_tree.c   |  7128 +++++++
 fs/ssdfs/invalidated_extents_tree.h   |    96 +
 fs/ssdfs/ioctl.c                      |   453 +
 fs/ssdfs/ioctl.h                      |    58 +
 fs/ssdfs/log_footer.c                 |   991 +
 fs/ssdfs/offset_translation_table.c   | 12175 ++++++++++++
 fs/ssdfs/offset_translation_table.h   |   459 +
 fs/ssdfs/options.c                    |   170 +
 fs/ssdfs/peb.c                        |  1120 ++
 fs/ssdfs/peb.h                        |   600 +
 fs/ssdfs/peb_block_bitmap.c           |  5740 ++++++
 fs/ssdfs/peb_block_bitmap.h           |   179 +
 fs/ssdfs/peb_container.c              |  6605 +++++++
 fs/ssdfs/peb_container.h              |   631 +
 fs/ssdfs/peb_deduplication.c          |   483 +
 fs/ssdfs/peb_flush_thread.c           | 24221 ++++++++++++++++++++++++
 fs/ssdfs/peb_fsck_thread.c            |   242 +
 fs/ssdfs/peb_gc_thread.c              |  3734 ++++
 fs/ssdfs/peb_init.c                   |  1338 ++
 fs/ssdfs/peb_init.h                   |   364 +
 fs/ssdfs/peb_mapping_queue.c          |   340 +
 fs/ssdfs/peb_mapping_queue.h          |    68 +
 fs/ssdfs/peb_mapping_table.c          | 13868 ++++++++++++++
 fs/ssdfs/peb_mapping_table.h          |   784 +
 fs/ssdfs/peb_mapping_table_cache.c    |  4897 +++++
 fs/ssdfs/peb_mapping_table_cache.h    |   120 +
 fs/ssdfs/peb_mapping_table_thread.c   |  2959 +++
 fs/ssdfs/peb_migration_scheme.c       |  1445 ++
 fs/ssdfs/peb_read_thread.c            | 14978 +++++++++++++++
 fs/ssdfs/readwrite.c                  |   973 +
 fs/ssdfs/recovery.c                   |  3706 ++++
 fs/ssdfs/recovery.h                   |   451 +
 fs/ssdfs/recovery_fast_search.c       |  1200 ++
 fs/ssdfs/recovery_slow_search.c       |   587 +
 fs/ssdfs/recovery_thread.c            |  1215 ++
 fs/ssdfs/request_queue.c              |  1726 ++
 fs/ssdfs/request_queue.h              |   818 +
 fs/ssdfs/segment.c                    |  8525 +++++++++
 fs/ssdfs/segment.h                    |  1367 ++
 fs/ssdfs/segment_bitmap.c             |  5157 +++++
 fs/ssdfs/segment_bitmap.h             |   482 +
 fs/ssdfs/segment_bitmap_tables.c      |   887 +
 fs/ssdfs/segment_block_bitmap.c       |  1929 ++
 fs/ssdfs/segment_block_bitmap.h       |   240 +
 fs/ssdfs/segment_tree.c               |   996 +
 fs/ssdfs/segment_tree.h               |   107 +
 fs/ssdfs/sequence_array.c             |  1160 ++
 fs/ssdfs/sequence_array.h             |   140 +
 fs/ssdfs/shared_dictionary.c          | 21342 +++++++++++++++++++++
 fs/ssdfs/shared_dictionary.h          |   204 +
 fs/ssdfs/shared_dictionary_thread.c   |   457 +
 fs/ssdfs/shared_extents_tree.c        |  6866 +++++++
 fs/ssdfs/shared_extents_tree.h        |   146 +
 fs/ssdfs/shared_extents_tree_thread.c |   808 +
 fs/ssdfs/snapshot.c                   |    99 +
 fs/ssdfs/snapshot.h                   |   283 +
 fs/ssdfs/snapshot_requests_queue.c    |  1249 ++
 fs/ssdfs/snapshot_requests_queue.h    |    65 +
 fs/ssdfs/snapshot_rules.c             |   739 +
 fs/ssdfs/snapshot_rules.h             |    55 +
 fs/ssdfs/snapshots_tree.c             |  8917 +++++++++
 fs/ssdfs/snapshots_tree.h             |   248 +
 fs/ssdfs/snapshots_tree_thread.c      |   665 +
 fs/ssdfs/ssdfs.h                      |   502 +
 fs/ssdfs/ssdfs_constants.h            |   214 +
 fs/ssdfs/ssdfs_fs_info.h              |   821 +
 fs/ssdfs/ssdfs_inline.h               |  3037 +++
 fs/ssdfs/ssdfs_inode_info.h           |   144 +
 fs/ssdfs/ssdfs_thread_info.h          |    43 +
 fs/ssdfs/super.c                      |  4873 +++++
 fs/ssdfs/sysfs.c                      |  6558 +++++++
 fs/ssdfs/sysfs.h                      |   305 +
 fs/ssdfs/testing.c                    |  5949 ++++++
 fs/ssdfs/testing.h                    |   226 +
 fs/ssdfs/tunefs.c                     |   487 +
 fs/ssdfs/version.h                    |     9 +
 fs/ssdfs/volume_header.c              |  1431 ++
 fs/ssdfs/xattr.c                      |  1700 ++
 fs/ssdfs/xattr.h                      |    88 +
 fs/ssdfs/xattr_security.c             |   159 +
 fs/ssdfs/xattr_tree.c                 | 10132 ++++++++++
 fs/ssdfs/xattr_tree.h                 |   143 +
 fs/ssdfs/xattr_trusted.c              |    93 +
 fs/ssdfs/xattr_user.c                 |    93 +
 include/linux/ssdfs_fs.h              |  3565 ++++
 include/trace/events/ssdfs.h          |   256 +
 include/uapi/linux/magic.h            |     1 +
 include/uapi/linux/ssdfs_fs.h         |   126 +
 149 files changed, 335803 insertions(+)
 create mode 100644 fs/ssdfs/.kunitconfig
 create mode 100644 fs/ssdfs/Kconfig
 create mode 100644 fs/ssdfs/Makefile
 create mode 100644 fs/ssdfs/acl.c
 create mode 100644 fs/ssdfs/acl.h
 create mode 100644 fs/ssdfs/block_bitmap.c
 create mode 100644 fs/ssdfs/block_bitmap.h
 create mode 100644 fs/ssdfs/block_bitmap_tables.c
 create mode 100644 fs/ssdfs/block_bitmap_test.c
 create mode 100644 fs/ssdfs/btree.c
 create mode 100644 fs/ssdfs/btree.h
 create mode 100644 fs/ssdfs/btree_hierarchy.c
 create mode 100644 fs/ssdfs/btree_hierarchy.h
 create mode 100644 fs/ssdfs/btree_node.c
 create mode 100644 fs/ssdfs/btree_node.h
 create mode 100644 fs/ssdfs/btree_search.c
 create mode 100644 fs/ssdfs/btree_search.h
 create mode 100644 fs/ssdfs/common_bitmap.h
 create mode 100644 fs/ssdfs/compr_lzo.c
 create mode 100644 fs/ssdfs/compr_lzo_test.c
 create mode 100644 fs/ssdfs/compr_zlib.c
 create mode 100644 fs/ssdfs/compr_zlib_test.c
 create mode 100644 fs/ssdfs/compression.c
 create mode 100644 fs/ssdfs/compression.h
 create mode 100644 fs/ssdfs/compression_test.c
 create mode 100644 fs/ssdfs/current_segment.c
 create mode 100644 fs/ssdfs/current_segment.h
 create mode 100644 fs/ssdfs/dentries_tree.c
 create mode 100644 fs/ssdfs/dentries_tree.h
 create mode 100644 fs/ssdfs/dev_bdev.c
 create mode 100644 fs/ssdfs/dev_mtd.c
 create mode 100644 fs/ssdfs/dev_zns.c
 create mode 100644 fs/ssdfs/diff_on_write.c
 create mode 100644 fs/ssdfs/diff_on_write.h
 create mode 100644 fs/ssdfs/diff_on_write_metadata.c
 create mode 100644 fs/ssdfs/diff_on_write_user_data.c
 create mode 100644 fs/ssdfs/dir.c
 create mode 100644 fs/ssdfs/dynamic_array.c
 create mode 100644 fs/ssdfs/dynamic_array.h
 create mode 100644 fs/ssdfs/dynamic_array_test.c
 create mode 100644 fs/ssdfs/extents_queue.c
 create mode 100644 fs/ssdfs/extents_queue.h
 create mode 100644 fs/ssdfs/extents_tree.c
 create mode 100644 fs/ssdfs/extents_tree.h
 create mode 100644 fs/ssdfs/file.c
 create mode 100644 fs/ssdfs/fingerprint.h
 create mode 100644 fs/ssdfs/fingerprint_array.c
 create mode 100644 fs/ssdfs/fingerprint_array.h
 create mode 100644 fs/ssdfs/folio_array.c
 create mode 100644 fs/ssdfs/folio_array.h
 create mode 100644 fs/ssdfs/folio_array_test.c
 create mode 100644 fs/ssdfs/folio_vector.c
 create mode 100644 fs/ssdfs/folio_vector.h
 create mode 100644 fs/ssdfs/folio_vector_test.c
 create mode 100644 fs/ssdfs/fs_error.c
 create mode 100644 fs/ssdfs/global_fsck.c
 create mode 100644 fs/ssdfs/inode.c
 create mode 100644 fs/ssdfs/inodes_tree.c
 create mode 100644 fs/ssdfs/inodes_tree.h
 create mode 100644 fs/ssdfs/invalidated_extents_tree.c
 create mode 100644 fs/ssdfs/invalidated_extents_tree.h
 create mode 100644 fs/ssdfs/ioctl.c
 create mode 100644 fs/ssdfs/ioctl.h
 create mode 100644 fs/ssdfs/log_footer.c
 create mode 100644 fs/ssdfs/offset_translation_table.c
 create mode 100644 fs/ssdfs/offset_translation_table.h
 create mode 100644 fs/ssdfs/options.c
 create mode 100644 fs/ssdfs/peb.c
 create mode 100644 fs/ssdfs/peb.h
 create mode 100644 fs/ssdfs/peb_block_bitmap.c
 create mode 100644 fs/ssdfs/peb_block_bitmap.h
 create mode 100644 fs/ssdfs/peb_container.c
 create mode 100644 fs/ssdfs/peb_container.h
 create mode 100644 fs/ssdfs/peb_deduplication.c
 create mode 100644 fs/ssdfs/peb_flush_thread.c
 create mode 100644 fs/ssdfs/peb_fsck_thread.c
 create mode 100644 fs/ssdfs/peb_gc_thread.c
 create mode 100644 fs/ssdfs/peb_init.c
 create mode 100644 fs/ssdfs/peb_init.h
 create mode 100644 fs/ssdfs/peb_mapping_queue.c
 create mode 100644 fs/ssdfs/peb_mapping_queue.h
 create mode 100644 fs/ssdfs/peb_mapping_table.c
 create mode 100644 fs/ssdfs/peb_mapping_table.h
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.c
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.h
 create mode 100644 fs/ssdfs/peb_mapping_table_thread.c
 create mode 100644 fs/ssdfs/peb_migration_scheme.c
 create mode 100644 fs/ssdfs/peb_read_thread.c
 create mode 100644 fs/ssdfs/readwrite.c
 create mode 100644 fs/ssdfs/recovery.c
 create mode 100644 fs/ssdfs/recovery.h
 create mode 100644 fs/ssdfs/recovery_fast_search.c
 create mode 100644 fs/ssdfs/recovery_slow_search.c
 create mode 100644 fs/ssdfs/recovery_thread.c
 create mode 100644 fs/ssdfs/request_queue.c
 create mode 100644 fs/ssdfs/request_queue.h
 create mode 100644 fs/ssdfs/segment.c
 create mode 100644 fs/ssdfs/segment.h
 create mode 100644 fs/ssdfs/segment_bitmap.c
 create mode 100644 fs/ssdfs/segment_bitmap.h
 create mode 100644 fs/ssdfs/segment_bitmap_tables.c
 create mode 100644 fs/ssdfs/segment_block_bitmap.c
 create mode 100644 fs/ssdfs/segment_block_bitmap.h
 create mode 100644 fs/ssdfs/segment_tree.c
 create mode 100644 fs/ssdfs/segment_tree.h
 create mode 100644 fs/ssdfs/sequence_array.c
 create mode 100644 fs/ssdfs/sequence_array.h
 create mode 100644 fs/ssdfs/shared_dictionary.c
 create mode 100644 fs/ssdfs/shared_dictionary.h
 create mode 100644 fs/ssdfs/shared_dictionary_thread.c
 create mode 100644 fs/ssdfs/shared_extents_tree.c
 create mode 100644 fs/ssdfs/shared_extents_tree.h
 create mode 100644 fs/ssdfs/shared_extents_tree_thread.c
 create mode 100644 fs/ssdfs/snapshot.c
 create mode 100644 fs/ssdfs/snapshot.h
 create mode 100644 fs/ssdfs/snapshot_requests_queue.c
 create mode 100644 fs/ssdfs/snapshot_requests_queue.h
 create mode 100644 fs/ssdfs/snapshot_rules.c
 create mode 100644 fs/ssdfs/snapshot_rules.h
 create mode 100644 fs/ssdfs/snapshots_tree.c
 create mode 100644 fs/ssdfs/snapshots_tree.h
 create mode 100644 fs/ssdfs/snapshots_tree_thread.c
 create mode 100644 fs/ssdfs/ssdfs.h
 create mode 100644 fs/ssdfs/ssdfs_constants.h
 create mode 100644 fs/ssdfs/ssdfs_fs_info.h
 create mode 100644 fs/ssdfs/ssdfs_inline.h
 create mode 100644 fs/ssdfs/ssdfs_inode_info.h
 create mode 100644 fs/ssdfs/ssdfs_thread_info.h
 create mode 100644 fs/ssdfs/super.c
 create mode 100644 fs/ssdfs/sysfs.c
 create mode 100644 fs/ssdfs/sysfs.h
 create mode 100644 fs/ssdfs/testing.c
 create mode 100644 fs/ssdfs/testing.h
 create mode 100644 fs/ssdfs/tunefs.c
 create mode 100644 fs/ssdfs/version.h
 create mode 100644 fs/ssdfs/volume_header.c
 create mode 100644 fs/ssdfs/xattr.c
 create mode 100644 fs/ssdfs/xattr.h
 create mode 100644 fs/ssdfs/xattr_security.c
 create mode 100644 fs/ssdfs/xattr_tree.c
 create mode 100644 fs/ssdfs/xattr_tree.h
 create mode 100644 fs/ssdfs/xattr_trusted.c
 create mode 100644 fs/ssdfs/xattr_user.c
 create mode 100644 include/linux/ssdfs_fs.h
 create mode 100644 include/trace/events/ssdfs.h
 create mode 100644 include/uapi/linux/ssdfs_fs.h

-- 
2.34.1



* [PATCH v2 01/79] ssdfs: introduce SSDFS on-disk layout
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 02/79] ssdfs: add key file system declarations Viacheslav Dubeyko
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS architecture is based on the segment concept. A segment is a
portion of the file system volume that has to be aligned on the erase
block size. A segment can include one or several erase blocks. It is
the basic unit for allocating and managing the free space of the file
system volume. An erase block is the basic unit for keeping metadata
and user data. Every erase block contains a sequence of logs. A log
starts with a segment header (struct ssdfs_segment_header) or a partial
log header (struct ssdfs_partial_log_header). A full log can be
finished with a log footer (struct ssdfs_log_footer).

The log's header (+ footer) contains all the necessary metadata
describing the log's payload. The log's metadata includes:
(1) block bitmap (struct ssdfs_block_bitmap_fragment) +
    (struct ssdfs_block_bitmap_header): tracks the state of logical
    blocks (free, pre-allocated, valid, invalid) in the segment.
(2) offset translation table (struct ssdfs_blk2off_table_header) +
    (struct ssdfs_phys_offset_table_header) +
    (struct ssdfs_area_block_table): converts a logical block ID into
    a position inside a particular erase block.
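The lookup performed through item (2) can be sketched as follows. This
is a hypothetical flat-table illustration of the translation goal only;
the real structures are the ssdfs_blk2off_table_header family listed above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK_UNKNOWN UINT32_MAX	/* block not present in this log */

/* Hypothetical offset translation entry: maps a logical block ID to a
 * byte offset inside the log's payload. */
struct blk2off_entry {
	uint32_t logical_blk;
	uint32_t byte_offset;
};

/* Resolve a logical block to its position in the log's payload. */
static uint32_t blk_to_offset(const struct blk2off_entry *tbl,
			      size_t count, uint32_t logical_blk)
{
	for (size_t i = 0; i < count; i++) {
		if (tbl[i].logical_blk == logical_blk)
			return tbl[i].byte_offset;
	}
	return BLK_UNKNOWN;
}
```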

Additionally, the log's header is a copy of the superblock that keeps
the knowledge of the location of all SSDFS metadata structures. SSDFS has:
(1) mapping table (struct ssdfs_leb_table_fragment_header) +
    (struct ssdfs_peb_table_fragment_header): implements the mapping of
    logical erase blocks into "physical" ones.
(2) mapping table cache (struct ssdfs_maptbl_cache_header): a copy of the
    mapping table's content for some types of erase blocks. The cache is
    used for converting a logical erase block ID into a "physical" erase
    block ID when the fragment of the mapping table is not initialized yet.
(3) segment bitmap (struct ssdfs_segbmap_fragment_header): tracks the state
    (clean, using, used, pre-dirty, dirty, reserved) of segments for
    searching, allocation, erasing, and garbage collection.
(4) b-tree (struct ssdfs_btree_descriptor) + (struct ssdfs_btree_index_key) +
    (struct ssdfs_btree_node_header): all the remaining metadata structures
    are represented by b-trees.
(5) inodes b-tree (struct ssdfs_inodes_btree) +
    (struct ssdfs_inodes_btree_node_header): keeps raw inodes of existing
    file system objects (struct ssdfs_inode).
(6) dentries b-tree (struct ssdfs_dentries_btree_descriptor) +
    (struct ssdfs_dentries_btree_node_header): keeps directory entries
    (struct ssdfs_dir_entry).
(7) extents b-tree (struct ssdfs_extents_btree_descriptor) +
    (struct ssdfs_extents_btree_node_header): keeps raw extents describing
    the location of a piece of data (struct ssdfs_raw_fork) +
    (struct ssdfs_raw_extent).
(8) xattr b-tree (struct ssdfs_xattr_btree_descriptor) +
    (struct ssdfs_xattrs_btree_node_header): keeps extended attributes of
    a file or folder (struct ssdfs_xattr_entry).
(9) invalidated extents b-tree (struct ssdfs_invalidated_extents_btree) +
    (struct ssdfs_invextree_node_header): keeps information about invalidated
    extents for ZNS SSD + SMR HDD use cases.
(10) shared dictionary b-tree (struct ssdfs_shared_dictionary_btree) +
     (struct ssdfs_shared_dictionary_node_header): keeps long names
     (more than 12 symbols) in the form of tries.
(11) snapshots b-tree (struct ssdfs_snapshots_btree) +
     (struct ssdfs_snapshots_btree_node_header): keeps snapshots info
     (struct ssdfs_snapshot) and association of erase block IDs with
     timestamps (struct ssdfs_peb2time_set) + (struct ssdfs_peb2time_pair).

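For the long names in item (10), the shared dictionary is keyed by a
64-bit name hash. A minimal user-space sketch of the packing performed
by the SSDFS_NAME_HASH()/SSDFS_HASH32_LO()/SSDFS_HASH32_HI() macros
introduced by this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 32-bit name hashes into one 64-bit value, mirroring the
 * SSDFS_NAME_HASH() macro: hash32_lo occupies the upper 32 bits and
 * hash32_hi occupies the lower 32 bits. For an inline name (<= 12
 * symbols), hash32_hi is always zero. */
static uint64_t toy_name_hash(uint32_t hash32_lo, uint32_t hash32_hi)
{
	uint64_t hash64 = hash32_lo;

	hash64 <<= 32;
	hash64 |= hash32_hi;
	return hash64;
}

/* Extract the components back, mirroring SSDFS_HASH32_LO()/_HI(). */
static uint32_t toy_hash32_lo(uint64_t hash64)
{
	return (uint32_t)(hash64 >> 32);
}

static uint32_t toy_hash32_hi(uint64_t hash64)
{
	return (uint32_t)(hash64 & 0xFFFFFFFF);
}
```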
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 include/linux/ssdfs_fs.h | 3565 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 3565 insertions(+)
 create mode 100644 include/linux/ssdfs_fs.h

diff --git a/include/linux/ssdfs_fs.h b/include/linux/ssdfs_fs.h
new file mode 100644
index 000000000000..1ed83126ca70
--- /dev/null
+++ b/include/linux/ssdfs_fs.h
@@ -0,0 +1,3565 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/linux/ssdfs_fs.h - SSDFS on-disk structures and common declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _LINUX_SSDFS_H
+#define _LINUX_SSDFS_H
+
+#include <uapi/linux/ssdfs_fs.h>
+
+typedef u8 __le8;
+
+struct ssdfs_inode;
+
+/*
+ * struct ssdfs_revision - metadata structure version
+ * @major: major version number
+ * @minor: minor version number
+ */
+struct ssdfs_revision {
+/* 0x0000 */
+	__le8 major;
+	__le8 minor;
+
+/* 0x0002 */
+}  __packed;
+
+/*
+ * struct ssdfs_signature - metadata structure magic signature
+ * @common: common magic value
+ * @key: detailed magic value
+ */
+struct ssdfs_signature {
+/* 0x0000 */
+	__le32 common;
+	__le16 key;
+	struct ssdfs_revision version;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_metadata_check - metadata structure checksum
+ * @bytes: bytes count of CRC calculation for the structure
+ * @flags: flags
+ * @csum: checksum
+ */
+struct ssdfs_metadata_check {
+/* 0x0000 */
+	__le16 bytes;
+#define SSDFS_CRC32			(1 << 0)
+#define SSDFS_ZLIB_COMPRESSED		(1 << 1)
+#define SSDFS_LZO_COMPRESSED		(1 << 2)
+	__le16 flags;
+	__le32 csum;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_padding_header - padding block header
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @blob: padding blob
+ */
+struct ssdfs_padding_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 blob;
+
+/* 0x0018 */
+} __packed;
+
+/*
+ * struct ssdfs_raw_extent - raw (on-disk) extent
+ * @seg_id: segment number
+ * @logical_blk: logical block number
+ * @len: count of blocks in extent
+ */
+struct ssdfs_raw_extent {
+/* 0x0000 */
+	__le64 seg_id;
+	__le32 logical_blk;
+	__le32 len;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_meta_area_extent - metadata area extent
+ * @start_id: starting identification number
+ * @len: count of items in metadata area
+ * @type: item's type
+ * @flags: flags
+ */
+struct ssdfs_meta_area_extent {
+/* 0x0000 */
+	__le64 start_id;
+	__le32 len;
+	__le16 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Type of item in metadata area */
+enum {
+	SSDFS_EMPTY_EXTENT_TYPE,
+	SSDFS_SEG_EXTENT_TYPE,
+	SSDFS_PEB_EXTENT_TYPE,
+	SSDFS_BLK_EXTENT_TYPE,
+};
+
+/* Type of segbmap's segments */
+enum {
+	SSDFS_MAIN_SEGBMAP_SEG,
+	SSDFS_COPY_SEGBMAP_SEG,
+	SSDFS_SEGBMAP_SEG_COPY_MAX,
+};
+
+#define SSDFS_SEGBMAP_SEGS	8
+
+/*
+ * struct ssdfs_segbmap_sb_header - superblock's segment bitmap header
+ * @fragments_count: fragments count in segment bitmap
+ * @fragments_per_seg: segbmap's fragments per segment
+ * @fragments_per_peb: segbmap's fragments per PEB
+ * @fragment_size: size of fragment in bytes
+ * @bytes_count: size of segment bitmap in bytes (payload part)
+ * @flags: segment bitmap's flags
+ * @segs_count: count of actually reserved segments in one chain
+ * @segs: array of segbmap's segment numbers
+ */
+struct ssdfs_segbmap_sb_header {
+/* 0x0000 */
+	__le16 fragments_count;
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 fragment_size;
+
+/* 0x0008 */
+	__le32 bytes_count;
+	__le16 flags;
+	__le16 segs_count;
+
+/* 0x0010 */
+	__le64 segs[SSDFS_SEGBMAP_SEGS][SSDFS_SEGBMAP_SEG_COPY_MAX];
+
+/* 0x0090 */
+} __packed;
+
+/* Segment bitmap's flags */
+#define SSDFS_SEGBMAP_HAS_COPY		(1 << 0)
+#define SSDFS_SEGBMAP_ERROR		(1 << 1)
+#define SSDFS_SEGBMAP_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_SEGBMAP_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_SEGBMAP_FLAGS_MASK	(0xF)
+
+enum {
+	SSDFS_MAIN_MAPTBL_SEG,
+	SSDFS_COPY_MAPTBL_SEG,
+	SSDFS_MAPTBL_SEG_COPY_MAX,
+};
+
+#define SSDFS_MAPTBL_RESERVED_EXTENTS	(3)
+
+/*
+ * struct ssdfs_maptbl_sb_header - superblock's mapping table header
+ * @fragments_count: count of fragments in mapping table
+ * @fragment_bytes: bytes in one mapping table's fragment
+ * @last_peb_recover_cno: checkpoint of last trying to recover PEBs
+ * @lebs_count: count of Logical Erase Blocks (LEBs) described by the table
+ * @pebs_count: count of Physical Erase Blocks (PEBs) described by the table
+ * @fragments_per_seg: count of mapping table's fragments in segment
+ * @fragments_per_peb: count of mapping table's fragments in PEB
+ * @flags: mapping table's flags
+ * @pre_erase_pebs: count of PEBs in pre-erase state
+ * @lebs_per_fragment: count of LEBs described by a fragment
+ * @pebs_per_fragment: count of PEBs described by a fragment
+ * @pebs_per_stripe: count of PEBs described by a stripe
+ * @stripes_per_fragment: count of stripes in fragment
+ * @extents: metadata extents that describe mapping table location
+ */
+struct ssdfs_maptbl_sb_header {
+/* 0x0000 */
+	__le32 fragments_count;
+	__le32 fragment_bytes;
+	__le64 last_peb_recover_cno;
+
+/* 0x0010 */
+	__le64 lebs_count;
+	__le64 pebs_count;
+
+/* 0x0020 */
+	__le16 fragments_per_seg;
+	__le16 fragments_per_peb;
+	__le16 flags;
+	__le16 pre_erase_pebs;
+
+/* 0x0028 */
+	__le16 lebs_per_fragment;
+	__le16 pebs_per_fragment;
+	__le16 pebs_per_stripe;
+	__le16 stripes_per_fragment;
+
+/* 0x0030 */
+#define MAPTBL_LIMIT1	(SSDFS_MAPTBL_RESERVED_EXTENTS)
+#define MAPTBL_LIMIT2	(SSDFS_MAPTBL_SEG_COPY_MAX)
+	struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2];
+
+/* 0x0090 */
+} __packed;
+
+/* Mapping table's flags */
+#define SSDFS_MAPTBL_HAS_COPY		(1 << 0)
+#define SSDFS_MAPTBL_ERROR		(1 << 1)
+#define SSDFS_MAPTBL_MAKE_ZLIB_COMPR	(1 << 2)
+#define SSDFS_MAPTBL_MAKE_LZO_COMPR	(1 << 3)
+#define SSDFS_MAPTBL_UNDER_FLUSH	(1 << 4)
+#define SSDFS_MAPTBL_START_MIGRATION	(1 << 5)
+#define SSDFS_MAPTBL_FLAGS_MASK		(0x3F)
+
+/*
+ * struct ssdfs_btree_descriptor - generic btree descriptor
+ * @magic: magic signature
+ * @flags: btree flags
+ * @type: btree type
+ * @log_node_size: log2(node size in bytes)
+ * @pages_per_node: physical pages per btree node
+ * @node_ptr_size: size in bytes of pointer on btree node
+ * @index_size: size in bytes of btree's index
+ * @item_size: size in bytes of btree's item
+ * @index_area_min_size: minimal size in bytes of index area in btree node
+ *
+ * The goal of a btree descriptor is to keep
+ * the main features of a tree.
+ */
+struct ssdfs_btree_descriptor {
+/* 0x0000 */
+	__le32 magic;
+#define SSDFS_BTREE_DESC_INDEX_AREA_RESIZABLE		(1 << 0)
+#define SSDFS_BTREE_DESC_FLAGS_MASK			0x1
+	__le16 flags;
+	__le8 type;
+	__le8 log_node_size;
+
+/* 0x0008 */
+	__le8 pages_per_node;
+	__le8 node_ptr_size;
+	__le16 index_size;
+	__le16 item_size;
+	__le16 index_area_min_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Btree types */
+enum {
+	SSDFS_BTREE_UNKNOWN_TYPE,
+	SSDFS_INODES_BTREE,
+	SSDFS_DENTRIES_BTREE,
+	SSDFS_EXTENTS_BTREE,
+	SSDFS_SHARED_EXTENTS_BTREE,
+	SSDFS_XATTR_BTREE,
+	SSDFS_SHARED_XATTR_BTREE,
+	SSDFS_SHARED_DICTIONARY_BTREE,
+	SSDFS_SNAPSHOTS_BTREE,
+	SSDFS_INVALIDATED_EXTENTS_BTREE,
+	SSDFS_BTREE_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_dentries_btree_descriptor - dentries btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_dentries_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_extents_btree_descriptor - extents btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_extents_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_xattr_btree_descriptor - extended attr btree descriptor
+ * @desc: btree descriptor
+ */
+struct ssdfs_xattr_btree_descriptor {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/* Type of superblock segments */
+enum {
+	SSDFS_MAIN_SB_SEG,
+	SSDFS_COPY_SB_SEG,
+	SSDFS_SB_SEG_COPY_MAX,
+};
+
+/* Different phases of superblock segment */
+enum {
+	SSDFS_CUR_SB_SEG,
+	SSDFS_NEXT_SB_SEG,
+	SSDFS_RESERVED_SB_SEG,
+	SSDFS_PREV_SB_SEG,
+	SSDFS_SB_CHAIN_MAX,
+};
+
+/*
+ * struct ssdfs_leb2peb_pair - LEB/PEB numbers association
+ * @leb_id: LEB ID number
+ * @peb_id: PEB ID number
+ */
+struct ssdfs_leb2peb_pair {
+/* 0x0000 */
+	__le64 leb_id;
+	__le64 peb_id;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_btree_index - btree index
+ * @hash: hash value
+ * @extent: btree node's extent
+ *
+ * The goal of a btree index is to provide a way to search for
+ * a proper btree node by means of a hash value. The hash
+ * value could be an inode_id, a string hash, and so on.
+ */
+struct ssdfs_btree_index {
+/* 0x0000 */
+	__le64 hash;
+
+/* 0x0008 */
+	struct ssdfs_raw_extent extent;
+
+/* 0x0018 */
+} __packed;
+
+#define SSDFS_BTREE_NODE_INVALID_ID	(U32_MAX)
+
+/*
+ * struct ssdfs_btree_index_key - node identification key
+ * @node_id: node identification key
+ * @node_type: type of the node
+ * @height: node's height
+ * @flags: index flags
+ * @index: node's index
+ */
+struct ssdfs_btree_index_key {
+/* 0x0000 */
+	__le32 node_id;
+	__le8 node_type;
+	__le8 height;
+#define SSDFS_BTREE_INDEX_HAS_VALID_EXTENT		(1 << 0)
+#define SSDFS_BTREE_INDEX_SHOW_EMPTY_NODE		(1 << 1)
+#define SSDFS_BTREE_INDEX_SHOW_FREE_ITEMS		(1 << 2)
+#define SSDFS_BTREE_INDEX_HAS_CHILD_WITH_FREE_ITEMS	(1 << 3)
+#define SSDFS_BTREE_INDEX_SHOW_PREALLOCATED_CHILD	(1 << 4)
+#define SSDFS_BTREE_INDEX_FLAGS_MASK			0x1F
+	__le16 flags;
+
+/* 0x0008 */
+	struct ssdfs_btree_index index;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_BTREE_ROOT_NODE_INDEX_COUNT	(2)
+
+/*
+ * struct ssdfs_btree_root_node_header - root node header
+ * @height: btree height
+ * @items_count: count of items in the root node
+ * @flags: root node flags
+ * @type: root node type
+ * @upper_node_id: identification number of the last allocated node
+ * @node_ids: root node's children IDs
+ */
+struct ssdfs_btree_root_node_header {
+/* 0x0000 */
+#define SSDFS_BTREE_LEAF_NODE_HEIGHT	(0)
+	__le8 height;
+	__le8 items_count;
+	__le8 flags;
+	__le8 type;
+
+/* 0x0004 */
+#define SSDFS_BTREE_ROOT_NODE_ID		(0)
+	__le32 upper_node_id;
+
+/* 0x0008 */
+	__le32 node_ids[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_btree_inline_root_node - btree root node
+ * @header: node header
+ * @indexes: root node's index array
+ *
+ * The goal of the root node is to live inside a 0x40 bytes
+ * space and to keep the root index node of the tree.
+ * The inline root node can be part of an inode
+ * structure or part of a btree root. The inode has
+ * 0x80 bytes of space. But the inode needs to store both an
+ * extents/dentries tree and an extended attributes tree.
+ * So, the 0x80 bytes are used for storing two btrees.
+ *
+ * The root node's indexes have a pre-defined type.
+ * If the height of the tree is in the 1 - 3 range, then the
+ * root node's indexes define hybrid nodes. Otherwise,
+ * if the tree's height is greater than 3, then the root node's
+ * indexes define pure index nodes.
+ */
+struct ssdfs_btree_inline_root_node {
+/* 0x0000 */
+	struct ssdfs_btree_root_node_header header;
+
+/* 0x0010 */
+#define SSDFS_ROOT_NODE_LEFT_LEAF_NODE		(0)
+#define SSDFS_ROOT_NODE_RIGHT_LEAF_NODE		(1)
+#define SSDFS_BTREE_ROOT_NODE_INDEX_COUNT	(2)
+	struct ssdfs_btree_index indexes[SSDFS_BTREE_ROOT_NODE_INDEX_COUNT];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inodes_btree - inodes btree
+ * @desc: btree descriptor
+ * @allocated_inodes: count of allocated inodes
+ * @free_inodes: count of free inodes
+ * @inodes_capacity: count of inodes in the whole btree
+ * @leaf_nodes: count of leaf btree nodes
+ * @nodes_count: count of nodes in the whole btree
+ * @upper_allocated_ino: maximal allocated inode ID number
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_inodes_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le64 allocated_inodes;
+	__le64 free_inodes;
+
+/* 0x0020 */
+	__le64 inodes_capacity;
+	__le32 leaf_nodes;
+	__le32 nodes_count;
+
+/* 0x0030 */
+	__le64 upper_allocated_ino;
+	__le8 reserved[0x8];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_extents_btree - shared extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * ssdfs_shared_dictionary_btree - shared strings dictionary btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_dictionary_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_xattr_btree - shared extended attributes btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_shared_xattr_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshots_btree - snapshots btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_snapshots_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_invalidated_extents_btree - invalidated extents btree
+ * @desc: btree descriptor
+ * @root_node: btree's root node
+ *
+ * The goal of a btree root is to keep
+ * the main features of a tree and knowledge
+ * about two root indexes. These indexes split
+ * the whole btree into two branches.
+ */
+struct ssdfs_invalidated_extents_btree {
+/* 0x0000 */
+	struct ssdfs_btree_descriptor desc;
+
+/* 0x0010 */
+	__le8 reserved[0x30];
+
+/* 0x0040 */
+	struct ssdfs_btree_inline_root_node root_node;
+
+/* 0x0080 */
+} __packed;
+
+enum {
+	SSDFS_CUR_DATA_SEG,
+	SSDFS_CUR_LNODE_SEG,
+	SSDFS_CUR_HNODE_SEG,
+	SSDFS_CUR_IDXNODE_SEG,
+	SSDFS_CUR_DATA_UPDATE_SEG,
+	SSDFS_CUR_SEGS_COUNT,
+};
+
+/*
+ * struct ssdfs_blk_bmap_options - block bitmap options
+ * @flags: block bitmap's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk_bmap_options {
+/* 0x0000 */
+#define SSDFS_BLK_BMAP_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK_BMAP_MAKE_COMPRESSION		(1 << 1)
+#define SSDFS_BLK_BMAP_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK_BMAP_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK_BMAP_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_BLK_BMAP_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_blk2off_tbl_options - offset translation table options
+ * @flags: offset translation table's flags
+ * @compression: compression type
+ */
+struct ssdfs_blk2off_tbl_options {
+/* 0x0000 */
+#define SSDFS_BLK2OFF_TBL_CREATE_COPY		(1 << 0)
+#define SSDFS_BLK2OFF_TBL_MAKE_COMPRESSION	(1 << 1)
+#define SSDFS_BLK2OFF_TBL_OPTIONS_MASK		(0x3)
+	__le16 flags;
+#define SSDFS_BLK2OFF_TBL_NOCOMPR_TYPE		(0)
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR_TYPE	(1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR_TYPE	(2)
+	__le8 compression;
+	__le8 reserved;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_user_data_options - user data options
+ * @flags: user data's flags
+ * @compression: compression type
+ * @migration_threshold: default number of destination PEBs in migration
+ */
+struct ssdfs_user_data_options {
+/* 0x0000 */
+#define SSDFS_USER_DATA_MAKE_COMPRESSION	(1 << 0)
+#define SSDFS_USER_DATA_OPTIONS_MASK		(0x1)
+	__le16 flags;
+#define SSDFS_USER_DATA_NOCOMPR_TYPE		(0)
+#define SSDFS_USER_DATA_ZLIB_COMPR_TYPE		(1)
+#define SSDFS_USER_DATA_LZO_COMPR_TYPE		(2)
+	__le8 compression;
+	__le8 reserved1;
+	__le16 migration_threshold;
+	__le16 reserved2;
+
+/* 0x0008 */
+} __packed;
+
+#define SSDFS_INODE_HASNT_INLINE_FORKS		(0)
+#define SSDFS_INLINE_FORKS_COUNT		(2)
+#define SSDFS_INLINE_EXTENTS_COUNT		(3)
+
+/*
+ * struct ssdfs_raw_fork - contiguous sequence of raw (on-disk) extents
+ * @start_offset: start logical offset in pages (blocks) from file's beginning
+ * @blks_count: count of logical blocks in the fork (no holes)
+ * @extents: sequence of raw (on-disk) extents
+ */
+struct ssdfs_raw_fork {
+/* 0x0000 */
+	__le64 start_offset;
+	__le64 blks_count;
+
+/* 0x0010 */
+	struct ssdfs_raw_extent extents[SSDFS_INLINE_EXTENTS_COUNT];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_name_hash - hash of the name
+ * @raw: raw value of the hash64
+ *
+ * The name's hash is 64 bits wide (8 bytes). But the hash64 has a
+ * special structure. The first 4 bytes are the low hash (hash32_lo)
+ * of the name. The second 4 bytes are the high hash (hash32_hi)
+ * of the name. If the name is less than or equal to 12 symbols
+ * (inline name string) then hash32_hi will always be equal to zero.
+ * If the name is greater than 12 symbols then hash32_hi
+ * will be the hash of the rest of the name (excluding the
+ * first 12 symbols). The hash32_lo will be defined by the inline
+ * name's length. Inline names (up to 12 symbols long) are
+ * stored in dentries only. Regular names are stored
+ * partially in the dentry (12 symbols) and the whole name string
+ * is stored in the shared dictionary.
+ */
+struct ssdfs_name_hash {
+/* 0x0000 */
+	__le64 raw;
+
+/* 0x0008 */
+} __packed;
+
+/* Name hash related macros */
+#define SSDFS_NAME_HASH(hash32_lo, hash32_hi)({ \
+	u64 hash64 = (u32)hash32_lo; \
+	hash64 <<= 32; \
+	hash64 |= hash32_hi; \
+	hash64; \
+})
+#define SSDFS_NAME_HASH_LE64(hash32_lo, hash32_hi) \
+	(cpu_to_le64(SSDFS_NAME_HASH(hash32_lo, hash32_hi)))
+#define LE64_TO_SSDFS_HASH32_LO(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) >> 32))
+#define SSDFS_HASH32_LO(hash64) \
+	((u32)(hash64 >> 32))
+#define LE64_TO_SSDFS_HASH32_HI(hash_le64) \
+	((u32)(le64_to_cpu(hash_le64) & 0xFFFFFFFF))
+#define SSDFS_HASH32_HI(hash64) \
+	((u32)(hash64 & 0xFFFFFFFF))
+
+/*
+ * struct ssdfs_dir_entry - directory entry
+ * @ino: inode number
+ * @hash_code: name string's hash code
+ * @name_len: name length in bytes
+ * @dentry_type: dentry type
+ * @file_type: directory file types
+ * @flags: dentry's flags
+ * @inline_string: inline copy of the name or exclusive storage of short name
+ */
+struct ssdfs_dir_entry {
+/* 0x0000 */
+	__le64 ino;
+	__le64 hash_code;
+
+/* 0x0010 */
+	__le8 name_len;
+	__le8 dentry_type;
+	__le8 file_type;
+	__le8 flags;
+#define SSDFS_DENTRY_INLINE_NAME_MAX_LEN	(12)
+	__le8 inline_string[SSDFS_DENTRY_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
+/* Dentry types */
+enum {
+	SSDFS_DENTRY_UNKNOWN_TYPE,
+	SSDFS_INLINE_DENTRY,
+	SSDFS_REGULAR_DENTRY,
+	SSDFS_DENTRY_TYPE_MAX
+};
+
+/*
+ * SSDFS directory file types.
+ */
+enum {
+	SSDFS_FT_UNKNOWN,
+	SSDFS_FT_REG_FILE,
+	SSDFS_FT_DIR,
+	SSDFS_FT_CHRDEV,
+	SSDFS_FT_BLKDEV,
+	SSDFS_FT_FIFO,
+	SSDFS_FT_SOCK,
+	SSDFS_FT_SYMLINK,
+	SSDFS_FT_MAX
+};
+
+/* Dentry flags */
+#define SSDFS_DENTRY_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_DENTRY_FLAGS_MASK			0x1
+
+/*
+ * struct ssdfs_blob_extent - blob's extent descriptor
+ * @hash: blob's hash
+ * @extent: blob's extent
+ */
+struct ssdfs_blob_extent {
+/* 0x0000 */
+	__le64 hash;
+	__le64 reserved;
+	struct ssdfs_raw_extent extent;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_XATTR_INLINE_BLOB_MAX_LEN		(32)
+#define SSDFS_XATTR_EXTERNAL_BLOB_MAX_LEN	(32768)
+
+/*
+ * struct ssdfs_blob_bytes - inline blob's byte stream
+ * @bytes: byte stream
+ */
+struct ssdfs_blob_bytes {
+/* 0x0000 */
+	__le8 bytes[SSDFS_XATTR_INLINE_BLOB_MAX_LEN];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_xattr_entry - extended attribute entry
+ * @name_hash: hash of the name
+ * @inline_index: index of the inline xattr
+ * @name_len: length of the name
+ * @name_type: type of the name
+ * @name_flags: flags of the name
+ * @blob_len: blob length in bytes
+ * @blob_type: type of the blob
+ * @blob_flags: flags of the blob
+ * @inline_string: inline string of the name
+ * @blob.descriptor.hash: hash of the blob
+ * @blob.descriptor.extent: extent of the blob
+ * @blob.inline_value: inline value of the blob
+ *
+ * The extended attribute can be described by a fixed size
+ * descriptor. The name of an extended attribute can be inline
+ * or stored in the shared dictionary. If the name
+ * is greater than 16 symbols then it will be stored in the shared
+ * dictionary. The blob part can be stored inline or,
+ * otherwise, the descriptor contains the hash of the blob
+ * and the blob will be stored as an ordinary file inside
+ * logical blocks.
+ */
+struct ssdfs_xattr_entry {
+/* 0x0000 */
+	__le64 name_hash;
+
+/* 0x0008 */
+	__le8 inline_index;
+	__le8 name_len;
+	__le8 name_type;
+	__le8 name_flags;
+
+/* 0x000C */
+	__le16 blob_len;
+	__le8 blob_type;
+	__le8 blob_flags;
+
+/* 0x0010 */
+#define SSDFS_XATTR_INLINE_NAME_MAX_LEN	(16)
+	__le8 inline_string[SSDFS_XATTR_INLINE_NAME_MAX_LEN];
+
+/* 0x0020 */
+	union {
+		struct ssdfs_blob_extent descriptor;
+		struct ssdfs_blob_bytes inline_value;
+	} blob;
+
+/* 0x0040 */
+} __packed;
+
+/* registered names' prefixes */
+enum {
+	SSDFS_USER_NS_INDEX,
+	SSDFS_TRUSTED_NS_INDEX,
+	SSDFS_SYSTEM_NS_INDEX,
+	SSDFS_SECURITY_NS_INDEX,
+	SSDFS_REGISTERED_NS_NUMBER
+};
+
+static const char * const SSDFS_NS_PREFIX[] = {
+	"user.",
+	"trusted.",
+	"system.",
+	"security.",
+};
+
+/* xattr name types */
+enum {
+	SSDFS_XATTR_NAME_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_NAME,
+	SSDFS_XATTR_USER_INLINE_NAME,
+	SSDFS_XATTR_TRUSTED_INLINE_NAME,
+	SSDFS_XATTR_SYSTEM_INLINE_NAME,
+	SSDFS_XATTR_SECURITY_INLINE_NAME,
+	SSDFS_XATTR_REGULAR_NAME,
+	SSDFS_XATTR_USER_REGULAR_NAME,
+	SSDFS_XATTR_TRUSTED_REGULAR_NAME,
+	SSDFS_XATTR_SYSTEM_REGULAR_NAME,
+	SSDFS_XATTR_SECURITY_REGULAR_NAME,
+	SSDFS_XATTR_NAME_TYPE_MAX
+};
+
+/* xattr name flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_STRING		(1 << 0)
+#define SSDFS_XATTR_NAME_FLAGS_MASK		0x1
+
+/* xattr blob types */
+enum {
+	SSDFS_XATTR_BLOB_UNKNOWN_TYPE,
+	SSDFS_XATTR_INLINE_BLOB,
+	SSDFS_XATTR_REGULAR_BLOB,
+	SSDFS_XATTR_BLOB_TYPE_MAX
+};
+
+/* xattr blob flags */
+#define SSDFS_XATTR_HAS_EXTERNAL_BLOB		(1 << 0)
+#define SSDFS_XATTR_BLOB_FLAGS_MASK		0x1
+
+#define SSDFS_INLINE_DENTRIES_PER_AREA		(2)
+#define SSDFS_INLINE_STREAM_SIZE_PER_AREA	(64)
+#define SSDFS_DEFAULT_INLINE_XATTR_COUNT	(1)
+
+/*
+ * struct ssdfs_inode_inline_stream - inode's inline stream
+ * @bytes: bytes array
+ */
+struct ssdfs_inode_inline_stream {
+/* 0x0000 */
+	__le8 bytes[SSDFS_INLINE_STREAM_SIZE_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_inline_dentries - inline dentries array
+ * @array: dentries array
+ */
+struct ssdfs_inode_inline_dentries {
+/* 0x0000 */
+	struct ssdfs_dir_entry array[SSDFS_INLINE_DENTRIES_PER_AREA];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_inode_private_area - inode's private area
+ * @area1.inline_stream: inline file's content
+ * @area1.extents_root: extents btree root node
+ * @area1.fork: inline fork
+ * @area1.dentries_root: dentries btree root node
+ * @area1.dentries: inline dentries
+ * @area2.inline_stream: inline file's content
+ * @area2.inline_xattr: inline extended attribute
+ * @area2.xattr_root: extended attributes btree root node
+ * @area2.fork: inline fork
+ * @area2.dentries: inline dentries
+ */
+struct ssdfs_inode_private_area {
+/* 0x0000 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_btree_inline_root_node extents_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_btree_inline_root_node dentries_root;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area1;
+
+/* 0x0040 */
+	union {
+		struct ssdfs_inode_inline_stream inline_stream;
+		struct ssdfs_xattr_entry inline_xattr;
+		struct ssdfs_btree_inline_root_node xattr_root;
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_inode_inline_dentries dentries;
+	} area2;
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * struct ssdfs_inode - raw (on-disk) inode
+ * @magic: inode magic
+ * @mode: file mode
+ * @flags: file attributes
+ * @uid: owner user ID
+ * @gid: owner group ID
+ * @atime: access time (seconds)
+ * @ctime: change time (seconds)
+ * @mtime: modification time (seconds)
+ * @birthtime: inode creation time (seconds)
+ * @atime_nsec: access time in nano scale
+ * @ctime_nsec: change time in nano scale
+ * @mtime_nsec: modification time in nano scale
+ * @birthtime_nsec: creation time in nano scale
+ * @generation: file version (for NFS)
+ * @size: file size in bytes
+ * @blocks: file size in blocks
+ * @parent_ino: parent inode number
+ * @refcount: links count
+ * @checksum: inode checksum
+ * @ino: inode number
+ * @hash_code: hash code of file name
+ * @name_len: length of file name
+ * @forks_count: count of forks
+ * @internal: array of inline private areas of inode
+ */
+struct ssdfs_inode {
+/* 0x0000 */
+	__le16 magic;			/* Inode magic */
+	__le16 mode;			/* File mode */
+	__le32 flags;			/* file attributes */
+
+/* 0x0008 */
+	__le32 uid;			/* user ID */
+	__le32 gid;			/* group ID */
+
+/* 0x0010 */
+	__le64 atime;			/* access time */
+	__le64 ctime;			/* change time */
+	__le64 mtime;			/* modification time */
+	__le64 birthtime;		/* inode creation time */
+
+/* 0x0030 */
+	__le32 atime_nsec;		/* access time in nano scale */
+	__le32 ctime_nsec;		/* change time in nano scale */
+	__le32 mtime_nsec;		/* modification time in nano scale */
+	__le32 birthtime_nsec;		/* creation time in nano scale */
+
+/* 0x0040 */
+	__le64 generation;		/* file version (for NFS) */
+	__le64 size;			/* file size in bytes */
+	__le64 blocks;			/* file size in blocks */
+	__le64 parent_ino;		/* parent inode number */
+
+/* 0x0060 */
+	__le32 refcount;		/* links count */
+	__le32 checksum;		/* inode checksum */
+
+/* 0x0068 */
+	__le64 ino;			/* Inode number */
+	__le64 hash_code;		/* hash code of file name */
+	__le16 name_len;		/* length of file name */
+#define SSDFS_INODE_HAS_INLINE_EXTENTS		(1 << 0)
+#define SSDFS_INODE_HAS_EXTENTS_BTREE		(1 << 1)
+#define SSDFS_INODE_HAS_INLINE_DENTRIES		(1 << 2)
+#define SSDFS_INODE_HAS_DENTRIES_BTREE		(1 << 3)
+#define SSDFS_INODE_HAS_INLINE_XATTR		(1 << 4)
+#define SSDFS_INODE_HAS_XATTR_BTREE		(1 << 5)
+#define SSDFS_INODE_HAS_INLINE_FILE		(1 << 6)
+#define SSDFS_INODE_PRIVATE_FLAGS_MASK		0x7F
+	__le16 private_flags;
+
+	union {
+		__le32 forks;
+		__le32 dentries;
+	} count_of __packed;
+
+/* 0x0080 */
+	struct ssdfs_inode_private_area internal[1];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_IFREG_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_EXTENTS | \
+	 SSDFS_INODE_HAS_EXTENTS_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_FILE)
+
+#define SSDFS_IFDIR_PRIVATE_FLAG_MASK \
+	(SSDFS_INODE_HAS_INLINE_DENTRIES | \
+	 SSDFS_INODE_HAS_DENTRIES_BTREE | \
+	 SSDFS_INODE_HAS_INLINE_XATTR | \
+	 SSDFS_INODE_HAS_XATTR_BTREE)
+
+/*
+ * struct ssdfs_volume_header - static part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @megabytes_per_peb: MBs in one PEB
+ * @pebs_per_seg: number of PEBs per segment
+ * @create_time: volume create timestamp (mkfs phase)
+ * @create_cno: volume create checkpoint
+ * @flags: volume creation flags
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @sb_pebs: array of prev, cur and next superblock's PEB numbers
+ * @segbmap: superblock's segment bitmap header
+ * @maptbl: superblock's mapping table header
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @create_threads_per_seg: number of creation threads per segment
+ * @dentries_btree: descriptor of all dentries btrees
+ * @extents_btree: descriptor of all extents btrees
+ * @xattr_btree: descriptor of all extended attributes btrees
+ * @invextree: b-tree of invalidated extents (ZNS SSD)
+ * @uuid: 128-bit uuid for volume
+ */
+struct ssdfs_volume_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le16 megabytes_per_peb;
+	__le16 pebs_per_seg;
+
+/* 0x0018 */
+	__le64 create_time;
+	__le64 create_cno;
+#define SSDFS_VH_ZNS_BASED_VOLUME	(1 << 0)
+#define SSDFS_VH_UNALIGNED_ZONE		(1 << 1)
+#define SSDFS_VH_FLAGS_MASK		(0x3)
+	__le32 flags;
+	__le32 lebs_per_peb_index;
+
+/* 0x0030 */
+#define VH_LIMIT1	SSDFS_SB_CHAIN_MAX
+#define VH_LIMIT2	SSDFS_SB_SEG_COPY_MAX
+	struct ssdfs_leb2peb_pair sb_pebs[VH_LIMIT1][VH_LIMIT2];
+
+/* 0x00B0 */
+	struct ssdfs_segbmap_sb_header segbmap;
+
+/* 0x0140 */
+	struct ssdfs_maptbl_sb_header maptbl;
+
+/* 0x01D0 */
+	__le16 sb_seg_log_pages;
+	__le16 segbmap_log_pages;
+	__le16 maptbl_log_pages;
+	__le16 lnodes_seg_log_pages;
+	__le16 hnodes_seg_log_pages;
+	__le16 inodes_seg_log_pages;
+	__le16 user_data_log_pages;
+	__le16 create_threads_per_seg;
+
+/* 0x01E0 */
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+
+/* 0x0200 */
+	struct ssdfs_extents_btree_descriptor extents_btree;
+
+/* 0x0220 */
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+
+/* 0x0240 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x02C0 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x02D0 */
+	__le8 reserved4[0x130];
+
+/* 0x0400 */
+} __packed;
+
+#define SSDFS_LEBS_PER_PEB_INDEX_DEFAULT	(1)
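The `log_*` fields of `ssdfs_volume_header` store volume geometry as base-2 logarithms, so a reader recovers the real sizes by shifting. A userspace sketch (not part of the patch; the struct and function names are illustrative, and the sample values of 4 KiB pages with 8 MiB erase blocks are assumed, not mandated by the format):

```c
#include <stdint.h>

struct ssdfs_geometry {
	uint32_t page_size;
	uint64_t erase_block_size;
	uint64_t segment_size;
	uint32_t pebs_per_seg;
};

/* Decode the log2-encoded geometry fields of the volume header. */
static inline struct ssdfs_geometry
ssdfs_decode_geometry(uint8_t log_pagesize, uint8_t log_erasesize,
		      uint8_t log_segsize, uint8_t log_pebs_per_seg)
{
	struct ssdfs_geometry g;

	g.page_size = 1U << log_pagesize;
	g.erase_block_size = 1ULL << log_erasesize;
	g.segment_size = 1ULL << log_segsize;
	g.pebs_per_seg = 1U << log_pebs_per_seg;
	return g;
}
```

For example, `log_pagesize = 12` and `log_erasesize = 23` decode to a 4096-byte page and an 8 MiB erase block.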
+
+/*
+ * struct ssdfs_volume_state - changeable part of superblock
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @timestamp: write timestamp
+ * @cno: write checkpoint
+ * @flags: volume flags
+ * @state: file system state
+ * @errors: behaviour when detecting errors
+ * @feature_compat: compatible feature set
+ * @feature_compat_ro: read-only compatible feature set
+ * @feature_incompat: incompatible feature set
+ * @uuid: 128-bit uuid for volume
+ * @label: volume name
+ * @cur_segs: array of current segment numbers
+ * @migration_threshold: default value of destination PEBs in migration
+ * @blkbmap: block bitmap options
+ * @blk2off_tbl: offset translation table options
+ * @user_data: user data options
+ * @open_zones: number of open/active zones
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @snapshots_btree: snapshots btree root
+ */
+struct ssdfs_volume_state {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x0020 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0030 */
+#define SSDFS_HAS_INLINE_INODES_TREE		(1 << 0)
+#define SSDFS_VOLUME_STATE_FLAGS_MASK		0x1
+	__le32 flags;
+	__le16 state;
+	__le16 errors;
+
+/* 0x0038 */
+	__le64 feature_compat;
+	__le64 feature_compat_ro;
+	__le64 feature_incompat;
+
+/* 0x0050 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+	char label[SSDFS_VOLUME_LABEL_MAX];
+
+/* 0x0070 */
+	__le64 cur_segs[SSDFS_CUR_SEGS_COUNT];
+
+/* 0x0098 */
+	__le16 migration_threshold;
+	__le16 reserved1;
+
+/* 0x009C */
+	struct ssdfs_blk_bmap_options blkbmap;
+	struct ssdfs_blk2off_tbl_options blk2off_tbl;
+
+/* 0x00A4 */
+	struct ssdfs_user_data_options user_data;
+
+/* 0x00AC */
+	__le32 open_zones;
+
+/* 0x00B0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01B0 */
+	__le8 reserved3[0x50];
+
+/* 0x0200 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0280 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x0300 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0380 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x0400 */
+} __packed;
+
+/* Compatible feature flags */
+#define SSDFS_HAS_SEGBMAP_COMPAT_FLAG			(1 << 0)
+#define SSDFS_HAS_MAPTBL_COMPAT_FLAG			(1 << 1)
+#define SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG		(1 << 2)
+#define SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG		(1 << 3)
+#define SSDFS_HAS_SHARED_DICT_COMPAT_FLAG		(1 << 4)
+#define SSDFS_HAS_INODES_TREE_COMPAT_FLAG		(1 << 5)
+#define SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG		(1 << 6)
+#define SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG	(1 << 7)
+
+/* Read-Only compatible feature flags */
+#define SSDFS_ZLIB_COMPAT_RO_FLAG	(1 << 0)
+#define SSDFS_LZO_COMPAT_RO_FLAG	(1 << 1)
+
+#define SSDFS_FEATURE_COMPAT_SUPP \
+	(SSDFS_HAS_SEGBMAP_COMPAT_FLAG | SSDFS_HAS_MAPTBL_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_XATTRS_COMPAT_FLAG | \
+	 SSDFS_HAS_SHARED_DICT_COMPAT_FLAG | \
+	 SSDFS_HAS_INODES_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG | \
+	 SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG)
+
+#define SSDFS_FEATURE_COMPAT_RO_SUPP \
+	(SSDFS_ZLIB_COMPAT_RO_FLAG | SSDFS_LZO_COMPAT_RO_FLAG)
+
+#define SSDFS_FEATURE_INCOMPAT_SUPP	0ULL
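These three feature sets follow the usual ext2-style convention: unknown incompat bits make the volume unmountable, unknown ro-compat bits allow only a read-only mount, and unknown compat bits are harmless. A userspace sketch of that check (an assumption about how the driver applies these masks, not code from the patch; `ssdfs_check_features` and the enum are hypothetical names):

```c
#include <stdint.h>

#define SSDFS_ZLIB_COMPAT_RO_FLAG	(1ULL << 0)
#define SSDFS_LZO_COMPAT_RO_FLAG	(1ULL << 1)

#define SSDFS_FEATURE_COMPAT_RO_SUPP \
	(SSDFS_ZLIB_COMPAT_RO_FLAG | SSDFS_LZO_COMPAT_RO_FLAG)
#define SSDFS_FEATURE_INCOMPAT_SUPP	0ULL

enum ssdfs_mount_decision {
	SSDFS_MOUNT_RW,		/* all feature bits understood */
	SSDFS_MOUNT_RO,		/* unknown ro-compat bit present */
	SSDFS_MOUNT_REFUSE,	/* unknown incompat bit present */
};

static inline enum ssdfs_mount_decision
ssdfs_check_features(uint64_t feature_incompat, uint64_t feature_compat_ro)
{
	if (feature_incompat & ~SSDFS_FEATURE_INCOMPAT_SUPP)
		return SSDFS_MOUNT_REFUSE;
	if (feature_compat_ro & ~SSDFS_FEATURE_COMPAT_RO_SUPP)
		return SSDFS_MOUNT_RO;
	return SSDFS_MOUNT_RW;
}
```

Since `SSDFS_FEATURE_INCOMPAT_SUPP` is currently `0ULL`, any nonzero `feature_incompat` value refuses the mount.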
+
+/*
+ * struct ssdfs_metadata_descriptor - metadata descriptor
+ * @offset: offset in bytes
+ * @size: size in bytes
+ * @check: metadata checksum
+ */
+struct ssdfs_metadata_descriptor {
+/* 0x0000 */
+	__le32 offset;
+	__le32 size;
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+} __packed;
+
+enum {
+	SSDFS_BLK_BMAP_INDEX,
+	SSDFS_SNAPSHOT_RULES_AREA_INDEX,
+	SSDFS_OFF_TABLE_INDEX,
+	SSDFS_COLD_PAYLOAD_AREA_INDEX,
+	SSDFS_WARM_PAYLOAD_AREA_INDEX,
+	SSDFS_HOT_PAYLOAD_AREA_INDEX,
+	SSDFS_BLK_DESC_AREA_INDEX,
+	SSDFS_MAPTBL_CACHE_INDEX,
+	SSDFS_LOG_FOOTER_INDEX,
+	SSDFS_SEG_HDR_DESC_MAX = SSDFS_LOG_FOOTER_INDEX + 1,
+	SSDFS_LOG_FOOTER_DESC_MAX = SSDFS_OFF_TABLE_INDEX + 1,
+};
+
+enum {
+	SSDFS_PREV_MIGRATING_PEB,
+	SSDFS_CUR_MIGRATING_PEB,
+	SSDFS_MIGRATING_PEBS_CHAIN
+};
+
+/*
+ * struct ssdfs_segment_header - header of segment
+ * @volume_hdr: copy of static part of superblock
+ * @timestamp: log creation timestamp
+ * @cno: log checkpoint
+ * @log_pages: size of log (partial segment) in pages count
+ * @seg_type: type of segment
+ * @seg_flags: flags of segment
+ * @desc_array: array of segment's metadata descriptors
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @peb_create_time: PEB creation timestamp
+ * @seg_id: segment ID that contains this PEB
+ * @leb_id: LEB ID that mapped with this PEB
+ * @peb_id: PEB ID
+ * @relation_peb_id: source PEB ID during migration
+ * @payload: space for segment header's payload
+ */
+struct ssdfs_segment_header {
+/* 0x0000 */
+	struct ssdfs_volume_header volume_hdr;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 seg_flags;
+
+/* 0x0418 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x04A8 */
+#define SSDFS_PEB_UNKNOWN_MIGRATION_ID		(0)
+#define SSDFS_PEB_MIGRATION_ID_START		(1)
+#define SSDFS_PEB_MIGRATION_ID_MAX		(U8_MAX)
+	__le8 peb_migration_id[SSDFS_MIGRATING_PEBS_CHAIN];
+	__le8 reserved[0x6];
+
+/* 0x04B0 */
+	__le64 peb_create_time;
+
+/* 0x04B8 */
+	__le64 seg_id;
+	__le64 leb_id;
+	__le64 peb_id;
+	__le64 relation_peb_id;
+
+/* 0x04D8 */
+	__le8 payload[0x328];
+
+/* 0x0800 */
+} __packed;
+
+/* Possible segment types */
+#define SSDFS_UNKNOWN_SEG_TYPE			(0)
+#define SSDFS_SB_SEG_TYPE			(1)
+#define SSDFS_INITIAL_SNAPSHOT_SEG_TYPE		(2)
+#define SSDFS_SEGBMAP_SEG_TYPE			(3)
+#define SSDFS_MAPTBL_SEG_TYPE			(4)
+#define SSDFS_LEAF_NODE_SEG_TYPE		(5)
+#define SSDFS_HYBRID_NODE_SEG_TYPE		(6)
+#define SSDFS_INDEX_NODE_SEG_TYPE		(7)
+#define SSDFS_USER_DATA_SEG_TYPE		(8)
+#define SSDFS_LAST_KNOWN_SEG_TYPE		SSDFS_USER_DATA_SEG_TYPE
+
+/* Segment flags' bits */
+#define SSDFS_BLK_BMAP_BIT			(0)
+#define SSDFS_OFFSET_TABLE_BIT			(1)
+#define SSDFS_COLD_PAYLOAD_BIT			(2)
+#define SSDFS_WARM_PAYLOAD_BIT			(3)
+#define SSDFS_HOT_PAYLOAD_BIT			(4)
+#define SSDFS_BLK_DESC_CHAIN_BIT		(5)
+#define SSDFS_MAPTBL_CACHE_BIT			(6)
+#define SSDFS_FOOTER_BIT			(7)
+#define SSDFS_PARTIAL_LOG_BIT			(8)
+#define SSDFS_PARTIAL_LOG_HEADER_BIT		(9)
+#define SSDFS_PLH_INSTEAD_FOOTER_BIT		(10)
+
+/* Segment flags */
+#define SSDFS_SEG_HDR_HAS_BLK_BMAP		(1 << SSDFS_BLK_BMAP_BIT)
+#define SSDFS_SEG_HDR_HAS_OFFSET_TABLE		(1 << SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_LOG_HAS_COLD_PAYLOAD		(1 << SSDFS_COLD_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_WARM_PAYLOAD		(1 << SSDFS_WARM_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_HOT_PAYLOAD		(1 << SSDFS_HOT_PAYLOAD_BIT)
+#define SSDFS_LOG_HAS_BLK_DESC_CHAIN		(1 << SSDFS_BLK_DESC_CHAIN_BIT)
+#define SSDFS_LOG_HAS_MAPTBL_CACHE		(1 << SSDFS_MAPTBL_CACHE_BIT)
+#define SSDFS_LOG_HAS_FOOTER			(1 << SSDFS_FOOTER_BIT)
+#define SSDFS_LOG_IS_PARTIAL			(1 << SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_LOG_HAS_PARTIAL_HEADER		(1 << SSDFS_PARTIAL_LOG_HEADER_BIT)
+#define SSDFS_PARTIAL_HEADER_INSTEAD_FOOTER	(1 << SSDFS_PLH_INSTEAD_FOOTER_BIT)
+#define SSDFS_SEG_HDR_FLAG_MASK			0x7FF
+
+/* Segment flags manipulation functions */
+#define SSDFS_SEG_HDR_FNS(bit, name)					\
+static inline void ssdfs_set_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	set_bit(SSDFS_##bit, &seg_flags);				\
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags);			\
+}									\
+static inline void ssdfs_clear_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	clear_bit(SSDFS_##bit, &seg_flags);				\
+	hdr->seg_flags = cpu_to_le32((u32)seg_flags);			\
+}									\
+static inline int ssdfs_##name(struct ssdfs_segment_header *hdr)	\
+{									\
+	unsigned long seg_flags = le32_to_cpu(hdr->seg_flags);		\
+	return test_bit(SSDFS_##bit, &seg_flags);			\
+}
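For readers unfamiliar with this macro pattern: each `SSDFS_SEG_HDR_FNS()` invocation generates a set/clear/test triple that round-trips `seg_flags` through CPU byte order. The kernel version operates on a local `unsigned long` with `set_bit()`/`clear_bit()`/`test_bit()`; the userspace sketch below (not part of the patch; `seg_hdr_stub` is a stand-in type) shows the equivalent behavior with plain bitwise ops:

```c
#include <stdint.h>

#define SSDFS_FOOTER_BIT	(7)

/* Minimal stand-in for ssdfs_segment_header (CPU byte order assumed). */
struct seg_hdr_stub {
	uint32_t seg_flags;
};

static inline void ssdfs_set_log_has_footer(struct seg_hdr_stub *hdr)
{
	hdr->seg_flags |= (uint32_t)1 << SSDFS_FOOTER_BIT;
}

static inline void ssdfs_clear_log_has_footer(struct seg_hdr_stub *hdr)
{
	hdr->seg_flags &= ~((uint32_t)1 << SSDFS_FOOTER_BIT);
}

static inline int ssdfs_log_has_footer(struct seg_hdr_stub *hdr)
{
	return (hdr->seg_flags >> SSDFS_FOOTER_BIT) & 1;
}

/* Set, verify, clear, verify: returns 1 when the triple is consistent. */
static inline int footer_flag_roundtrip(void)
{
	struct seg_hdr_stub hdr = { .seg_flags = 0 };

	ssdfs_set_log_has_footer(&hdr);
	if (!ssdfs_log_has_footer(&hdr))
		return 0;
	ssdfs_clear_log_has_footer(&hdr);
	return !ssdfs_log_has_footer(&hdr);
}
```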
+
+/*
+ * ssdfs_set_seg_hdr_has_blk_bmap()
+ * ssdfs_clear_seg_hdr_has_blk_bmap()
+ * ssdfs_seg_hdr_has_blk_bmap()
+ */
+SSDFS_SEG_HDR_FNS(BLK_BMAP_BIT, seg_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_seg_hdr_has_offset_table()
+ * ssdfs_clear_seg_hdr_has_offset_table()
+ * ssdfs_seg_hdr_has_offset_table()
+ */
+SSDFS_SEG_HDR_FNS(OFFSET_TABLE_BIT, seg_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_log_has_cold_payload()
+ * ssdfs_clear_log_has_cold_payload()
+ * ssdfs_log_has_cold_payload()
+ */
+SSDFS_SEG_HDR_FNS(COLD_PAYLOAD_BIT, log_has_cold_payload)
+
+/*
+ * ssdfs_set_log_has_warm_payload()
+ * ssdfs_clear_log_has_warm_payload()
+ * ssdfs_log_has_warm_payload()
+ */
+SSDFS_SEG_HDR_FNS(WARM_PAYLOAD_BIT, log_has_warm_payload)
+
+/*
+ * ssdfs_set_log_has_hot_payload()
+ * ssdfs_clear_log_has_hot_payload()
+ * ssdfs_log_has_hot_payload()
+ */
+SSDFS_SEG_HDR_FNS(HOT_PAYLOAD_BIT, log_has_hot_payload)
+
+/*
+ * ssdfs_set_log_has_blk_desc_chain()
+ * ssdfs_clear_log_has_blk_desc_chain()
+ * ssdfs_log_has_blk_desc_chain()
+ */
+SSDFS_SEG_HDR_FNS(BLK_DESC_CHAIN_BIT, log_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_log_has_maptbl_cache()
+ * ssdfs_clear_log_has_maptbl_cache()
+ * ssdfs_log_has_maptbl_cache()
+ */
+SSDFS_SEG_HDR_FNS(MAPTBL_CACHE_BIT, log_has_maptbl_cache)
+
+/*
+ * ssdfs_set_log_has_footer()
+ * ssdfs_clear_log_has_footer()
+ * ssdfs_log_has_footer()
+ */
+SSDFS_SEG_HDR_FNS(FOOTER_BIT, log_has_footer)
+
+/*
+ * ssdfs_set_log_is_partial()
+ * ssdfs_clear_log_is_partial()
+ * ssdfs_log_is_partial()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_BIT, log_is_partial)
+
+/*
+ * ssdfs_set_log_has_partial_header()
+ * ssdfs_clear_log_has_partial_header()
+ * ssdfs_log_has_partial_header()
+ */
+SSDFS_SEG_HDR_FNS(PARTIAL_LOG_HEADER_BIT, log_has_partial_header)
+
+/*
+ * ssdfs_set_partial_header_instead_footer()
+ * ssdfs_clear_partial_header_instead_footer()
+ * ssdfs_partial_header_instead_footer()
+ */
+SSDFS_SEG_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, partial_header_instead_footer)
+
+/*
+ * struct ssdfs_log_footer - footer of partial log
+ * @volume_state: changeable part of superblock
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_bytes: payload size in bytes
+ * @log_flags: flags of log
+ * @reserved1: reserved field
+ * @desc_array: array of footer's metadata descriptors
+ * @peb_create_time: PEB creation timestamp
+ * @payload: space for log footer's payload
+ */
+struct ssdfs_log_footer {
+/* 0x0000 */
+	struct ssdfs_volume_state volume_state;
+
+/* 0x0400 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0410 */
+	__le32 log_bytes;
+	__le32 log_flags;
+	__le64 reserved1;
+
+/* 0x0420 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_LOG_FOOTER_DESC_MAX];
+
+/* 0x0450 */
+	__le64 peb_create_time;
+
+/* 0x0458 */
+	__le8 payload[0x3A8];
+
+/* 0x0800 */
+} __packed;
+
+/* Log footer flags' bits */
+#define __SSDFS_BLK_BMAP_BIT			(0)
+#define __SSDFS_OFFSET_TABLE_BIT		(1)
+#define __SSDFS_PARTIAL_LOG_BIT			(2)
+#define __SSDFS_ENDING_LOG_BIT			(3)
+#define __SSDFS_SNAPSHOT_RULE_AREA_BIT		(4)
+
+/* Log footer flags */
+#define SSDFS_LOG_FOOTER_HAS_BLK_BMAP		(1 << __SSDFS_BLK_BMAP_BIT)
+#define SSDFS_LOG_FOOTER_HAS_OFFSET_TABLE	(1 << __SSDFS_OFFSET_TABLE_BIT)
+#define SSDFS_PARTIAL_LOG_FOOTER		(1 << __SSDFS_PARTIAL_LOG_BIT)
+#define SSDFS_ENDING_LOG_FOOTER			(1 << __SSDFS_ENDING_LOG_BIT)
+#define SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES	(1 << __SSDFS_SNAPSHOT_RULE_AREA_BIT)
+#define SSDFS_LOG_FOOTER_FLAG_MASK		0x1F
+
+/* Log footer flags manipulation functions */
+#define SSDFS_LOG_FOOTER_FNS(bit, name)					\
+static inline void ssdfs_set_##name(struct ssdfs_log_footer *footer)	\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	set_bit(__SSDFS_##bit, &log_flags);				\
+	footer->log_flags = cpu_to_le32((u32)log_flags);		\
+}									\
+static inline void ssdfs_clear_##name(struct ssdfs_log_footer *footer)	\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	clear_bit(__SSDFS_##bit, &log_flags);				\
+	footer->log_flags = cpu_to_le32((u32)log_flags);		\
+}									\
+static inline int ssdfs_##name(struct ssdfs_log_footer *footer)		\
+{									\
+	unsigned long log_flags = le32_to_cpu(footer->log_flags);	\
+	return test_bit(__SSDFS_##bit, &log_flags);			\
+}
+
+/*
+ * ssdfs_set_log_footer_has_blk_bmap()
+ * ssdfs_clear_log_footer_has_blk_bmap()
+ * ssdfs_log_footer_has_blk_bmap()
+ */
+SSDFS_LOG_FOOTER_FNS(BLK_BMAP_BIT, log_footer_has_blk_bmap)
+
+/*
+ * ssdfs_set_log_footer_has_offset_table()
+ * ssdfs_clear_log_footer_has_offset_table()
+ * ssdfs_log_footer_has_offset_table()
+ */
+SSDFS_LOG_FOOTER_FNS(OFFSET_TABLE_BIT, log_footer_has_offset_table)
+
+/*
+ * ssdfs_set_partial_log_footer()
+ * ssdfs_clear_partial_log_footer()
+ * ssdfs_partial_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(PARTIAL_LOG_BIT, partial_log_footer)
+
+/*
+ * ssdfs_set_ending_log_footer()
+ * ssdfs_clear_ending_log_footer()
+ * ssdfs_ending_log_footer()
+ */
+SSDFS_LOG_FOOTER_FNS(ENDING_LOG_BIT, ending_log_footer)
+
+/*
+ * ssdfs_set_log_footer_has_snapshot_rules()
+ * ssdfs_clear_log_footer_has_snapshot_rules()
+ * ssdfs_log_footer_has_snapshot_rules()
+ */
+SSDFS_LOG_FOOTER_FNS(SNAPSHOT_RULE_AREA_BIT, log_footer_has_snapshot_rules)
+
+/*
+ * struct ssdfs_partial_log_header - header of partial log
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @timestamp: writing timestamp
+ * @cno: writing checkpoint
+ * @log_pages: size of log in pages count
+ * @seg_type: type of segment
+ * @pl_flags: flags of log
+ * @log_bytes: payload size in bytes
+ * @flags: volume flags
+ * @desc_array: array of log's metadata descriptors
+ * @nsegs: segments count
+ * @free_pages: free pages count
+ * @root_folder: copy of root folder's inode
+ * @inodes_btree: inodes btree root
+ * @shared_extents_btree: shared extents btree root
+ * @shared_dict_btree: shared dictionary btree root
+ * @sequence_id: index of partial log in the sequence
+ * @log_pagesize: log2(page size)
+ * @log_erasesize: log2(erase block size)
+ * @log_segsize: log2(segment size)
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @create_threads_per_seg: number of creation threads per segment
+ * @snapshots_btree: snapshots btree root
+ * @open_zones: number of open/active zones
+ * @peb_create_time: PEB creation timestamp
+ * @invextree: invalidated extents btree root
+ * @seg_id: segment ID that contains this PEB
+ * @leb_id: LEB ID that mapped with this PEB
+ * @peb_id: PEB ID
+ * @relation_peb_id: source PEB ID during migration
+ * @uuid: 128-bit uuid for volume
+ * @volume_create_time: volume create timestamp (mkfs phase)
+ *
+ * This header is used when the full log needs to be built from several
+ * partial logs. It combines the most essential fields of the segment
+ * header and the log footer. The first partial log starts with the
+ * segment header and the partial log header. Every subsequent partial
+ * log starts with the partial log header only. The last partial log
+ * ends with the log footer.
+ */
+struct ssdfs_partial_log_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le64 timestamp;
+	__le64 cno;
+
+/* 0x0020 */
+	__le16 log_pages;
+	__le16 seg_type;
+	__le32 pl_flags;
+
+/* 0x0028 */
+	__le32 log_bytes;
+	__le32 flags;
+
+/* 0x0030 */
+	struct ssdfs_metadata_descriptor desc_array[SSDFS_SEG_HDR_DESC_MAX];
+
+/* 0x00C0 */
+	__le64 nsegs;
+	__le64 free_pages;
+
+/* 0x00D0 */
+	struct ssdfs_inode root_folder;
+
+/* 0x01D0 */
+	struct ssdfs_inodes_btree inodes_btree;
+
+/* 0x0250 */
+	struct ssdfs_shared_extents_btree shared_extents_btree;
+
+/* 0x02D0 */
+	struct ssdfs_shared_dictionary_btree shared_dict_btree;
+
+/* 0x0350 */
+	__le32 sequence_id;
+	__le8 log_pagesize;
+	__le8 log_erasesize;
+	__le8 log_segsize;
+	__le8 log_pebs_per_seg;
+	__le32 lebs_per_peb_index;
+	__le16 create_threads_per_seg;
+	__le8 reserved1[0x2];
+
+/* 0x0360 */
+	struct ssdfs_snapshots_btree snapshots_btree;
+
+/* 0x03E0 */
+	__le32 open_zones;
+	__le8 reserved2[0x4];
+	__le64 peb_create_time;
+	__le8 reserved3[0x10];
+
+/* 0x0400 */
+	struct ssdfs_invalidated_extents_btree invextree;
+
+/* 0x0480 */
+	__le64 seg_id;
+	__le64 leb_id;
+	__le64 peb_id;
+	__le64 relation_peb_id;
+
+/* 0x04A0 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x04B0 */
+	__le64 volume_create_time;
+
+/* 0x04B8 */
+	__le8 payload[0x348];
+
+/* 0x0800 */
+} __packed;
+
+/* Partial log flags manipulation functions */
+#define SSDFS_PL_HDR_FNS(bit, name)					 \
+static inline void ssdfs_set_##name(struct ssdfs_partial_log_header *hdr) \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	set_bit(SSDFS_##bit, &pl_flags);				 \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags);			 \
+}									 \
+static inline void ssdfs_clear_##name(struct ssdfs_partial_log_header *hdr) \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	clear_bit(SSDFS_##bit, &pl_flags);				 \
+	hdr->pl_flags = cpu_to_le32((u32)pl_flags);			 \
+}									 \
+static inline int ssdfs_##name(struct ssdfs_partial_log_header *hdr)	 \
+{									 \
+	unsigned long pl_flags = le32_to_cpu(hdr->pl_flags);		 \
+	return test_bit(SSDFS_##bit, &pl_flags);			 \
+}
+
+/*
+ * ssdfs_set_pl_hdr_has_blk_bmap()
+ * ssdfs_clear_pl_hdr_has_blk_bmap()
+ * ssdfs_pl_hdr_has_blk_bmap()
+ */
+SSDFS_PL_HDR_FNS(BLK_BMAP_BIT, pl_hdr_has_blk_bmap)
+
+/*
+ * ssdfs_set_pl_hdr_has_offset_table()
+ * ssdfs_clear_pl_hdr_has_offset_table()
+ * ssdfs_pl_hdr_has_offset_table()
+ */
+SSDFS_PL_HDR_FNS(OFFSET_TABLE_BIT, pl_hdr_has_offset_table)
+
+/*
+ * ssdfs_set_pl_has_cold_payload()
+ * ssdfs_clear_pl_has_cold_payload()
+ * ssdfs_pl_has_cold_payload()
+ */
+SSDFS_PL_HDR_FNS(COLD_PAYLOAD_BIT, pl_has_cold_payload)
+
+/*
+ * ssdfs_set_pl_has_warm_payload()
+ * ssdfs_clear_pl_has_warm_payload()
+ * ssdfs_pl_has_warm_payload()
+ */
+SSDFS_PL_HDR_FNS(WARM_PAYLOAD_BIT, pl_has_warm_payload)
+
+/*
+ * ssdfs_set_pl_has_hot_payload()
+ * ssdfs_clear_pl_has_hot_payload()
+ * ssdfs_pl_has_hot_payload()
+ */
+SSDFS_PL_HDR_FNS(HOT_PAYLOAD_BIT, pl_has_hot_payload)
+
+/*
+ * ssdfs_set_pl_has_blk_desc_chain()
+ * ssdfs_clear_pl_has_blk_desc_chain()
+ * ssdfs_pl_has_blk_desc_chain()
+ */
+SSDFS_PL_HDR_FNS(BLK_DESC_CHAIN_BIT, pl_has_blk_desc_chain)
+
+/*
+ * ssdfs_set_pl_has_maptbl_cache()
+ * ssdfs_clear_pl_has_maptbl_cache()
+ * ssdfs_pl_has_maptbl_cache()
+ */
+SSDFS_PL_HDR_FNS(MAPTBL_CACHE_BIT, pl_has_maptbl_cache)
+
+/*
+ * ssdfs_set_pl_has_footer()
+ * ssdfs_clear_pl_has_footer()
+ * ssdfs_pl_has_footer()
+ */
+SSDFS_PL_HDR_FNS(FOOTER_BIT, pl_has_footer)
+
+/*
+ * ssdfs_set_pl_is_partial()
+ * ssdfs_clear_pl_is_partial()
+ * ssdfs_pl_is_partial()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_BIT, pl_is_partial)
+
+/*
+ * ssdfs_set_pl_has_partial_header()
+ * ssdfs_clear_pl_has_partial_header()
+ * ssdfs_pl_has_partial_header()
+ */
+SSDFS_PL_HDR_FNS(PARTIAL_LOG_HEADER_BIT, pl_has_partial_header)
+
+/*
+ * ssdfs_set_pl_header_instead_footer()
+ * ssdfs_clear_pl_header_instead_footer()
+ * ssdfs_pl_header_instead_footer()
+ */
+SSDFS_PL_HDR_FNS(PLH_INSTEAD_FOOTER_BIT, pl_header_instead_footer)
+
+/*
+ * struct ssdfs_diff_blob_header - diff blob header
+ * @magic: diff blob's magic
+ * @type: diff blob's type
+ * @desc_size: size of diff blob's descriptor in bytes
+ * @blob_size: size of diff blob in bytes
+ * @flags: diff blob's flags
+ */
+struct ssdfs_diff_blob_header {
+/* 0x0000 */
+	__le16 magic;
+	__le8 type;
+	__le8 desc_size;
+	__le16 blob_size;
+	__le16 flags;
+
+/* 0x0008 */
+} __packed;
+
+/* Diff blob flags */
+#define SSDFS_DIFF_BLOB_HAS_BTREE_NODE_HEADER	(1 << 0)
+#define SSDFS_DIFF_CHAIN_CONTAINS_NEXT_BLOB	(1 << 1)
+#define SSDFS_DIFF_BLOB_FLAGS_MASK		(0x3)
+
+/*
+ * struct ssdfs_metadata_diff_blob_header - metadata diff blob header
+ * @diff: generic diff blob header
+ * @bits_count: count of bits in bitmap
+ * @item_start_bit: item starting bit in bitmap
+ * @index_start_bit: index starting bit in bitmap
+ * @item_size: size of item in bytes
+ */
+struct ssdfs_metadata_diff_blob_header {
+/* 0x0000 */
+	struct ssdfs_diff_blob_header diff;
+
+/* 0x0008 */
+	__le16 bits_count;
+	__le16 item_start_bit;
+	__le16 index_start_bit;
+	__le16 item_size;
+
+/* 0x0010 */
+} __packed;
+
+/* Diff blob types */
+enum {
+	SSDFS_UNKNOWN_DIFF_BLOB_TYPE,
+	SSDFS_BTREE_NODE_DIFF_BLOB,
+	SSDFS_USER_DATA_DIFF_BLOB,
+	SSDFS_DIFF_BLOB_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_fragments_chain_header - header of fragments' chain
+ * @compr_bytes: size of the whole fragments' chain in compressed state
+ * @uncompr_bytes: size of the whole fragments' chain in decompressed state
+ * @fragments_count: count of fragments in the chain
+ * @desc_size: size of one descriptor item
+ * @magic: fragments chain header magic
+ * @type: fragments chain header type
+ * @flags: flags of fragments' chain
+ */
+struct ssdfs_fragments_chain_header {
+/* 0x0000 */
+	__le32 compr_bytes;
+	__le32 uncompr_bytes;
+
+/* 0x0008 */
+	__le16 fragments_count;
+	__le16 desc_size;
+
+/* 0x000C */
+	__le8 magic;
+	__le8 type;
+	__le16 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Fragments chain types */
+#define SSDFS_UNKNOWN_CHAIN_HDR		0x0
+#define SSDFS_LOG_AREA_CHAIN_HDR	0x1
+#define SSDFS_BLK_STATE_CHAIN_HDR	0x2
+#define SSDFS_BLK_DESC_CHAIN_HDR	0x3
+#define SSDFS_BLK_DESC_ZLIB_CHAIN_HDR	0x4
+#define SSDFS_BLK_DESC_LZO_CHAIN_HDR	0x5
+#define SSDFS_BLK2OFF_CHAIN_HDR		0x6
+#define SSDFS_BLK2OFF_ZLIB_CHAIN_HDR	0x7
+#define SSDFS_BLK2OFF_LZO_CHAIN_HDR	0x8
+#define SSDFS_BLK_BMAP_CHAIN_HDR	0x9
+#define SSDFS_CHAIN_HDR_TYPE_MAX	(SSDFS_BLK_BMAP_CHAIN_HDR + 1)
+
+/* Fragments chain flags */
+#define SSDFS_MULTIPLE_HDR_CHAIN	(1 << 0)
+#define SSDFS_CHAIN_HDR_FLAG_MASK	0x1
+
+/* Fragments chain constants */
+#define SSDFS_FRAGMENTS_CHAIN_MAX		14
+#define SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX	64
+
+/*
+ * struct ssdfs_fragment_desc - fragment descriptor
+ * @offset: fragment's offset
+ * @compr_size: size of fragment in compressed state
+ * @uncompr_size: size of fragment after decompression
+ * @checksum: fragment checksum
+ * @sequence_id: fragment's sequential id number
+ * @magic: fragment descriptor's magic
+ * @type: fragment descriptor's type
+ * @flags: fragment descriptor's flags
+ */
+struct ssdfs_fragment_desc {
+/* 0x0000 */
+	__le32 offset;
+	__le16 compr_size;
+	__le16 uncompr_size;
+
+/* 0x0008 */
+	__le32 checksum;
+	__le8 sequence_id;
+	__le8 magic;
+	__le8 type;
+	__le8 flags;
+
+/* 0x0010 */
+} __packed;
+
+/* Fragment descriptor types */
+#define SSDFS_UNKNOWN_FRAGMENT_TYPE	0
+#define SSDFS_FRAGMENT_UNCOMPR_BLOB	1
+#define SSDFS_FRAGMENT_ZLIB_BLOB	2
+#define SSDFS_FRAGMENT_LZO_BLOB		3
+#define SSDFS_DATA_BLK_STATE_DESC	4
+#define SSDFS_DATA_BLK_DESC		5
+#define SSDFS_DATA_BLK_DESC_ZLIB	6
+#define SSDFS_DATA_BLK_DESC_LZO		7
+#define SSDFS_BLK2OFF_EXTENT_DESC	8
+#define SSDFS_BLK2OFF_EXTENT_DESC_ZLIB	9
+#define SSDFS_BLK2OFF_EXTENT_DESC_LZO	10
+#define SSDFS_BLK2OFF_DESC		11
+#define SSDFS_BLK2OFF_DESC_ZLIB		12
+#define SSDFS_BLK2OFF_DESC_LZO		13
+#define SSDFS_NEXT_TABLE_DESC		14
+#define SSDFS_FRAGMENT_DESC_MAX_TYPE	(SSDFS_NEXT_TABLE_DESC + 1)
+
+/* Fragment descriptor flags */
+#define SSDFS_FRAGMENT_HAS_CSUM		(1 << 0)
+#define SSDFS_FRAGMENT_DESC_FLAGS_MASK	0x1
+
+/*
+ * struct ssdfs_block_bitmap_header - header of segment's block bitmap
+ * @magic: magic signature and flags
+ * @fragments_count: count of block bitmap's fragments
+ * @bytes_count: count of bytes in fragments' sequence
+ * @flags: block bitmap's flags
+ * @type: type of block bitmap
+ */
+struct ssdfs_block_bitmap_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	__le16 fragments_count;
+	__le32 bytes_count;
+
+#define SSDFS_BLK_BMAP_BACKUP		(1 << 0)
+#define SSDFS_BLK_BMAP_COMPRESSED	(1 << 1)
+#define SSDFS_BLK_BMAP_FLAG_MASK	0x3
+	__le8 flags;
+
+#define SSDFS_BLK_BMAP_UNCOMPRESSED_BLOB	(0)
+#define SSDFS_BLK_BMAP_ZLIB_BLOB		(1)
+#define SSDFS_BLK_BMAP_LZO_BLOB			(2)
+#define SSDFS_BLK_BMAP_TYPE_MAX			(SSDFS_BLK_BMAP_LZO_BLOB + 1)
+	__le8 type;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_block_bitmap_fragment - block bitmap's fragment header
+ * @peb_index: PEB's index
+ * @sequence_id: ID of block bitmap's fragment in the sequence
+ * @flags: fragment's flags
+ * @type: fragment type
+ * @last_free_blk: last logical free block
+ * @metadata_blks: count of physical pages used by metadata
+ * @invalid_blks: count of invalid blocks
+ * @chain_hdr: descriptor of block bitmap's fragments' chain
+ */
+struct ssdfs_block_bitmap_fragment {
+/* 0x0000 */
+	__le16 peb_index;
+	__le8 sequence_id;
+
+#define SSDFS_MIGRATING_BLK_BMAP	(1 << 0)
+#define SSDFS_PEB_HAS_EXT_PTR		(1 << 1)
+#define SSDFS_PEB_HAS_RELATION		(1 << 2)
+#define SSDFS_INFLATED_BLK_BMAP		(1 << 3)
+#define SSDFS_FRAG_BLK_BMAP_FLAG_MASK	0xF
+	__le8 flags : 6;
+
+#define SSDFS_SRC_BLK_BMAP		(0)
+#define SSDFS_DST_BLK_BMAP		(1)
+#define SSDFS_FRAG_BLK_BMAP_TYPE_MAX	(SSDFS_DST_BLK_BMAP + 1)
+	__le8 type : 2;
+
+	__le32 last_free_blk;
+
+/* 0x0008 */
+	__le32 metadata_blks;
+	__le32 invalid_blks;
+
+/* 0x0010 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * The block to offset table has structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |  Blk2Off table Header    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   Translation extents    |
+ * |        sequence          |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |  Physical offsets table  |
+ * |         header           |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Physical offset       |
+ * |  descriptors sequence    |
+ * |                          |
+ * ----------------------------
+ */
+
+/* Possible log's area types */
+enum {
+	SSDFS_LOG_BLK_DESC_AREA,
+	SSDFS_LOG_MAIN_AREA,
+	SSDFS_LOG_DIFFS_AREA,
+	SSDFS_LOG_JOURNAL_AREA,
+	SSDFS_LOG_AREA_MAX,
+};
+
+/*
+ * struct ssdfs_peb_page_descriptor - PEB's page descriptor
+ * @logical_offset: logical offset from file's begin in pages
+ * @logical_blk: logical number of the block in segment
+ * @peb_page: PEB's page index
+ */
+struct ssdfs_peb_page_descriptor {
+/* 0x0000 */
+	__le32 logical_offset;
+	__le16 logical_blk;
+	__le16 peb_page;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_blk_state_offset - block's state offset
+ * @log_start_page: start page of the log
+ * @log_area: identification number of log area
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @byte_offset: offset in bytes from area's beginning
+ */
+struct ssdfs_blk_state_offset {
+/* 0x0000 */
+	__le16 log_start_page;
+	__le8 log_area;
+	__le8 peb_migration_id;
+	__le32 byte_offset;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_descriptor - descriptor of physical offset
+ * @page_desc: PEB's page descriptor
+ * @blk_state: logical block's state offset
+ */
+struct ssdfs_phys_offset_descriptor {
+/* 0x0000 */
+	struct ssdfs_peb_page_descriptor page_desc;
+	struct ssdfs_blk_state_offset blk_state;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_phys_offset_table_header - physical offset table header
+ * @start_id: start id in the table's fragment
+ * @id_count: number of unique physical offsets in log's fragments chain
+ * @byte_size: size in bytes of table's fragment
+ * @peb_index: PEB index
+ * @sequence_id: table's fragment's sequential id number
+ * @type: table's type
+ * @flags: table's flags
+ * @magic: table's magic
+ * @checksum: table checksum
+ * @used_logical_blks: count of allocated logical blocks
+ * @free_logical_blks: count of free logical blocks
+ * @last_allocated_blk: last allocated block (hint for allocation)
+ * @next_fragment_off: offset till next table's fragment
+ *
+ * This table contains offsets of block descriptors in a segment.
+ * Generally speaking, the table can be represented as an array of
+ * ssdfs_phys_offset_descriptor structures ordered by ID numbers.
+ * The whole table can be split into several fragments. Every
+ * table fragment begins with a header.
+ */
+struct ssdfs_phys_offset_table_header {
+/* 0x0000 */
+	__le16 start_id;
+	__le16 id_count;
+	__le32 byte_size;
+
+/* 0x0008 */
+	__le16 peb_index;
+	__le16 sequence_id;
+	__le16 type;
+	__le16 flags;
+
+/* 0x0010 */
+	__le32 magic;
+	__le32 checksum;
+
+/* 0x0018 */
+	__le16 used_logical_blks;
+	__le16 free_logical_blks;
+	__le16 last_allocated_blk;
+	__le16 next_fragment_off;
+
+/* 0x0020 */
+} __packed;
+
+/* Physical offset table types */
+#define SSDFS_UNKNOWN_OFF_TABLE_TYPE	0
+#define SSDFS_SEG_OFF_TABLE		1
+#define SSDFS_OFF_TABLE_MAX_TYPE	(SSDFS_SEG_OFF_TABLE + 1)
+
+/* Physical offset table flags */
+#define SSDFS_OFF_TABLE_HAS_CSUM		(1 << 0)
+#define SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT	(1 << 1)
+#define SSDFS_BLK_DESC_TBL_COMPRESSED		(1 << 2)
+#define SSDFS_OFF_TABLE_HAS_OLD_LOG_FRAGMENT	(1 << 3)
+#define SSDFS_INFLATED_OFF_TABLE		(1 << 4)
+#define SSDFS_OFF_TABLE_FLAGS_MASK		0x1F
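When `SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT` is set, `next_fragment_off` chains table fragments together. A userspace sketch of walking such a chain laid out back-to-back in one buffer (not part of the patch; the stub struct is hypothetical, and the assumption that `next_fragment_off` is a byte offset from the current fragment's header is this sketch's, not a statement of the on-disk format):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT	(1 << 1)

/* Minimal stand-in for ssdfs_phys_offset_table_header. */
struct pot_header_stub {
	uint16_t flags;
	uint16_t next_fragment_off;
};

/* Count fragments in a chain; stop at the last fragment or buffer end. */
static inline int count_fragments(const uint8_t *buf, size_t buf_size)
{
	size_t pos = 0;
	int count = 0;

	while (pos + sizeof(struct pot_header_stub) <= buf_size) {
		struct pot_header_stub hdr;

		memcpy(&hdr, buf + pos, sizeof(hdr));
		count++;
		if (!(hdr.flags & SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT))
			break;
		if (hdr.next_fragment_off == 0)
			break;	/* malformed chain; avoid looping forever */
		pos += hdr.next_fragment_off;
	}
	return count;
}

/* Build a two-fragment chain in a local buffer and count it. */
static inline int fragment_chain_selftest(void)
{
	uint8_t buf[2 * sizeof(struct pot_header_stub)];
	struct pot_header_stub first = {
		.flags = SSDFS_OFF_TABLE_HAS_NEXT_FRAGMENT,
		.next_fragment_off = sizeof(struct pot_header_stub),
	};
	struct pot_header_stub second = { .flags = 0, .next_fragment_off = 0 };

	memcpy(buf, &first, sizeof(first));
	memcpy(buf + sizeof(first), &second, sizeof(second));
	return count_fragments(buf, sizeof(buf));
}
```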
+
+/*
+ * struct ssdfs_translation_extent - logical block to offset id translation
+ * @logical_blk: starting logical block
+ * @offset_id: starting offset id
+ * @len: count of items in extent
+ * @sequence_id: id in sequence of extents
+ * @state: logical blocks' sequence state
+ */
+struct ssdfs_translation_extent {
+/* 0x0000 */
+	__le16 logical_blk;
+#define SSDFS_INVALID_OFFSET_ID		(U16_MAX)
+	__le16 offset_id;
+	__le16 len;
+	__le8 sequence_id;
+	__le8 state;
+
+/* 0x0008 */
+} __packed;
+
+enum {
+	SSDFS_LOGICAL_BLK_UNKNOWN_STATE,
+	SSDFS_LOGICAL_BLK_FREE,
+	SSDFS_LOGICAL_BLK_USED,
+	SSDFS_LOGICAL_BLK_STATE_MAX,
+};
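A translation extent describes a run of logical blocks whose offset IDs are contiguous, so resolution inside one extent is linear. A userspace sketch of that lookup (not part of the patch; the stub struct and `ssdfs_extent_resolve` are hypothetical, and the linear mapping of `[logical_blk, logical_blk + len)` onto `[offset_id, offset_id + len)` is this sketch's reading of the fields):

```c
#include <stdint.h>

#define SSDFS_INVALID_OFFSET_ID	(UINT16_MAX)

/* Minimal stand-in for ssdfs_translation_extent (CPU byte order assumed). */
struct translation_extent_stub {
	uint16_t logical_blk;	/* starting logical block */
	uint16_t offset_id;	/* starting offset id */
	uint16_t len;		/* count of items in extent */
};

/* Map a logical block to its offset ID, or report it as not covered. */
static inline uint16_t
ssdfs_extent_resolve(const struct translation_extent_stub *ext, uint16_t blk)
{
	if (blk < ext->logical_blk || blk >= ext->logical_blk + ext->len)
		return SSDFS_INVALID_OFFSET_ID;
	return ext->offset_id + (blk - ext->logical_blk);
}
```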
+
+/*
+ * struct ssdfs_blk2off_table_header - translation table header
+ * @magic: magic signature
+ * @check: metadata checksum + flags
+ * @extents_off: offset in bytes from header begin till extents sequence
+ * @extents_count: count of extents in the sequence
+ * @offset_table_off: offset in bytes from header begin till phys offsets table
+ * @fragments_count: count of table's fragments for the whole PEB
+ * @sequence: first translation extent in the sequence
+ */
+struct ssdfs_blk2off_table_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+#define SSDFS_BLK2OFF_TBL_ZLIB_COMPR	(1 << 1)
+#define SSDFS_BLK2OFF_TBL_LZO_COMPR	(1 << 2)
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+#define SSDFS_BLK2OFF_FRAG_CHAIN_MAX	(5)
+#define SSDFS_NEXT_BLK2OFF_TBL_INDEX	SSDFS_BLK2OFF_FRAG_CHAIN_MAX
+#define SSDFS_BLK2OFF_TBL_MAX		(SSDFS_BLK2OFF_FRAG_CHAIN_MAX + 1)
+	struct ssdfs_fragment_desc blk[SSDFS_BLK2OFF_TBL_MAX];
+
+/* 0x0080 */
+} __packed;
+
+/*
+ * The block's descriptor table has structure:
+ *
+ * ----------------------------
+ * |                          |
+ * | Area block table #0      |
+ * |  Fragment descriptor #0  |
+ * |          ***             |
+ * |  Fragment descriptor #14 |
+ * |  Next area block table   |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * | Area block table #N      |
+ * |  Fragment descriptor #0  |
+ * |          ***             |
+ * |  Fragment descriptor #14 |
+ * |  Next area block table   |
+ * |        descriptor        |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |          ***             |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    Block descriptor #0   |
+ * |           ***            |
+ * |    Block descriptor #N   |
+ * |                          |
+ * ----------------------------
+ */
+
+#define SSDFS_BLK_STATE_OFF_MAX		6
+
+/*
+ * struct ssdfs_block_descriptor - block descriptor
+ * @ino: inode identification number
+ * @logical_offset: logical offset from file's begin in pages
+ * @peb_index: PEB's index
+ * @peb_page: PEB's page index
+ * @state: array of fragment's offsets
+ */
+struct ssdfs_block_descriptor {
+/* 0x0000 */
+	__le64 ino;
+	__le32 logical_offset;
+	__le16 peb_index;
+	__le16 peb_page;
+
+/* 0x0010 */
+	struct ssdfs_blk_state_offset state[SSDFS_BLK_STATE_OFF_MAX];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * struct ssdfs_area_block_table - descriptor of block state sequence in area
+ * @chain_hdr: descriptor of block states' chain
+ * @blk: table of fragment descriptors
+ *
+ * This table describes block state sequence in PEB's area. This table
+ * can consists from several parts. Every part can describe 14 blocks
+ * in partial sequence. If sequence contains more block descriptors
+ * then last fragment descriptor describes placement of next part of
+ * block table and so on.
+ */
+struct ssdfs_area_block_table {
+/* 0x0000 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0010 */
+#define SSDFS_NEXT_BLK_TABLE_INDEX	SSDFS_FRAGMENTS_CHAIN_MAX
+#define SSDFS_BLK_TABLE_MAX		(SSDFS_FRAGMENTS_CHAIN_MAX + 1)
+	struct ssdfs_fragment_desc blk[SSDFS_BLK_TABLE_MAX];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * The data (diff, journaling) area has structure:
+ * -----------------------------
+ * |                           |
+ * | Block state descriptor #0 |
+ * |  Fragment descriptor #0   |
+ * |          ***              |
+ * |  Fragment descriptor #N   |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |   Data portion #0         |
+ * |          ***              |
+ * |   Data portion #N         |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |          ***              |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * | Block state descriptor #N |
+ * |  Fragment descriptor #0   |
+ * |          ***              |
+ * |  Fragment descriptor #N   |
+ * |                           |
+ * -----------------------------
+ * |                           |
+ * |   Data portion #0         |
+ * |          ***              |
+ * |   Data portion #N         |
+ * |                           |
+ * -----------------------------
+ */
+
+/*
+ * ssdfs_block_state_descriptor - block's state descriptor
+ * @cno: checkpoint
+ * @parent_snapshot: parent snapshot
+ * @chain_hdr: descriptor of data fragments' chain
+ */
+struct ssdfs_block_state_descriptor {
+/* 0x0000 */
+	__le64 cno;
+	__le64 parent_snapshot;
+
+/* 0x0010 */
+	struct ssdfs_fragments_chain_header chain_hdr;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_segbmap_fragment_header - segment bitmap fragment header
+ * @magic: magic signature
+ * @seg_index: segment index in segment bitmap fragments' chain
+ * @peb_index: PEB's index in segment
+ * @flags: fragment's flags
+ * @seg_type: segment type (main/backup)
+ * @start_item: fragment's start item number
+ * @sequence_id: fragment identification number
+ * @fragment_bytes: bytes count in fragment
+ * @checksum: fragment checksum
+ * @total_segs: count of total segments in fragment
+ * @clean_or_using_segs: count of clean or using segments in fragment
+ * @used_or_dirty_segs: count of used or dirty segments in fragment
+ * @bad_segs: count of bad segments in fragment
+ */
+struct ssdfs_segbmap_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+	__le16 seg_index;
+	__le16 peb_index;
+#define SSDFS_SEGBMAP_FRAG_ZLIB_COMPR	(1 << 0)
+#define SSDFS_SEGBMAP_FRAG_LZO_COMPR	(1 << 1)
+	__le8 flags;
+	__le8 seg_type;
+
+/* 0x0008 */
+	__le64 start_item;
+
+/* 0x0010 */
+	__le16 sequence_id;
+	__le16 fragment_bytes;
+	__le32 checksum;
+
+/* 0x0018 */
+	__le16 total_segs;
+	__le16 clean_or_using_segs;
+	__le16 used_or_dirty_segs;
+	__le16 bad_segs;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_peb_descriptor - descriptor of PEB
+ * @erase_cycles: count of P/E cycles of PEB
+ * @type: PEB's type
+ * @state: PEB's state
+ * @flags: PEB's flags
+ * @shared_peb_index: index of external shared destination PEB
+ */
+struct ssdfs_peb_descriptor {
+/* 0x0000 */
+	__le32 erase_cycles;
+	__le8 type;
+	__le8 state;
+	__le8 flags;
+	__le8 shared_peb_index;
+
+/* 0x0008 */
+} __packed;
+
+/* PEB's types */
+enum {
+	SSDFS_MAPTBL_UNKNOWN_PEB_TYPE,				/* 0x00 */
+	SSDFS_MAPTBL_DATA_PEB_TYPE,				/* 0x01 */
+	SSDFS_MAPTBL_LNODE_PEB_TYPE,				/* 0x02 */
+	SSDFS_MAPTBL_HNODE_PEB_TYPE,				/* 0x03 */
+	SSDFS_MAPTBL_IDXNODE_PEB_TYPE,				/* 0x04 */
+	SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE,			/* 0x05 */
+	SSDFS_MAPTBL_SBSEG_PEB_TYPE,				/* 0x06 */
+	SSDFS_MAPTBL_SEGBMAP_PEB_TYPE,				/* 0x07 */
+	SSDFS_MAPTBL_MAPTBL_PEB_TYPE,				/* 0x08 */
+	SSDFS_MAPTBL_PEB_TYPE_MAX				/* 0x09 */
+};
+
+/* PEB's states */
+enum {
+	SSDFS_MAPTBL_UNKNOWN_PEB_STATE,				/* 0x00 */
+	SSDFS_MAPTBL_BAD_PEB_STATE,				/* 0x01 */
+	SSDFS_MAPTBL_CLEAN_PEB_STATE,				/* 0x02 */
+	SSDFS_MAPTBL_USING_PEB_STATE,				/* 0x03 */
+	SSDFS_MAPTBL_USED_PEB_STATE,				/* 0x04 */
+	SSDFS_MAPTBL_PRE_DIRTY_PEB_STATE,			/* 0x05 */
+	SSDFS_MAPTBL_DIRTY_PEB_STATE,				/* 0x06 */
+	SSDFS_MAPTBL_MIGRATION_SRC_USING_STATE,			/* 0x07 */
+	SSDFS_MAPTBL_MIGRATION_SRC_USED_STATE,			/* 0x08 */
+	SSDFS_MAPTBL_MIGRATION_SRC_PRE_DIRTY_STATE,		/* 0x09 */
+	SSDFS_MAPTBL_MIGRATION_SRC_DIRTY_STATE,			/* 0x0A */
+	SSDFS_MAPTBL_MIGRATION_DST_CLEAN_STATE,			/* 0x0B */
+	SSDFS_MAPTBL_MIGRATION_DST_USING_STATE,			/* 0x0C */
+	SSDFS_MAPTBL_MIGRATION_DST_USED_STATE,			/* 0x0D */
+	SSDFS_MAPTBL_MIGRATION_DST_PRE_DIRTY_STATE,		/* 0x0E */
+	SSDFS_MAPTBL_MIGRATION_DST_DIRTY_STATE,			/* 0x0F */
+	SSDFS_MAPTBL_PRE_ERASE_STATE,				/* 0x10 */
+	SSDFS_MAPTBL_UNDER_ERASE_STATE,				/* 0x11 */
+	SSDFS_MAPTBL_SNAPSHOT_STATE,				/* 0x12 */
+	SSDFS_MAPTBL_RECOVERING_STATE,				/* 0x13 */
+	SSDFS_MAPTBL_USING_INVALIDATED_PEB_STATE,		/* 0x14 */
+	SSDFS_MAPTBL_MIGRATION_SRC_USING_INVALIDATED_STATE,	/* 0x15 */
+	SSDFS_MAPTBL_MIGRATION_DST_USING_INVALIDATED_STATE,	/* 0x16 */
+	SSDFS_MAPTBL_PEB_STATE_MAX				/* 0x17 */
+};
+
+/* PEB's flags */
+#define SSDFS_MAPTBL_SHARED_DESTINATION_PEB		(1 << 0)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_EXT_PTR		(1 << 1)
+#define SSDFS_MAPTBL_SOURCE_PEB_HAS_ZONE_PTR		(1 << 2)
+
+#define SSDFS_PEBTBL_BMAP_SIZE \
+	((PAGE_SIZE / sizeof(struct ssdfs_peb_descriptor)) / \
+	 BITS_PER_BYTE)
+
+/* PEB table's bitmap types */
+enum {
+	SSDFS_PEBTBL_USED_BMAP,
+	SSDFS_PEBTBL_DIRTY_BMAP,
+	SSDFS_PEBTBL_RECOVER_BMAP,
+	SSDFS_PEBTBL_BADBLK_BMAP,
+	SSDFS_PEBTBL_BMAP_MAX
+};
+
+/*
+ * struct ssdfs_peb_table_fragment_header - header of PEB table fragment
+ * @magic: signature of PEB table's fragment
+ * @flags: flags of PEB table's fragment
+ * @recover_months: recovering duration in months
+ * @recover_threshold: recover threshold
+ * @checksum: checksum of PEB table's fragment
+ * @start_peb: starting PEB number
+ * @pebs_count: count of PEB's descriptors in table's fragment
+ * @last_selected_peb: index of last selected unused PEB
+ * @reserved_pebs: count of reserved PEBs in table's fragment
+ * @stripe_id: stripe identification number
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of PEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ * @bmap: PEB table fragment's bitmap
+ */
+struct ssdfs_peb_table_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+	__le8 flags;
+	__le8 recover_months : 4;
+	__le8 recover_threshold : 4;
+	__le32 checksum;
+
+/* 0x0008 */
+	__le64 start_peb;
+
+/* 0x0010 */
+	__le16 pebs_count;
+	__le16 last_selected_peb;
+	__le16 reserved_pebs;
+	__le16 stripe_id;
+
+/* 0x0018 */
+	__le16 portion_id;
+	__le16 fragment_id;
+	__le32 bytes_count;
+
+/* 0x0020 */
+	__le8 bmaps[SSDFS_PEBTBL_BMAP_MAX][SSDFS_PEBTBL_BMAP_SIZE];
+
+/* 0x0120 */
+} __packed;
+
+/* PEB table fragment's flags */
+#define SSDFS_PEBTBL_FRAG_ZLIB_COMPR		(1 << 0)
+#define SSDFS_PEBTBL_FRAG_LZO_COMPR		(1 << 1)
+#define SSDFS_PEBTBL_UNDER_RECOVERING		(1 << 2)
+#define SSDFS_PEBTBL_BADBLK_EXIST		(1 << 3)
+#define SSDFS_PEBTBL_TRY_CORRECT_PEBS_AGAIN	(1 << 4)
+#define SSDFS_PEBTBL_FIND_RECOVERING_PEBS \
+	(SSDFS_PEBTBL_UNDER_RECOVERING | SSDFS_PEBTBL_BADBLK_EXIST)
+#define SSDFS_PEBTBL_FLAGS_MASK			0x1F
+
+/* PEB table recover thresholds */
+#define SSDFS_PEBTBL_FIRST_RECOVER_TRY		(0)
+#define SSDFS_PEBTBL_SECOND_RECOVER_TRY		(1)
+#define SSDFS_PEBTBL_THIRD_RECOVER_TRY		(2)
+#define SSDFS_PEBTBL_FOURTH_RECOVER_TRY		(3)
+#define SSDFS_PEBTBL_FIFTH_RECOVER_TRY		(4)
+#define SSDFS_PEBTBL_SIX_RECOVER_TRY		(5)
+#define SSDFS_PEBTBL_BADBLK_THRESHOLD		(6)
+
+#define SSDFS_PEBTBL_FRAGMENT_HDR_SIZE \
+	(sizeof(struct ssdfs_peb_table_fragment_header))
+
+#define SSDFS_PEB_DESC_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_PEBTBL_FRAGMENT_HDR_SIZE) / \
+	 sizeof(struct ssdfs_peb_descriptor))
+
+/*
+ * struct ssdfs_leb_descriptor - logical descriptor of erase block
+ * @physical_index: PEB table's offset till PEB's descriptor
+ * @relation_index: PEB table's offset till associated PEB's descriptor
+ */
+struct ssdfs_leb_descriptor {
+/* 0x0000 */
+	__le16 physical_index;
+	__le16 relation_index;
+
+/* 0x0004 */
+} __packed;
+
+/*
+ * struct ssdfs_leb_table_fragment_header - header of LEB table fragment
+ * @magic: signature of LEB table's fragment
+ * @flags: flags of LEB table's fragment
+ * @checksum: checksum of LEB table's fragment
+ * @start_leb: starting LEB number
+ * @lebs_count: count of LEB's descriptors in table's fragment
+ * @mapped_lebs: count of LEBs are mapped on PEBs
+ * @migrating_lebs: count of LEBs under migration
+ * @portion_id: sequential ID of mapping table fragment
+ * @fragment_id: sequential ID of LEB table fragment in the portion
+ * @bytes_count: table's fragment size in bytes
+ */
+struct ssdfs_leb_table_fragment_header {
+/* 0x0000 */
+	__le16 magic;
+#define SSDFS_LEBTBL_FRAG_ZLIB_COMPR	(1 << 0)
+#define SSDFS_LEBTBL_FRAG_LZO_COMPR	(1 << 1)
+	__le16 flags;
+	__le32 checksum;
+
+/* 0x0008 */
+	__le64 start_leb;
+
+/* 0x0010 */
+	__le16 lebs_count;
+	__le16 mapped_lebs;
+	__le16 migrating_lebs;
+	__le16 reserved1;
+
+/* 0x0018 */
+	__le16 portion_id;
+	__le16 fragment_id;
+	__le32 bytes_count;
+
+/* 0x0020 */
+} __packed;
+
+#define SSDFS_LEBTBL_FRAGMENT_HDR_SIZE \
+	(sizeof(struct ssdfs_leb_table_fragment_header))
+
+#define SSDFS_LEB_DESC_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_LEBTBL_FRAGMENT_HDR_SIZE) / \
+	 sizeof(struct ssdfs_leb_descriptor))
+
+/*
+ * The mapping table cache is the copy of content of mapping
+ * table for some type of PEBs. The goal of cache is to provide
+ * the space for storing the copy of LEB_ID/PEB_ID pairs with
+ * PEB state record. The cache is using for conversion LEB ID
+ * to PEB ID and retrieving the PEB state record in the case
+ * when the fragment of mapping table is not initialized yet.
+ * Also the cache needs for storing modified PEB state during
+ * the mapping table destruction. The fragment of mapping table
+ * cache has structure:
+ *
+ * ----------------------------
+ * |                          |
+ * |         Header           |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |   LEB_ID/PEB_ID pairs    |
+ * |                          |
+ * ----------------------------
+ * |                          |
+ * |    PEB state records     |
+ * |                          |
+ * ----------------------------
+ */
+
+/*
+ * struct ssdfs_maptbl_cache_header - maptbl cache header
+ * @magic: magic signature
+ * @sequence_id: ID of fragment in the sequence
+ * @flags: maptbl cache header's flags
+ * @items_count: count of items in maptbl cache's fragment
+ * @bytes_count: size of fragment in bytes
+ * @start_leb: start LEB ID in fragment
+ * @end_leb: ending LEB ID in fragment
+ */
+struct ssdfs_maptbl_cache_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	__le16 sequence_id;
+#define SSDFS_MAPTBL_CACHE_ZLIB_COMPR	(1 << 0)
+#define SSDFS_MAPTBL_CACHE_LZO_COMPR	(1 << 1)
+	__le16 flags;
+	__le16 items_count;
+	__le16 bytes_count;
+
+/* 0x0010 */
+	__le64 start_leb;
+	__le64 end_leb;
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_maptbl_cache_peb_state - PEB state descriptor
+ * @consistency: PEB state consistency type
+ * @state: PEB's state
+ * @flags: PEB's flags
+ * @shared_peb_index: index of external shared destination PEB
+ *
+ * The mapping table cache is the copy of content of mapping
+ * table for some type of PEBs. If the mapping table cache and
+ * the mapping table contain the same content for the PEB then
+ * the PEB state record is consistent. Otherwise, the PEB state
+ * record is inconsistent. For example, the inconsistency takes
+ * place if a PEB state record was modified in the mapping table
+ * cache during the destruction of the mapping table.
+ */
+struct ssdfs_maptbl_cache_peb_state {
+/* 0x0000 */
+	__le8 consistency;
+	__le8 state;
+	__le8 flags;
+	__le8 shared_peb_index;
+
+/* 0x0004 */
+} __packed;
+
+/* PEB state consistency type */
+enum {
+	SSDFS_PEB_STATE_UNKNOWN,
+	SSDFS_PEB_STATE_CONSISTENT,
+	SSDFS_PEB_STATE_INCONSISTENT,
+	SSDFS_PEB_STATE_PRE_DELETED,
+	SSDFS_PEB_STATE_MAX
+};
+
+#define SSDFS_MAPTBL_CACHE_HDR_SIZE \
+	(sizeof(struct ssdfs_maptbl_cache_header))
+#define SSDFS_LEB2PEB_PAIR_SIZE \
+	(sizeof(struct ssdfs_leb2peb_pair))
+#define SSDFS_PEB_STATE_SIZE \
+	(sizeof(struct ssdfs_maptbl_cache_peb_state))
+
+#define SSDFS_LEB2PEB_PAIR_PER_FRAGMENT(fragment_size) \
+	((fragment_size - SSDFS_MAPTBL_CACHE_HDR_SIZE - \
+				SSDFS_PEB_STATE_SIZE) / \
+	 (SSDFS_LEB2PEB_PAIR_SIZE + SSDFS_PEB_STATE_SIZE))
+
+/*
+ * struct ssdfs_btree_node_header - btree's node header
+ * @magic: magic signature + revision
+ * @check: metadata checksum
+ * @height: btree node's height
+ * @log_node_size: log2(node size)
+ * @log_index_area_size: log2(index area size)
+ * @type: btree node type
+ * @flags: btree node flags
+ * @index_area_offset: offset of index area in bytes
+ * @index_count: count of indexes in index area
+ * @index_size: size of index in bytes
+ * @min_item_size: min size of item in bytes
+ * @max_item_size: max possible size of item in bytes
+ * @items_capacity: capacity of items in the node
+ * @start_hash: start hash value
+ * @end_hash: end hash value
+ * @create_cno: create checkpoint
+ * @node_id: node identification number
+ * @item_area_offset: offset of items area in bytes
+ */
+struct ssdfs_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_signature magic;
+
+/* 0x0008 */
+	struct ssdfs_metadata_check check;
+
+/* 0x0010 */
+	__le8 height;
+	__le8 log_node_size;
+	__le8 log_index_area_size;
+	__le8 type;
+
+/* 0x0014 */
+#define SSDFS_BTREE_NODE_HAS_INDEX_AREA		(1 << 0)
+#define SSDFS_BTREE_NODE_HAS_ITEMS_AREA		(1 << 1)
+#define SSDFS_BTREE_NODE_HAS_L1TBL		(1 << 2)
+#define SSDFS_BTREE_NODE_HAS_L2TBL		(1 << 3)
+#define SSDFS_BTREE_NODE_HAS_HASH_TBL		(1 << 4)
+#define SSDFS_BTREE_NODE_PRE_ALLOCATED		(1 << 5)
+#define SSDFS_BTREE_NODE_FLAGS_MASK		0x3F
+	__le16 flags;
+	__le16 index_area_offset;
+
+/* 0x0018 */
+	__le16 index_count;
+	__le8 index_size;
+	__le8 min_item_size;
+	__le16 max_item_size;
+	__le16 items_capacity;
+
+/* 0x0020 */
+	__le64 start_hash;
+	__le64 end_hash;
+
+/* 0x0030 */
+	__le64 create_cno;
+	__le32 node_id;
+	__le32 item_area_offset;
+
+/* 0x0040 */
+} __packed;
+
+/* Index of btree node in node's items sequence */
+#define SSDFS_BTREE_NODE_HEADER_INDEX	(0)
+
+/* Btree node types */
+enum {
+	SSDFS_BTREE_NODE_UNKNOWN_TYPE,
+	SSDFS_BTREE_ROOT_NODE,
+	SSDFS_BTREE_INDEX_NODE,
+	SSDFS_BTREE_HYBRID_NODE,
+	SSDFS_BTREE_LEAF_NODE,
+	SSDFS_BTREE_NODE_TYPE_MAX
+};
+
+#define SSDFS_DENTRIES_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_DENTRIES_INDEX_BMAP_SIZE \
+	((((SSDFS_DENTRIES_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_DENTRIES_BMAP_SIZE \
+	((((SSDFS_DENTRIES_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_dir_entry)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_DENTRIES_BMAP_SIZE \
+	(SSDFS_DENTRIES_INDEX_BMAP_SIZE + SSDFS_RAW_DENTRIES_BMAP_SIZE)
+
+/*
+ * struct ssdfs_dentries_btree_node_header - directory entries node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @dentries_count: count of allocated dentries in the node
+ * @inline_names: count of dentries with inline names
+ * @flags: dentries node's flags
+ * @free_space: free space of the node in bytes
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table has goal to provide the way of clustering
+ * the dentries in the node with the goal to speed-up the search.
+ */
+struct ssdfs_dentries_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+
+/* 0x0048 */
+	__le16 dentries_count;
+	__le16 inline_names;
+	__le16 flags;
+	__le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_DENTRIES_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SHARED_DICT_INDEX_BMAP_SIZE \
+	((((SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_SHARED_DICT_BMAP_SIZE \
+	(((SSDFS_SHARED_DICT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  SSDFS_DENTRY_INLINE_NAME_MAX_LEN) / BITS_PER_BYTE)
+#define SSDFS_SHARED_DICT_BMAP_SIZE \
+	(SSDFS_SHARED_DICT_INDEX_BMAP_SIZE + SSDFS_RAW_SHARED_DICT_BMAP_SIZE)
+
+/*
+ * struct ssdfs_shdict_ltbl1_item - shared dictionary lookup table1 item
+ * @hash_lo: low hash32 value
+ * @start_index: starting index into lookup table2
+ * @range_len: number of items in the range of lookup table2
+ *
+ * The header of shared dictionary node contains the lookup table1.
+ * This table is responsible for clustering the items in lookup
+ * table2. The @hash_lo is hash32 of the first part of the name.
+ * The length of the first part is the inline name length.
+ */
+struct ssdfs_shdict_ltbl1_item {
+/* 0x0000 */
+	__le32 hash_lo;
+	__le16 start_index;
+	__le16 range_len;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_ltbl2_item - shared dictionary lookup table2 item
+ * @hash: hash value
+ * @prefix_len: prefix length in bytes
+ * @str_count: count of strings in the range
+ * @hash_index: index of the hash in the hash table
+ *
+ * The lookup table2 is located at the end of the node. It begins from
+ * the bottom and is growing in the node's beginning direction.
+ * Every item of the lookup table2 describes a position of the starting
+ * keyword of a name. The goal of such descriptor is to describe
+ * the starting position of the deduplicated keyword that is shared by
+ * several following names. But the keyword is used only in the beginning
+ * of the sequence because the rest of the names are represented by
+ * suffixes only (for example, the sequence of names "absurd, abcissa,
+ * abacus" can be reprensented by "abacuscissasurd" deduplicated range
+ * of names).
+ */
+struct ssdfs_shdict_ltbl2_item {
+/* 0x0000 */
+	__le64 hash;
+
+/* 0x0008 */
+	__le8 prefix_len;
+	__le8 str_count;
+	__le16 hash_index;
+	__le32 reserved;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_shdict_htbl_item - shared dictionary hash table item
+ * @hash: hash of the name
+ * @str_offset: offset in bytes to string
+ * @str_len: string length
+ * @type: string type
+ *
+ * The hash table contains descriptors of all strings in
+ * string area. The @str_offset is the offset in bytes from
+ * the items (strings) area's beginning.
+ */
+struct ssdfs_shdict_htbl_item {
+/* 0x0000 */
+	__le64 hash;
+
+/* 0x0008 */
+	__le16 str_offset;
+	__le8 str_len;
+	__le8 type;
+	__le32 reserved;
+
+/* 0x0010 */
+} __packed;
+
+/* Name string types */
+enum {
+	SSDFS_UNKNOWN_NAME_TYPE,
+	SSDFS_NAME_PREFIX,
+	SSDFS_NAME_SUFFIX,
+	SSDFS_FULL_NAME,
+	SSDFS_NAME_TYPE_MAX
+};
+
+/*
+ * union ssdfs_shdict_search_key - generalized search key
+ * @hash_lo: low hash32 value
+ * @hash: hash of the name
+ * @ltbl1_item: lookup1 table item
+ * @ltbl2_item: lookup2 table item
+ * @ltbl1_item: lookup1 table item
+ * @htbl_item: hash table item
+ *
+ * This key is generalized version of any item in lookup1,
+ * lookup2 and hash tables. This structure is needed for
+ * the generic way of making search in all tables.
+ */
+union ssdfs_shdict_search_key {
+/* 0x0000 */
+	__le32 hash_lo;
+	__le64 hash;
+	struct ssdfs_shdict_ltbl1_item ltbl1_item;
+	struct ssdfs_shdict_ltbl2_item ltbl2_item;
+	struct ssdfs_shdict_htbl_item htbl_item;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_dict_area - area descriptor
+ * @offset: area offset in bytes
+ * @size: area size in bytes
+ * @free_space: free space in bytes
+ * @items_count: count of items in area
+ */
+struct ssdfs_shared_dict_area {
+/* 0x0000 */
+	__le16 offset;
+	__le16 size;
+	__le16 free_space;
+	__le16 items_count;
+
+/* 0x0008 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_dictionary_node_header - shared dictionary node header
+ * @node: generic btree node's header
+ * @str_area: string area descriptor
+ * @hash_table: hash table descriptor
+ * @lookup_table2: lookup2 table descriptor
+ * @flags: private flags
+ * @lookup_table1_items: number of valid items in the lookup1 table
+ * @lookup_table1: lookup1 table
+ */
+struct ssdfs_shared_dictionary_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	struct ssdfs_shared_dict_area str_area;
+
+/* 0x0048 */
+	struct ssdfs_shared_dict_area hash_table;
+
+/* 0x0050 */
+	struct ssdfs_shared_dict_area lookup_table2;
+
+/* 0x0058 */
+	__le16 flags;
+	__le16 lookup_table1_items;
+	__le32 reserved2;
+
+/* 0x0060 */
+#define SSDFS_SHDIC_LTBL1_SIZE		(20)
+	struct ssdfs_shdict_ltbl1_item lookup_table1[SSDFS_SHDIC_LTBL1_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_EXTENT_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_EXTENT_INDEX_BMAP_SIZE \
+	((((SSDFS_EXTENT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_EXTENT_BMAP_SIZE \
+	((((SSDFS_EXTENT_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_raw_fork)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_EXTENT_MAX_BMAP_SIZE \
+	(SSDFS_EXTENT_INDEX_BMAP_SIZE + SSDFS_RAW_EXTENT_BMAP_SIZE)
+
+/*
+ * ssdfs_extents_btree_node_header - extents btree node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @blks_count: count of blocks in all valid extents
+ * @forks_count: count of forks in the leaf node or sub-tree (hybrid/index node)
+ * @allocated_extents: count of allocated extents in all forks
+ * @valid_extents: count of valid extents
+ * @max_extent_blks: maximal number of blocks in one extent
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table has goal to provide the way of clustering
+ * the forks in the node with the goal to speed-up the search.
+ */
+struct ssdfs_extents_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+	__le64 blks_count;
+
+/* 0x0050 */
+	__le32 forks_count;
+	__le32 allocated_extents;
+	__le32 valid_extents;
+	__le32 max_extent_blks;
+
+/* 0x0060 */
+#define SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE		(20)
+	__le64 lookup_table[SSDFS_EXTENTS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_XATTRS_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_XATTRS_INDEX_BMAP_SIZE \
+	((((SSDFS_XATTRS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_XATTRS_BMAP_SIZE \
+	((((SSDFS_XATTRS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_xattr_entry)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_XATTRS_BMAP_SIZE \
+	(SSDFS_XATTRS_INDEX_BMAP_SIZE + SSDFS_RAW_XATTRS_BMAP_SIZE)
+
+/*
+ * struct ssdfs_xattrs_btree_node_header - xattrs node's header
+ * @node: generic btree node's header
+ * @parent_ino: parent inode number
+ * @xattrs_count: count of allocated xattrs in the node
+ * @flags: xattrs node's flags
+ * @free_space: free space of the node in bytes
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table has goal to provide the way of clustering
+ * the xattrs in the node with the goal to speed-up the search.
+ */
+struct ssdfs_xattrs_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le64 parent_ino;
+
+/* 0x0048 */
+	__le16 xattrs_count;
+	__le16 reserved;
+	__le16 flags;
+	__le16 free_space;
+
+/* 0x0050 */
+#define SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_XATTRS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_index_area - index area info
+ * @start_hash: start hash value
+ * @end_hash: end hash value
+ */
+struct ssdfs_index_area {
+/* 0x0000 */
+	__le64 start_hash;
+	__le64 end_hash;
+
+/* 0x0010 */
+} __packed;
+
+#define SSDFS_INODE_BMAP_SIZE		(0xA0)
+
+/*
+ * struct ssdfs_inodes_btree_node_header -inodes btree node's header
+ * @node: generic btree node's header
+ * @inodes_count: count of inodes in the node
+ * @valid_inodes: count of valid inodes in the node
+ * @index_area: index area info (hybrid node)
+ * @bmap: bitmap of valid/invalid inodes in the node
+ */
+struct ssdfs_inodes_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le16 inodes_count;
+	__le16 valid_inodes;
+	__le8 reserved1[0xC];
+
+/* 0x0050 */
+	struct ssdfs_index_area index_area;
+
+/* 0x0060 */
+	__le8 bmap[SSDFS_INODE_BMAP_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot_rule_info - snapshot rule info
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @type: snapshot type (PERIODIC|ONE-TIME)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @frequency: taking snapshot frequency (SYNCFS|HOUR|DAY|WEEK)
+ * @snapshots_threshold max number of simultaneously available snapshots
+ * @snapshots_number: current number of created snapshots
+ * @ino: root object inode ID
+ * @uuid: snapshot UUID
+ * @name: snapshot rule name
+ * @name_hash: name hash
+ * @last_snapshot_cno: latest snapshot checkpoint
+ */
+struct ssdfs_snapshot_rule_info {
+/* 0x0000 */
+	__le8 mode;
+	__le8 type;
+	__le8 expiration;
+	__le8 frequency;
+	__le16 snapshots_threshold;
+	__le16 snapshots_number;
+
+/* 0x0008 */
+	__le64 ino;
+
+/* 0x0010 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+	char name[SSDFS_MAX_SNAP_RULE_NAME_LEN];
+
+/* 0x0030 */
+	__le64 name_hash;
+	__le64 last_snapshot_cno;
+
+/* 0x0040 */
+} __packed;
+
+/* Snapshot mode */
+enum {
+	SSDFS_UNKNOWN_SNAPSHOT_MODE,
+	SSDFS_READ_ONLY_SNAPSHOT,
+	SSDFS_READ_WRITE_SNAPSHOT,
+	SSDFS_SNAPSHOT_MODE_MAX
+};
+
+#define SSDFS_READ_ONLY_MODE_STR	"READ_ONLY"
+#define SSDFS_READ_WRITE_MODE_STR	"READ_WRITE"
+
+/* Snapshot type */
+enum {
+	SSDFS_UNKNOWN_SNAPSHOT_TYPE,
+	SSDFS_ONE_TIME_SNAPSHOT,
+	SSDFS_PERIODIC_SNAPSHOT,
+	SSDFS_SNAPSHOT_TYPE_MAX
+};
+
+#define SSDFS_ONE_TIME_TYPE_STR		"ONE-TIME"
+#define SSDFS_PERIODIC_TYPE_STR		"PERIODIC"
+
+/* Snapshot expiration */
+enum {
+	SSDFS_UNKNOWN_EXPIRATION_POINT,
+	SSDFS_EXPIRATION_IN_WEEK,
+	SSDFS_EXPIRATION_IN_MONTH,
+	SSDFS_EXPIRATION_IN_YEAR,
+	SSDFS_NEVER_EXPIRED,
+	SSDFS_EXPIRATION_POINT_MAX
+};
+
+#define SSDFS_WEEK_EXPIRATION_POINT_STR		"WEEK"
+#define SSDFS_MONTH_EXPIRATION_POINT_STR	"MONTH"
+#define SSDFS_YEAR_EXPIRATION_POINT_STR		"YEAR"
+#define SSDFS_NEVER_EXPIRED_STR			"NEVER"
+
+/* Snapshot creation frequency */
+enum {
+	SSDFS_UNKNOWN_FREQUENCY,
+	SSDFS_SYNCFS_FREQUENCY,
+	SSDFS_HOUR_FREQUENCY,
+	SSDFS_DAY_FREQUENCY,
+	SSDFS_WEEK_FREQUENCY,
+	SSDFS_MONTH_FREQUENCY,
+	SSDFS_CREATION_FREQUENCY_MAX
+};
+
+#define SSDFS_SYNCFS_FREQUENCY_STR		"SYNCFS"
+#define SSDFS_HOUR_FREQUENCY_STR		"HOUR"
+#define SSDFS_DAY_FREQUENCY_STR			"DAY"
+#define SSDFS_WEEK_FREQUENCY_STR		"WEEK"
+#define SSDFS_MONTH_FREQUENCY_STR		"MONTH"
+
+#define SSDFS_INFINITE_SNAPSHOTS_NUMBER		U16_MAX
+#define SSDFS_UNDEFINED_SNAPSHOTS_NUMBER	(0)
+
+/*
+ * struct ssdfs_snapshot_rules_header - snapshot rules table's header
+ * @magic: magic signature
+ * @item_size: snapshot rule's size in bytes
+ * @flags: various flags
+ * @items_count: number of snapshot rules in table
+ * @items_capacity: capacity of the snaphot rules table
+ * @area_size: size of table in bytes
+ */
+struct ssdfs_snapshot_rules_header {
+/* 0x0000 */
+	__le32 magic;
+	__le16 item_size;
+	__le16 flags;
+
+/* 0x0008 */
+	__le16 items_count;
+	__le16 items_capacity;
+	__le32 area_size;
+
+/* 0x0010 */
+	__le8 padding[0x10];
+
+/* 0x0020 */
+} __packed;
+
+/*
+ * struct ssdfs_snapshot - snapshot info
+ * @magic: magic signature of snapshot
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @flags: snapshot's flags
+ * @name: snapshot name
+ * @uuid: snapshot UUID
+ * @create_time: snapshot's timestamp
+ * @create_cno: snapshot's checkpoint
+ * @ino: root object inode ID
+ * @name_hash: name hash
+ */
+struct ssdfs_snapshot {
+/* 0x0000 */
+	__le16 magic;
+	__le8 mode : 4;
+	__le8 expiration : 4;
+	__le8 flags;
+	char name[SSDFS_MAX_SNAPSHOT_NAME_LEN];
+
+/* 0x0010 */
+	__le8 uuid[SSDFS_UUID_SIZE];
+
+/* 0x0020 */
+	__le64 create_time;
+	__le64 create_cno;
+
+/* 0x0030 */
+	__le64 ino;
+	__le64 name_hash;
+
+/* 0x0040 */
+} __packed;
+
+/* snapshot flags */
+#define SSDFS_SNAPSHOT_HAS_EXTERNAL_STRING	(1 << 0)
+#define SSDFS_SNAPSHOT_FLAGS_MASK		0x1
+
+/*
+ * struct ssdfs_peb2time_pair - PEB to timestamp pair
+ * @peb_id: PEB ID
+ * @last_log_time: last log creation time
+ */
+struct ssdfs_peb2time_pair {
+/* 0x0000 */
+	__le64 peb_id;
+	__le64 last_log_time;
+
+/* 0x0010 */
+} __packed;
+
+/*
+ * struct ssdfs_peb2time_set - PEB to timestamp set
+ * @magic: magic signature of set
+ * @pairs_count: number of valid pairs in the set
+ * @create_time: create time of the first PEB in pair set
+ * @array: array of PEB to timestamp pairs
+ */
+struct ssdfs_peb2time_set {
+/* 0x0000 */
+	__le16 magic;
+	__le8 pairs_count;
+	__le8 padding[0x5];
+
+/* 0x0008 */
+	__le64 create_time;
+
+/* 0x0010 */
+#define SSDFS_PEB2TIME_ARRAY_CAPACITY		(3)
+	struct ssdfs_peb2time_pair array[SSDFS_PEB2TIME_ARRAY_CAPACITY];
+
+/* 0x0040 */
+} __packed;
+
+/*
+ * union ssdfs_snapshot_item - snapshot item
+ * @magic: magic signature
+ * @snapshot: snapshot info
+ * @peb2time: PEB to timestamp set
+ */
+union ssdfs_snapshot_item {
+/* 0x0000 */
+	__le16 magic;
+	struct ssdfs_snapshot snapshot;
+	struct ssdfs_peb2time_set peb2time;
+
+/* 0x0040 */
+} __packed;
+
+#define SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SNAPSHOTS_INDEX_BMAP_SIZE \
+	((((SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_SNAPSHOTS_BMAP_SIZE \
+	((((SSDFS_SNAPSHOTS_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_snapshot_info)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_SNAPSHOTS_BMAP_SIZE \
+	(SSDFS_SNAPSHOTS_INDEX_BMAP_SIZE + SSDFS_RAW_SNAPSHOTS_BMAP_SIZE)
+
+/*
+ * struct ssdfs_snapshots_btree_node_header - snapshots node's header
+ * @node: generic btree node's header
+ * @snapshots_count: snapshots count in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the snapshots in the node
+ * with the goal of speeding up the search.
+ */
+struct ssdfs_snapshots_btree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 snapshots_count;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_SNAPSHOTS_BTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+/*
+ * struct ssdfs_shared_extent - shared extent
+ * @fingerprint: fingerprint of shared extent
+ * @extent: position of the extent on volume
+ * @fingerprint_len: length of fingerprint
+ * @fingerprint_type: type of fingerprint
+ * @flags: various flags
+ * @ref_count: reference counter of shared extent
+ */
+struct ssdfs_shared_extent {
+/* 0x0000 */
+#define SSDFS_FINGERPRINT_LENGTH_MAX	(32)
+	__le8 fingerprint[SSDFS_FINGERPRINT_LENGTH_MAX];
+
+/* 0x0020 */
+	struct ssdfs_raw_extent extent;
+
+/* 0x0030 */
+	__le8 fingerprint_len;
+	__le8 fingerprint_type;
+	__le16 flags;
+	__le8 padding[0x4];
+
+/* 0x0038 */
+	__le64 ref_count;
+
+/* 0x0040 */
+} __packed;
+
+#define SSDFS_SHEXTREE_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_SHEXTREE_INDEX_BMAP_SIZE \
+	((((SSDFS_SHEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_SHEXTREE_BMAP_SIZE \
+	((((SSDFS_SHEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_shared_extent)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_SHEXTREE_BMAP_SIZE \
+	(SSDFS_SHEXTREE_INDEX_BMAP_SIZE + SSDFS_RAW_SHEXTREE_BMAP_SIZE)
+
+/*
+ * struct ssdfs_shextree_node_header - shared extents btree node's header
+ * @node: generic btree node's header
+ * @shared_extents: number of shared extents in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the shared extents in the node
+ * with the goal of speeding up the search.
+ */
+struct ssdfs_shextree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 shared_extents;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_SHEXTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#define SSDFS_INVEXTREE_PAGES_PER_NODE_MAX		(32)
+#define SSDFS_INVEXTREE_INDEX_BMAP_SIZE \
+	((((SSDFS_INVEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_btree_index_key)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_RAW_INVEXTREE_BMAP_SIZE \
+	((((SSDFS_INVEXTREE_PAGES_PER_NODE_MAX * PAGE_SIZE) / \
+	  sizeof(struct ssdfs_raw_extent)) + BITS_PER_LONG) / BITS_PER_BYTE)
+#define SSDFS_INVEXTREE_BMAP_SIZE \
+	(SSDFS_INVEXTREE_INDEX_BMAP_SIZE + SSDFS_RAW_INVEXTREE_BMAP_SIZE)
+
+/*
+ * struct ssdfs_invextree_node_header - invalidated extents btree node's header
+ * @node: generic btree node's header
+ * @extents_count: number of invalidated extents in the node
+ * @lookup_table: table for clustering search in the node
+ *
+ * The @lookup_table clusters the invalidated extents in the node
+ * with the goal of speeding up the search.
+ */
+struct ssdfs_invextree_node_header {
+/* 0x0000 */
+	struct ssdfs_btree_node_header node;
+
+/* 0x0040 */
+	__le32 extents_count;
+	__le8 padding[0x0C];
+
+/* 0x0050 */
+#define SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE		(22)
+	__le64 lookup_table[SSDFS_INVEXTREE_LOOKUP_TABLE_SIZE];
+
+/* 0x0100 */
+} __packed;
+
+#endif /* _LINUX_SSDFS_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 02/79] ssdfs: add key file system declarations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 01/79] ssdfs: introduce SSDFS on-disk layout Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 03/79] ssdfs: add key file system's function declarations Viacheslav Dubeyko
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

This patch adds declarations of key in-core file system's
metadata structures and key constants.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/ssdfs_constants.h    | 214 +++++++++
 fs/ssdfs/ssdfs_fs_info.h      | 821 ++++++++++++++++++++++++++++++++++
 fs/ssdfs/ssdfs_inode_info.h   | 144 ++++++
 fs/ssdfs/ssdfs_thread_info.h  |  43 ++
 fs/ssdfs/version.h            |   9 +
 include/trace/events/ssdfs.h  | 256 +++++++++++
 include/uapi/linux/ssdfs_fs.h | 126 ++++++
 7 files changed, 1613 insertions(+)
 create mode 100644 fs/ssdfs/ssdfs_constants.h
 create mode 100644 fs/ssdfs/ssdfs_fs_info.h
 create mode 100644 fs/ssdfs/ssdfs_inode_info.h
 create mode 100644 fs/ssdfs/ssdfs_thread_info.h
 create mode 100644 fs/ssdfs/version.h
 create mode 100644 include/trace/events/ssdfs.h
 create mode 100644 include/uapi/linux/ssdfs_fs.h

diff --git a/fs/ssdfs/ssdfs_constants.h b/fs/ssdfs/ssdfs_constants.h
new file mode 100644
index 000000000000..073906d0cd81
--- /dev/null
+++ b/fs/ssdfs/ssdfs_constants.h
@@ -0,0 +1,214 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_constants.h - SSDFS constant declarations.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_CONSTANTS_H
+#define _SSDFS_CONSTANTS_H
+
+/*
+ * Thread types
+ */
+enum {
+	SSDFS_PEB_READ_THREAD,
+	SSDFS_PEB_FLUSH_THREAD,
+	SSDFS_PEB_GC_THREAD,
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	SSDFS_PEB_FSCK_THREAD,
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+	SSDFS_PEB_THREAD_TYPE_MAX,
+};
+
+enum {
+	SSDFS_SEG_USING_GC_THREAD,
+	SSDFS_SEG_USED_GC_THREAD,
+	SSDFS_SEG_PRE_DIRTY_GC_THREAD,
+	SSDFS_SEG_DIRTY_GC_THREAD,
+	SSDFS_DESTROY_SEG_GC_THREAD,
+	SSDFS_BTREE_NODE_GC_THREAD,
+	SSDFS_GC_THREAD_TYPE_MAX,
+};
+
+enum {
+	SSDFS_256B	= 256,
+	SSDFS_512B	= 512,
+	SSDFS_1KB	= 1024,
+	SSDFS_2KB	= 2048,
+	SSDFS_4KB	= 4096,
+	SSDFS_8KB	= 8192,
+	SSDFS_16KB	= 16384,
+	SSDFS_32KB	= 32768,
+	SSDFS_64KB	= 65536,
+	SSDFS_128KB	= 131072,
+	SSDFS_256KB	= 262144,
+	SSDFS_512KB	= 524288,
+	SSDFS_1MB	= 1048576,
+	SSDFS_2MB	= 2097152,
+	SSDFS_4MB	= 4194304,
+	SSDFS_8MB	= 8388608,
+	SSDFS_16MB	= 16777216,
+	SSDFS_32MB	= 33554432,
+	SSDFS_64MB	= 67108864,
+	SSDFS_128MB	= 134217728,
+	SSDFS_256MB	= 268435456,
+	SSDFS_512MB	= 536870912,
+	SSDFS_1GB	= 1073741824,
+	SSDFS_2GB	= 2147483648,
+	SSDFS_4GB	= 4294967296,
+	SSDFS_8GB	= 8589934592,
+	SSDFS_16GB	= 17179869184,
+	SSDFS_32GB	= 34359738368,
+	SSDFS_64GB	= 68719476736,
+};
+
+#define SSDFS_256B_STRING	"256B"
+#define SSDFS_512B_STRING	"512B"
+#define SSDFS_1KB_STRING	"1KB"
+#define SSDFS_2KB_STRING	"2KB"
+#define SSDFS_4KB_STRING	"4KB"
+#define SSDFS_8KB_STRING	"8KB"
+#define SSDFS_16KB_STRING	"16KB"
+#define SSDFS_32KB_STRING	"32KB"
+#define SSDFS_64KB_STRING	"64KB"
+#define SSDFS_128KB_STRING	"128KB"
+#define SSDFS_256KB_STRING	"256KB"
+#define SSDFS_512KB_STRING	"512KB"
+#define SSDFS_1MB_STRING	"1MB"
+#define SSDFS_2MB_STRING	"2MB"
+#define SSDFS_4MB_STRING	"4MB"
+#define SSDFS_8MB_STRING	"8MB"
+#define SSDFS_16MB_STRING	"16MB"
+#define SSDFS_32MB_STRING	"32MB"
+#define SSDFS_64MB_STRING	"64MB"
+#define SSDFS_128MB_STRING	"128MB"
+#define SSDFS_256MB_STRING	"256MB"
+#define SSDFS_512MB_STRING	"512MB"
+#define SSDFS_1GB_STRING	"1GB"
+#define SSDFS_2GB_STRING	"2GB"
+#define SSDFS_4GB_STRING	"4GB"
+#define SSDFS_8GB_STRING	"8GB"
+#define SSDFS_16GB_STRING	"16GB"
+#define SSDFS_32GB_STRING	"32GB"
+#define SSDFS_64GB_STRING	"64GB"
+#define SSDFS_NUMBER_UNKNOWN	"unknown"
+
+enum {
+	SSDFS_UNKNOWN_PAGE_TYPE,
+	SSDFS_USER_DATA_PAGES,
+	SSDFS_METADATA_PAGES,
+	SSDFS_PAGES_TYPE_MAX
+};
+
+#define SSDFS_INVALID_CNO		U64_MAX
+#define SSDFS_SECTOR_SHIFT		(9)
+#define SSDFS_DEFAULT_TIMEOUT		(msecs_to_jiffies(120000))
+#define SSDFS_DEFAULT_TIMEOUT_NS	(jiffies64_to_nsecs(SSDFS_DEFAULT_TIMEOUT))
+#define SSDFS_USER_DATA_TIME_FACTOR	(3)
+#define SSDFS_INDEX_NODE_TIME_FACTOR	(5)
+#define SSDFS_LEAF_NODE_TIME_FACTOR	(5)
+#define SSDFS_HYBRID_NODE_TIME_FACTOR	(10)
+#define SSDFS_NANOSECS_PER_SEC		(1000000000)
+#define SSDFS_SECS_PER_MINUTE		(60)
+#define SSDFS_NANOSECS_PER_MINUTE	((u64)SSDFS_SECS_PER_MINUTE * \
+					 SSDFS_NANOSECS_PER_SEC)
+#define SSDFS_SECS_PER_HOUR		(60 * 60)
+#define SSDFS_HOURS_PER_DAY		(24)
+#define SSDFS_DAYS_PER_WEEK		(7)
+#define SSDFS_WEEKS_PER_MONTH		(4)
+#define SSDFS_EXTENT_LEN_MAX		(8)
+#define SSDFS_MAX_NUMBER_OF_TRIES	(10)
+#define SSDFS_UNMOUNT_NUMBER_OF_TRIES	(300)
+#define SSDFS_SB_SNAPSHOT_LOG_PAGES	(2)
+#define SSDFS_DEFAULT_PROTECTION_RANGE	((u64)20 * SSDFS_NANOSECS_PER_MINUTE)
+
+/*
+ * Every PEB contains a sequence of logs. A log starts with
+ * a header and can end with a footer. The header and footer
+ * require at least 2 logical blocks for metadata. Hence,
+ * 2 logical blocks have to be excluded from allocation
+ * in every erase block (PEB).
+ */
+#define SSDFS_RESERVED_FREE_PAGE_THRESHOLD_PER_PEB	(2)
+
+static inline
+const char *GRANULARITY2STRING(u64 value)
+{
+	switch (value) {
+	case SSDFS_256B:
+		return SSDFS_256B_STRING;
+	case SSDFS_512B:
+		return SSDFS_512B_STRING;
+	case SSDFS_1KB:
+		return SSDFS_1KB_STRING;
+	case SSDFS_2KB:
+		return SSDFS_2KB_STRING;
+	case SSDFS_4KB:
+		return SSDFS_4KB_STRING;
+	case SSDFS_8KB:
+		return SSDFS_8KB_STRING;
+	case SSDFS_16KB:
+		return SSDFS_16KB_STRING;
+	case SSDFS_32KB:
+		return SSDFS_32KB_STRING;
+	case SSDFS_64KB:
+		return SSDFS_64KB_STRING;
+	case SSDFS_128KB:
+		return SSDFS_128KB_STRING;
+	case SSDFS_256KB:
+		return SSDFS_256KB_STRING;
+	case SSDFS_512KB:
+		return SSDFS_512KB_STRING;
+	case SSDFS_1MB:
+		return SSDFS_1MB_STRING;
+	case SSDFS_2MB:
+		return SSDFS_2MB_STRING;
+	case SSDFS_4MB:
+		return SSDFS_4MB_STRING;
+	case SSDFS_8MB:
+		return SSDFS_8MB_STRING;
+	case SSDFS_16MB:
+		return SSDFS_16MB_STRING;
+	case SSDFS_32MB:
+		return SSDFS_32MB_STRING;
+	case SSDFS_64MB:
+		return SSDFS_64MB_STRING;
+	case SSDFS_128MB:
+		return SSDFS_128MB_STRING;
+	case SSDFS_256MB:
+		return SSDFS_256MB_STRING;
+	case SSDFS_512MB:
+		return SSDFS_512MB_STRING;
+	case SSDFS_1GB:
+		return SSDFS_1GB_STRING;
+	case SSDFS_2GB:
+		return SSDFS_2GB_STRING;
+	case SSDFS_4GB:
+		return SSDFS_4GB_STRING;
+	case SSDFS_8GB:
+		return SSDFS_8GB_STRING;
+	case SSDFS_16GB:
+		return SSDFS_16GB_STRING;
+	case SSDFS_32GB:
+		return SSDFS_32GB_STRING;
+	case SSDFS_64GB:
+		return SSDFS_64GB_STRING;
+	}
+
+	return SSDFS_NUMBER_UNKNOWN;
+}
+
+enum {
+	SSDFS_PARENT_LOCK = 0,
+	SSDFS_CHILD_LOCK,
+};
+
+#endif /* _SSDFS_CONSTANTS_H */
diff --git a/fs/ssdfs/ssdfs_fs_info.h b/fs/ssdfs/ssdfs_fs_info.h
new file mode 100644
index 000000000000..082b4314095e
--- /dev/null
+++ b/fs/ssdfs/ssdfs_fs_info.h
@@ -0,0 +1,821 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_fs_info.h - in-core fs information.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_FS_INFO_H
+#define _SSDFS_FS_INFO_H
+
+/* Global FS states */
+enum {
+	SSDFS_UNKNOWN_GLOBAL_FS_STATE,
+	SSDFS_REGULAR_FS_OPERATIONS,
+	SSDFS_METADATA_GOING_FLUSHING,
+	SSDFS_METADATA_UNDER_FLUSH,
+	SSDFS_UNMOUNT_METADATA_GOING_FLUSHING,
+	SSDFS_UNMOUNT_METADATA_UNDER_FLUSH,
+	SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH,
+	SSDFS_UNMOUNT_COMMIT_SUPERBLOCK,
+	SSDFS_UNMOUNT_DESTROY_METADATA,
+	SSDFS_GLOBAL_FS_STATE_MAX
+};
+
+/*
+ * struct ssdfs_volume_block - logical block
+ * @seg_id: segment ID
+ * @blk_index: block index in segment
+ */
+struct ssdfs_volume_block {
+	u64 seg_id;
+	u16 blk_index;
+};
+
+/*
+ * struct ssdfs_volume_extent - logical extent
+ * @start: initial logical block
+ * @len: extent length
+ */
+struct ssdfs_volume_extent {
+	struct ssdfs_volume_block start;
+	u16 len;
+};
+
+/*
+ * struct ssdfs_peb_extent - PEB's extent
+ * @leb_id: LEB ID
+ * @peb_id: PEB ID
+ * @page_offset: offset in pages
+ * @pages_count: pages count
+ */
+struct ssdfs_peb_extent {
+	u64 leb_id;
+	u64 peb_id;
+	u32 page_offset;
+	u32 pages_count;
+};
+
+/*
+ * struct ssdfs_zone_fragment - zone fragment
+ * @ino: inode identification number
+ * @logical_blk_offset: logical offset from file's beginning in blocks
+ * @extent: zone fragment descriptor
+ */
+struct ssdfs_zone_fragment {
+	u64 ino;
+	u64 logical_blk_offset;
+	struct ssdfs_raw_extent extent;
+};
+
+/*
+ * struct ssdfs_metadata_options - metadata options
+ * @blk_bmap.flags: block bitmap's flags
+ * @blk_bmap.compression: compression type
+ *
+ * @blk2off_tbl.flags: offset translation table's flags
+ * @blk2off_tbl.compression: compression type
+ *
+ * @user_data.flags: user data's flags
+ * @user_data.compression: compression type
+ * @user_data.migration_threshold: default value of destination PEBs in migration
+ */
+struct ssdfs_metadata_options {
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk_bmap;
+
+	struct {
+		u16 flags;
+		u8 compression;
+	} blk2off_tbl;
+
+	struct {
+		u16 flags;
+		u8 compression;
+		u16 migration_threshold;
+	} user_data;
+};
+
+/*
+ * struct ssdfs_sb_info - superblock info
+ * @vh_buf: volume header buffer
+ * @vh_buf_size: size of volume header buffer in bytes
+ * @vs_buf: volume state buffer
+ * @vs_buf_size: size of volume state buffer in bytes
+ * @last_log: latest sb log
+ */
+struct ssdfs_sb_info {
+	void *vh_buf;
+	size_t vh_buf_size;
+	void *vs_buf;
+	size_t vs_buf_size;
+	struct ssdfs_peb_extent last_log;
+};
+
+/*
+ * struct ssdfs_sb_snapshot_seg_info - superblock snapshot segment info
+ * @need_snapshot_sb: should the superblock state be snapshotted?
+ * @sequence_id: index of log in the sequence
+ * @last_log: latest segment log
+ * @req: erase and re-write segment request
+ */
+struct ssdfs_sb_snapshot_seg_info {
+	bool need_snapshot_sb;
+	int sequence_id;
+	struct ssdfs_peb_extent last_log;
+	struct ssdfs_segment_request *req;
+};
+
+/*
+ * struct ssdfs_device_ops - device operations
+ * @device_name: get device name
+ * @device_size: get device size in bytes
+ * @open_zone: open zone
+ * @reopen_zone: reopen closed zone
+ * @close_zone: close zone
+ * @read: read from device
+ * @read_block: read logical block
+ * @read_blocks: read sequence of logical blocks
+ * @can_write_block: can we write into logical block?
+ * @write_block: write logical block to device
+ * @write_blocks: write sequence of logical blocks to device
+ * @erase: erase block
+ * @trim: support of background erase operation
+ * @peb_isbad: check that physical erase block is bad
+ * @sync: synchronize page cache with device
+ */
+struct ssdfs_device_ops {
+	const char * (*device_name)(struct super_block *sb);
+	__u64 (*device_size)(struct super_block *sb);
+	int (*open_zone)(struct super_block *sb, loff_t offset);
+	int (*reopen_zone)(struct super_block *sb, loff_t offset);
+	int (*close_zone)(struct super_block *sb, loff_t offset);
+	int (*read)(struct super_block *sb, u32 block_size,
+		    loff_t offset, size_t len, void *buf);
+	int (*read_block)(struct super_block *sb, struct folio *folio,
+			  loff_t offset);
+	int (*read_blocks)(struct super_block *sb, struct folio_batch *batch,
+			   loff_t offset);
+	int (*can_write_block)(struct super_block *sb, u32 block_size,
+				loff_t offset, bool need_check);
+	int (*write_block)(struct super_block *sb, loff_t offset,
+			   struct folio *folio);
+	int (*write_blocks)(struct super_block *sb, loff_t offset,
+			    struct folio_batch *batch);
+	int (*erase)(struct super_block *sb, loff_t offset, size_t len);
+	int (*trim)(struct super_block *sb, loff_t offset, size_t len);
+	int (*peb_isbad)(struct super_block *sb, loff_t offset);
+	int (*mark_peb_bad)(struct super_block *sb, loff_t offset);
+	void (*sync)(struct super_block *sb);
+};
+
+/*
+ * struct ssdfs_snapshot_subsystem - snapshots subsystem
+ * @reqs_queue: snapshot requests queue
+ * @rules_list: snapshot rules list
+ * @tree: snapshots btree
+ */
+struct ssdfs_snapshot_subsystem {
+	struct ssdfs_snapshot_reqs_queue reqs_queue;
+	struct ssdfs_snapshot_rules_list rules_list;
+	struct ssdfs_snapshots_btree_info *tree;
+};
+
+/*
+ * Option possible states
+ */
+enum {
+	/* REQUEST */
+	SSDFS_IGNORE_OPTION,
+	SSDFS_ENABLE_OPTION,
+	SSDFS_DISABLE_OPTION,
+
+	/* RESPONSE */
+	SSDFS_DONT_SUPPORT_OPTION,
+	SSDFS_USE_RECOMMENDED_VALUE,
+	SSDFS_UNRECOGNIZED_VALUE,
+	SSDFS_NOT_IMPLEMENTED_OPTION,
+	SSDFS_OPTION_HAS_BEEN_APPLIED,
+};
+
+/*
+ * struct ssdfs_tunefs_option - option state
+ * @state: option state (ignore, enable, disable)
+ * @value: option value
+ * @recommended_value: value returned by the driver as a better option
+ */
+struct ssdfs_tunefs_option {
+	int state;
+	int value;
+	int recommended_value;
+};
+
+/*
+ * struct ssdfs_tunefs_volume_label_option - volume label option
+ * @state: option state (ignore, enable, disable)
+ * @volume_label: volume label
+ */
+struct ssdfs_tunefs_volume_label_option {
+	int state;
+	char volume_label[SSDFS_VOLUME_LABEL_MAX];
+};
+
+/*
+ * struct ssdfs_tunefs_blkbmap_options - block bitmap options
+ * @has_backup_copy: is a backup copy present?
+ * @compression: compression type
+ */
+struct ssdfs_tunefs_blkbmap_options {
+	struct ssdfs_tunefs_option has_backup_copy;
+	struct ssdfs_tunefs_option compression;
+};
+
+/*
+ * struct ssdfs_tunefs_blk2off_table_options - offsets table options
+ * @has_backup_copy: is a backup copy present?
+ * @compression: compression type
+ */
+struct ssdfs_tunefs_blk2off_table_options {
+	struct ssdfs_tunefs_option has_backup_copy;
+	struct ssdfs_tunefs_option compression;
+};
+
+/*
+ * struct ssdfs_tunefs_segbmap_options - segment bitmap options
+ * @has_backup_copy: is a backup copy present?
+ * @log_pages: count of pages in the full log
+ * @migration_threshold: maximum number of migrating PEBs per segment
+ * @compression: compression type
+ */
+struct ssdfs_tunefs_segbmap_options {
+	struct ssdfs_tunefs_option has_backup_copy;
+	struct ssdfs_tunefs_option log_pages;
+	struct ssdfs_tunefs_option migration_threshold;
+	struct ssdfs_tunefs_option compression;
+};
+
+/*
+ * struct ssdfs_tunefs_maptbl_options - PEB mapping table options
+ * @has_backup_copy: is a backup copy present?
+ * @log_pages: count of pages in the full log
+ * @migration_threshold: maximum number of migrating PEBs per segment
+ * @reserved_pebs_per_fragment: percentage of reserved PEBs for fragment
+ * @compression: compression type
+ */
+struct ssdfs_tunefs_maptbl_options {
+	struct ssdfs_tunefs_option has_backup_copy;
+	struct ssdfs_tunefs_option log_pages;
+	struct ssdfs_tunefs_option migration_threshold;
+	struct ssdfs_tunefs_option reserved_pebs_per_fragment;
+	struct ssdfs_tunefs_option compression;
+};
+
+/*
+ * struct ssdfs_tunefs_btree_options - btree options
+ * @min_index_area_size: minimal index area's size in bytes
+ * @lnode_log_pages: leaf node's log pages
+ * @hnode_log_pages: hybrid node's log pages
+ * @inode_log_pages: index node's log pages
+ */
+struct ssdfs_tunefs_btree_options {
+	struct ssdfs_tunefs_option min_index_area_size;
+	struct ssdfs_tunefs_option lnode_log_pages;
+	struct ssdfs_tunefs_option hnode_log_pages;
+	struct ssdfs_tunefs_option inode_log_pages;
+};
+
+/*
+ * struct ssdfs_tunefs_user_data_options - user data options
+ * @log_pages: count of pages in the full log
+ * @migration_threshold: maximum number of migrating PEBs per segment
+ * @compression: compression type
+ */
+struct ssdfs_tunefs_user_data_options {
+	struct ssdfs_tunefs_option log_pages;
+	struct ssdfs_tunefs_option migration_threshold;
+	struct ssdfs_tunefs_option compression;
+};
+
+/*
+ * struct ssdfs_current_volume_config - current volume config
+ * @fs_uuid: 128-bit volume's uuid
+ * @fs_label: volume name
+ * @nsegs: number of segments on the volume
+ * @pagesize: page size in bytes
+ * @erasesize: physical erase block size in bytes
+ * @segsize: segment size in bytes
+ * @pebs_per_seg: physical erase blocks per segment
+ * @pages_per_peb: pages per physical erase block
+ * @pages_per_seg: pages per segment
+ * @leb_pages_capacity: maximal number of logical blocks per LEB
+ * @peb_pages_capacity: maximal number of NAND pages that can be written per PEB
+ * @fs_ctime: volume create timestamp (mkfs phase)
+ * @raw_inode_size: raw inode size in bytes
+ * @create_threads_per_seg: number of creation threads per segment
+ * @metadata_options: metadata options
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @segbmap_flags: segment bitmap flags
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @maptbl_flags: mapping table flags
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @migration_threshold: default value of destination PEBs in migration
+ * @is_zns_device: file system volume is on ZNS device
+ * @zone_size: zone size in bytes
+ * @zone_capacity: zone capacity in bytes available for write operations
+ * @max_open_zones: open zones limitation (upper bound)
+ */
+struct ssdfs_current_volume_config {
+	unsigned char fs_uuid[SSDFS_UUID_SIZE];
+	char fs_label[SSDFS_VOLUME_LABEL_MAX];
+
+	u64 nsegs;
+	u32 pagesize;
+	u32 erasesize;
+	u32 segsize;
+	u32 pebs_per_seg;
+	u32 pages_per_peb;
+	u32 pages_per_seg;
+	u32 leb_pages_capacity;
+	u32 peb_pages_capacity;
+	u64 fs_ctime;
+	u16 raw_inode_size;
+	u16 create_threads_per_seg;
+
+	struct ssdfs_metadata_options metadata_options;
+
+	u16 sb_seg_log_pages;
+
+	u16 segbmap_log_pages;
+	u16 segbmap_flags;
+
+	u16 maptbl_log_pages;
+	u16 maptbl_flags;
+
+	u16 lnodes_seg_log_pages;
+	u16 hnodes_seg_log_pages;
+	u16 inodes_seg_log_pages;
+	u16 user_data_log_pages;
+
+	u16 migration_threshold;
+
+	int is_zns_device;
+	u64 zone_size;
+	u64 zone_capacity;
+	u32 max_open_zones;
+};
+
+/*
+ * struct ssdfs_tunefs_config_request - tunefs config request
+ * @label: volume label option
+ * @blkbmap: block bitmap options
+ * @blk2off_tbl: offset translation table options
+ * @segbmap: segment bitmap options
+ * @maptbl: PEB mapping table options
+ * @btree: btree options
+ * @user_data_seg: user data segment's options
+ */
+struct ssdfs_tunefs_config_request {
+	struct ssdfs_tunefs_volume_label_option label;
+	struct ssdfs_tunefs_blkbmap_options blkbmap;
+	struct ssdfs_tunefs_blk2off_table_options blk2off_tbl;
+	struct ssdfs_tunefs_segbmap_options segbmap;
+	struct ssdfs_tunefs_maptbl_options maptbl;
+	struct ssdfs_tunefs_btree_options btree;
+	struct ssdfs_tunefs_user_data_options user_data_seg;
+};
+
+/*
+ * struct ssdfs_tunefs_options - tunefs options
+ * @old_config: current volume configuration
+ * @new_config: tunefs configuration request
+ */
+struct ssdfs_tunefs_options {
+	struct ssdfs_current_volume_config old_config;
+	struct ssdfs_tunefs_config_request new_config;
+};
+
+/*
+ * struct ssdfs_tunefs_request_copy - tunefs request copy
+ * @lock: tunefs request lock
+ * @state: state of the tunefs request copy
+ * @new_config: new config request
+ */
+struct ssdfs_tunefs_request_copy {
+	struct mutex lock;
+	int state;
+	struct ssdfs_tunefs_config_request new_config;
+};
+
+/*
+ * struct ssdfs_requests_queue - requests queue descriptor
+ * @lock: requests queue's lock
+ * @list: requests queue's list
+ */
+struct ssdfs_requests_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_seg_objects_queue - segment objects queue descriptor
+ * @lock: segment objects queue's lock
+ * @list: segment objects queue's list
+ */
+struct ssdfs_seg_objects_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_global_fsck_thread - global fsck thread info
+ * @thread: thread info
+ * @wait_queue: wait queue
+ * @rq: requests queue
+ */
+struct ssdfs_global_fsck_thread {
+	struct ssdfs_thread_info thread;
+	wait_queue_head_t wait_queue;
+	struct ssdfs_requests_queue rq;
+};
+
+/*
+ * struct ssdfs_btree_nodes_list - btree nodes list
+ * @lock: btree nodes list's lock
+ * @list: btree nodes list's list
+ */
+struct ssdfs_btree_nodes_list {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_fs_info - in-core fs information
+ * @log_pagesize: log2(page size)
+ * @pagesize: page size in bytes
+ * @log_erasesize: log2(erase block size)
+ * @erasesize: physical erase block size in bytes
+ * @log_segsize: log2(segment size)
+ * @segsize: segment size in bytes
+ * @log_pebs_per_seg: log2(erase blocks per segment)
+ * @pebs_per_seg: physical erase blocks per segment
+ * @pages_per_peb: pages per physical erase block
+ * @pages_per_seg: pages per segment
+ * @leb_pages_capacity: maximal number of logical blocks per LEB
+ * @peb_pages_capacity: maximal number of NAND pages that can be written per PEB
+ * @lebs_per_peb_index: difference of LEB IDs between PEB indexes in segment
+ * @fs_ctime: volume create timestamp (mkfs phase)
+ * @fs_cno: volume create checkpoint
+ * @raw_inode_size: raw inode size in bytes
+ * @create_threads_per_seg: number of creation threads per segment
+ * @mount_opts: mount options
+ * @metadata_options: metadata options
+ * @volume_sem: volume semaphore
+ * @last_vh: buffer for last valid volume header
+ * @vh: volume header
+ * @vs: volume state
+ * @sbi: superblock info
+ * @sbi_backup: backup copy of superblock info
+ * @sb_snapi: superblock snapshot segment info
+ * @sb_seg_log_pages: full log size in sb segment (pages count)
+ * @segbmap_log_pages: full log size in segbmap segment (pages count)
+ * @maptbl_log_pages: full log size in maptbl segment (pages count)
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @volume_state_lock: lock for mutable volume metadata
+ * @free_pages: free pages count on the volume
+ * @reserved_new_user_data_pages: reserved pages of growing files' content
+ * @updated_user_data_pages: number of updated pages of files' content
+ * @read_user_data_requests: number of user data read/init requests in processing
+ * @flushing_user_data_requests: number of user data flush requests in processing
+ * @commit_log_requests: number of commit log requests
+ * @pending_wq: wait queue for flush threads of user data segments
+ * @finish_user_data_read_wq: wait queue for completion of user data reads
+ * @finish_user_data_flush_wq: wait queue for completion of user data flushes
+ * @finish_commit_log_flush_wq: wait queue for completion of commit log flushes
+ * @fs_mount_time: file system mount timestamp
+ * @fs_mod_time: last write timestamp
+ * @fs_mount_cno: mount checkpoint
+ * @boot_vs_mount_timediff: difference between boottime and mounttime
+ * @fs_flags: file system flags
+ * @fs_state: file system state
+ * @fs_errors: behaviour when detecting errors
+ * @fs_feature_compat: compatible feature set
+ * @fs_feature_compat_ro: read-only compatible feature set
+ * @fs_feature_incompat: incompatible feature set
+ * @fs_uuid: 128-bit volume's uuid
+ * @fs_label: volume name
+ * @migration_threshold: default value of destination PEBs in migration
+ * @resize_mutex: resize mutex
+ * @nsegs: number of segments on the volume
+ * @sb_segs_sem: semaphore for superblock's array of LEB/PEB numbers
+ * @sb_lebs: array of LEB ID numbers
+ * @sb_pebs: array of PEB ID numbers
+ * @segbmap: segment bitmap object
+ * @segbmap_users: current number of segment bitmap's users
+ * @segbmap_users_wq: wait queue for finishing segbmap operations before umount
+ * @maptbl: PEB mapping table object
+ * @maptbl_cache: maptbl cache
+ * @maptbl_users: current number of mapping table's users
+ * @maptbl_users_wq: wait queue for finishing maptbl operations before umount
+ * @segs_tree: tree of segment objects
+ * @cur_segs: array of current segments
+ * @shextree: shared extents tree
+ * @shdictree: shared dictionary
+ * @inodes_tree: inodes btree
+ * @invextree: invalidated extents btree
+ * @snapshots: snapshots subsystem
+ * @btree_nodes: list of all btree nodes
+ * @gc_thread: array of GC threads
+ * @gc_wait_queue: array of GC threads' wait queues
+ * @gc_should_act: array of counters that define necessity of GC activity
+ * @pre_destroyed_segs_rq: pre destroy segment objects queue
+ * @flush_reqs: current number of flush requests
+ * @global_fsck: global fsck thread
+ * @sb: pointer on VFS superblock object
+ * @mtd: MTD info
+ * @devops: device access operations
+ * @pending_bios: count of pending BIOs (dev_bdev.c ONLY)
+ * @erase_folio: folio with content for erase operation (dev_bdev.c ONLY)
+ * @is_zns_device: file system volume is on ZNS device
+ * @zone_size: zone size in bytes
+ * @zone_capacity: zone capacity in bytes available for write operations
+ * @max_open_zones: open zones limitation (upper bound)
+ * @open_zones: current number of opened zones
+ * @fsck_priority: priority of FSCK operations
+ * @tunefs_request: tunefs request copy
+ * @dev_kobj: /sys/fs/ssdfs/<device> kernel object
+ * @dev_kobj_unregister: completion state for <device> kernel object
+ * @maptbl_kobj: /sys/fs/<ssdfs>/<device>/maptbl kernel object
+ * @maptbl_kobj_unregister: completion state for maptbl kernel object
+ * @maptbl_frags_kobj: /sys/fs/<ssdfs>/<device>/maptbl/fragments kernel object
+ * @maptbl_frags_kobj_unregister: completion state for maptbl fragments kernel object
+ * @segbmap_kobj: /sys/fs/<ssdfs>/<device>/segbmap kernel object
+ * @segbmap_kobj_unregister: completion state for segbmap kernel object
+ * @segbmap_frags_kobj: /sys/fs/<ssdfs>/<device>/segbmap/fragments kernel object
+ * @segbmap_frags_kobj_unregister: completion state for segbmap fragments kernel object
+ * @segments_kobj: /sys/fs/<ssdfs>/<device>/segments kernel object
+ * @segments_kobj_unregister: completion state for segments kernel object
+ * @inodes_tree_kobj: /sys/fs/<ssdfs>/<device>/inodes_tree kernel object
+ * @inodes_tree_kobj_unregister: completion state for inodes_tree kernel object
+ * @snapshots_tree_kobj: /sys/fs/<ssdfs>/<device>/snapshots_tree kernel object
+ * @snapshots_tree_kobj_unregister: completion state for snapshots_tree kernel object
+ * @shared_dict_kobj: /sys/fs/<ssdfs>/<device>/shared_dict kernel object
+ * @shared_dict_kobj_unregister: completion state for shared_dict kernel object
+ * @invextree_kobj: /sys/fs/<ssdfs>/<device>/invextree kernel object
+ * @invextree_kobj_unregister: completion state for invextree kernel object
+ */
+struct ssdfs_fs_info {
+	u8 log_pagesize;
+	u32 pagesize;
+	u8 log_erasesize;
+	u32 erasesize;
+	u8 log_segsize;
+	u32 segsize;
+	u8 log_pebs_per_seg;
+	u32 pebs_per_seg;
+	u32 pages_per_peb;
+	u32 pages_per_seg;
+	u32 leb_pages_capacity;
+	u32 peb_pages_capacity;
+	u32 lebs_per_peb_index;
+	u64 fs_ctime;
+	u64 fs_cno;
+	u16 raw_inode_size;
+	u16 create_threads_per_seg;
+
+	unsigned long mount_opts;
+	struct ssdfs_metadata_options metadata_options;
+
+	struct rw_semaphore volume_sem;
+	struct ssdfs_volume_header last_vh;
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_volume_state *vs;
+	struct ssdfs_sb_info sbi;
+	struct ssdfs_sb_info sbi_backup;
+	struct ssdfs_sb_snapshot_seg_info sb_snapi;
+	u16 sb_seg_log_pages;
+	u16 segbmap_log_pages;
+	u16 maptbl_log_pages;
+	u16 lnodes_seg_log_pages;
+	u16 hnodes_seg_log_pages;
+	u16 inodes_seg_log_pages;
+	u16 user_data_log_pages;
+
+	atomic_t global_fs_state;
+	struct completion mount_end;
+
+	spinlock_t volume_state_lock;
+	u64 free_pages;
+	u64 reserved_new_user_data_pages;
+	u64 updated_user_data_pages;
+	u64 read_user_data_requests;
+	u64 flushing_user_data_requests;
+	u64 commit_log_requests;
+	wait_queue_head_t pending_wq;
+	wait_queue_head_t finish_user_data_read_wq;
+	wait_queue_head_t finish_user_data_flush_wq;
+	wait_queue_head_t finish_commit_log_flush_wq;
+	u64 fs_mount_time;
+	u64 fs_mod_time;
+	u64 fs_mount_cno;
+	u64 boot_vs_mount_timediff;
+	u32 fs_flags;
+	u16 fs_state;
+	u16 fs_errors;
+	u64 fs_feature_compat;
+	u64 fs_feature_compat_ro;
+	u64 fs_feature_incompat;
+	unsigned char fs_uuid[SSDFS_UUID_SIZE];
+	char fs_label[SSDFS_VOLUME_LABEL_MAX];
+	u16 migration_threshold;
+
+	struct mutex resize_mutex;
+	u64 nsegs;
+
+	struct rw_semaphore sb_segs_sem;
+	u64 sb_lebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+	u64 sb_pebs[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+
+	struct ssdfs_segment_bmap *segbmap;
+	atomic_t segbmap_users;
+	wait_queue_head_t segbmap_users_wq;
+
+	struct ssdfs_peb_mapping_table *maptbl;
+	struct ssdfs_maptbl_cache maptbl_cache;
+	atomic_t maptbl_users;
+	wait_queue_head_t maptbl_users_wq;
+
+	struct ssdfs_segment_tree *segs_tree;
+	struct ssdfs_current_segs_array *cur_segs;
+
+	struct ssdfs_shared_extents_tree *shextree;
+	struct ssdfs_shared_dict_btree_info *shdictree;
+	struct ssdfs_inodes_btree_info *inodes_tree;
+	struct ssdfs_invextree_info *invextree;
+
+	struct ssdfs_snapshot_subsystem snapshots;
+
+	struct ssdfs_btree_nodes_list btree_nodes;
+
+	struct ssdfs_thread_info gc_thread[SSDFS_GC_THREAD_TYPE_MAX];
+	wait_queue_head_t gc_wait_queue[SSDFS_GC_THREAD_TYPE_MAX];
+	atomic_t gc_should_act[SSDFS_GC_THREAD_TYPE_MAX];
+	struct ssdfs_seg_objects_queue pre_destroyed_segs_rq;
+	atomic64_t flush_reqs;
+
+	struct ssdfs_global_fsck_thread global_fsck;
+
+	struct super_block *sb;
+
+	struct mtd_info *mtd;
+	const struct ssdfs_device_ops *devops;
+	atomic_t pending_bios;			/* for dev_bdev.c */
+	struct folio *erase_folio;		/* for dev_bdev.c */
+
+	bool is_zns_device;
+	u64 zone_size;
+	u64 zone_capacity;
+	u32 max_open_zones;
+	atomic_t open_zones;
+
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	atomic_t fsck_priority;
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+
+	struct ssdfs_tunefs_request_copy tunefs_request;
+
+	/* /sys/fs/ssdfs/<device> */
+	struct kobject dev_kobj;
+	struct completion dev_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/maptbl */
+	struct kobject maptbl_kobj;
+	struct completion maptbl_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/maptbl/fragments */
+	struct kobject maptbl_frags_kobj;
+	struct completion maptbl_frags_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/segbmap */
+	struct kobject segbmap_kobj;
+	struct completion segbmap_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/segbmap/fragments */
+	struct kobject segbmap_frags_kobj;
+	struct completion segbmap_frags_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/segments */
+	struct kobject segments_kobj;
+	struct completion segments_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/inodes_tree */
+	struct kobject inodes_tree_kobj;
+	struct completion inodes_tree_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/snapshots_tree */
+	struct kobject snapshots_tree_kobj;
+	struct completion snapshots_tree_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/shared_dict */
+	struct kobject shared_dict_kobj;
+	struct completion shared_dict_kobj_unregister;
+
+	/* /sys/fs/<ssdfs>/<device>/invextree */
+	struct kobject invextree_kobj;
+	struct completion invextree_kobj_unregister;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	spinlock_t requests_lock;
+	struct list_head user_data_requests_list;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_t ssdfs_writeback_folios;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+#ifdef CONFIG_SSDFS_TESTING
+	struct address_space testing_pages;
+	struct inode *testing_inode;
+	bool do_fork_invalidation;
+#endif /* CONFIG_SSDFS_TESTING */
+};
+
+#define SSDFS_FS_I(sb) \
+	((struct ssdfs_fs_info *)(sb->s_fs_info))
+
+/*
+ * GC constants
+ */
+#define SSDFS_GC_LOW_BOUND_THRESHOLD	(50)
+#define SSDFS_GC_UPPER_BOUND_THRESHOLD	(1000)
+#define SSDFS_GC_DISTANCE_THRESHOLD	(5)
+#define SSDFS_GC_DEFAULT_SEARCH_STEP	(100)
+#define SSDFS_GC_DIRTY_SEG_SEARCH_STEP	(1000)
+#define SSDFS_GC_DIRTY_SEG_DEFAULT_OPS	(50)
+
+/*
+ * GC possible states
+ */
+enum {
+	SSDFS_UNDEFINED_GC_STATE,
+	SSDFS_COLLECT_GARBAGE_NOW,
+	SSDFS_WAIT_IDLE_STATE,
+	SSDFS_STOP_GC_ACTIVITY_NOW,
+	SSDFS_GC_STATE_MAX
+};
+
+/*
+ * FSCK possible states
+ */
+enum {
+	SSDFS_UNDEFINED_FSCK_STATE = SSDFS_UNDEFINED_GC_STATE,
+	SSDFS_DO_FSCK_CHECK_NOW = SSDFS_COLLECT_GARBAGE_NOW,
+	SSDFS_FSCK_WAIT_IDLE_STATE = SSDFS_WAIT_IDLE_STATE,
+	SSDFS_STOP_FSCK_ACTIVITY_NOW = SSDFS_STOP_GC_ACTIVITY_NOW,
+	SSDFS_FSCK_STATE_MAX
+};
+
+/*
+ * struct ssdfs_io_load_stats - I/O load estimation
+ * @measurements: number of executed measurements
+ * @reqs_count: number of I/O requests for every measurement
+ */
+struct ssdfs_io_load_stats {
+	u32 measurements;
+#define SSDFS_MEASUREMENTS_MAX		(10)
+	s64 reqs_count[SSDFS_MEASUREMENTS_MAX];
+};
+
+/*
+ * GC thread functions
+ */
+int ssdfs_using_seg_gc_thread_func(void *data);
+int ssdfs_used_seg_gc_thread_func(void *data);
+int ssdfs_pre_dirty_seg_gc_thread_func(void *data);
+int ssdfs_dirty_seg_gc_thread_func(void *data);
+int ssdfs_destroy_seg_gc_thread_func(void *data);
+int ssdfs_btree_node_gc_thread_func(void *data);
+int ssdfs_start_gc_thread(struct ssdfs_fs_info *fsi, int type);
+int ssdfs_stop_gc_thread(struct ssdfs_fs_info *fsi, int type);
+int is_time_collect_garbage(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_io_load_stats *io_stats);
+
+/*
+ * Device operations
+ */
+extern const struct ssdfs_device_ops ssdfs_mtd_devops;
+extern const struct ssdfs_device_ops ssdfs_bdev_devops;
+extern const struct ssdfs_device_ops ssdfs_zns_devops;
+
+#endif /* _SSDFS_FS_INFO_H */
diff --git a/fs/ssdfs/ssdfs_inode_info.h b/fs/ssdfs/ssdfs_inode_info.h
new file mode 100644
index 000000000000..350923fab81b
--- /dev/null
+++ b/fs/ssdfs/ssdfs_inode_info.h
@@ -0,0 +1,144 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_inode_info.h - SSDFS in-core inode.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_INODE_INFO_H
+#define _SSDFS_INODE_INFO_H
+
+/*
+ * Inode flags (GETFLAGS/SETFLAGS)
+ */
+#define	SSDFS_SECRM_FL			FS_SECRM_FL	/* Secure deletion */
+#define	SSDFS_UNRM_FL			FS_UNRM_FL	/* Undelete */
+#define	SSDFS_COMPR_FL			FS_COMPR_FL	/* Compress file */
+#define SSDFS_SYNC_FL			FS_SYNC_FL	/* Synchronous updates */
+#define SSDFS_IMMUTABLE_FL		FS_IMMUTABLE_FL	/* Immutable file */
+#define SSDFS_APPEND_FL			FS_APPEND_FL	/* writes to file may only append */
+#define SSDFS_NODUMP_FL			FS_NODUMP_FL	/* do not dump file */
+#define SSDFS_NOATIME_FL		FS_NOATIME_FL	/* do not update atime */
+/* Reserved for compression usage... */
+#define SSDFS_DIRTY_FL			FS_DIRTY_FL
+#define SSDFS_COMPRBLK_FL		FS_COMPRBLK_FL	/* One or more compressed clusters */
+#define SSDFS_NOCOMP_FL			FS_NOCOMP_FL	/* Don't compress */
+#define SSDFS_ECOMPR_FL			FS_ECOMPR_FL	/* Compression error */
+/* End compression flags --- maybe not all used */
+#define SSDFS_BTREE_FL			FS_BTREE_FL	/* btree format dir */
+#define SSDFS_INDEX_FL			FS_INDEX_FL	/* hash-indexed directory */
+#define SSDFS_IMAGIC_FL			FS_IMAGIC_FL	/* AFS directory */
+#define SSDFS_JOURNAL_DATA_FL		FS_JOURNAL_DATA_FL /* Reserved for ext3 */
+#define SSDFS_NOTAIL_FL			FS_NOTAIL_FL	/* file tail should not be merged */
+#define SSDFS_DIRSYNC_FL		FS_DIRSYNC_FL	/* dirsync behaviour (directories only) */
+#define SSDFS_TOPDIR_FL			FS_TOPDIR_FL	/* Top of directory hierarchies */
+#define SSDFS_RESERVED_FL		FS_RESERVED_FL	/* reserved for ext2 lib */
+
+#define SSDFS_FL_USER_VISIBLE		FS_FL_USER_VISIBLE	/* User visible flags */
+#define SSDFS_FL_USER_MODIFIABLE	FS_FL_USER_MODIFIABLE	/* User modifiable flags */
+
+/* Flags that should be inherited by new inodes from their parent. */
+#define SSDFS_FL_INHERITED (SSDFS_SECRM_FL | SSDFS_UNRM_FL | SSDFS_COMPR_FL |\
+			   SSDFS_SYNC_FL | SSDFS_NODUMP_FL |\
+			   SSDFS_NOATIME_FL | SSDFS_COMPRBLK_FL |\
+			   SSDFS_NOCOMP_FL | SSDFS_JOURNAL_DATA_FL |\
+			   SSDFS_NOTAIL_FL | SSDFS_DIRSYNC_FL)
+
+/* Flags that are appropriate for regular files (all but dir-specific ones). */
+#define SSDFS_REG_FLMASK (~(SSDFS_DIRSYNC_FL | SSDFS_TOPDIR_FL))
+
+/* Flags that are appropriate for inodes that are neither directories nor regular files. */
+#define SSDFS_OTHER_FLMASK (SSDFS_NODUMP_FL | SSDFS_NOATIME_FL)
+
+/* Mask out flags that are inappropriate for the given type of inode. */
+static inline __u32 ssdfs_mask_flags(umode_t mode, __u32 flags)
+{
+	if (S_ISDIR(mode))
+		return flags;
+	else if (S_ISREG(mode))
+		return flags & SSDFS_REG_FLMASK;
+	else
+		return flags & SSDFS_OTHER_FLMASK;
+}
+
+/*
+ * struct ssdfs_inode_info - in-core inode
+ * @vfs_inode: VFS inode object
+ * @birthtime: creation time
+ * @raw_inode_size: raw inode size in bytes
+ * @private_flags: inode's private flags
+ * @lock: inode lock
+ * @parent_ino: parent inode ID
+ * @flags: inode flags
+ * @name_hash: name's hash code
+ * @name_len: name length
+ * @extents_tree: extents btree
+ * @dentries_tree: dentries btree
+ * @xattrs_tree: extended attributes tree
+ * @inline_file: inline file buffer
+ * @raw_inode: raw inode
+ */
+struct ssdfs_inode_info {
+	struct inode vfs_inode;
+	struct timespec64 birthtime;
+	u16 raw_inode_size;
+
+	atomic_t private_flags;
+
+	struct rw_semaphore lock;
+	u64 parent_ino;
+	u32 flags;
+	u64 name_hash;
+	u16 name_len;
+	struct ssdfs_extents_btree_info *extents_tree;
+	struct ssdfs_dentries_btree_info *dentries_tree;
+	struct ssdfs_xattrs_btree_info *xattrs_tree;
+	void *inline_file;
+	struct ssdfs_inode raw_inode;
+};
+
+static inline struct ssdfs_inode_info *SSDFS_I(struct inode *inode)
+{
+	return container_of(inode, struct ssdfs_inode_info, vfs_inode);
+}
+
+static inline
+struct ssdfs_extents_btree_info *SSDFS_EXTREE(struct ssdfs_inode_info *ii)
+{
+	if (S_ISDIR(ii->vfs_inode.i_mode))
+		return NULL;
+	else
+		return ii->extents_tree;
+}
+
+static inline
+struct ssdfs_dentries_btree_info *SSDFS_DTREE(struct ssdfs_inode_info *ii)
+{
+	if (S_ISDIR(ii->vfs_inode.i_mode))
+		return ii->dentries_tree;
+	else
+		return NULL;
+}
+
+static inline
+struct ssdfs_xattrs_btree_info *SSDFS_XATTREE(struct ssdfs_inode_info *ii)
+{
+	return ii->xattrs_tree;
+}
+
+extern const struct file_operations ssdfs_dir_operations;
+extern const struct inode_operations ssdfs_dir_inode_operations;
+extern const struct file_operations ssdfs_file_operations;
+extern const struct inode_operations ssdfs_file_inode_operations;
+extern const struct address_space_operations ssdfs_aops;
+extern const struct inode_operations ssdfs_special_inode_operations;
+extern const struct inode_operations ssdfs_symlink_inode_operations;
+
+#endif /* _SSDFS_INODE_INFO_H */
diff --git a/fs/ssdfs/ssdfs_thread_info.h b/fs/ssdfs/ssdfs_thread_info.h
new file mode 100644
index 000000000000..6d18245327b5
--- /dev/null
+++ b/fs/ssdfs/ssdfs_thread_info.h
@@ -0,0 +1,43 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_thread_info.h - thread declarations.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_THREAD_INFO_H
+#define _SSDFS_THREAD_INFO_H
+
+/*
+ * struct ssdfs_thread_info - thread info
+ * @task: task descriptor
+ * @wait: wait queue
+ * @full_stop: completion signalled at the end of the thread's activity
+ */
+struct ssdfs_thread_info {
+	struct task_struct *task;
+	struct wait_queue_entry wait;
+	struct completion full_stop;
+};
+
+/* thread function prototype */
+typedef int (*ssdfs_threadfn)(void *data);
+
+/*
+ * struct ssdfs_thread_descriptor - thread descriptor
+ * @threadfn: thread's function
+ * @fmt: thread's name format
+ */
+struct ssdfs_thread_descriptor {
+	ssdfs_threadfn threadfn;
+	const char *fmt;
+};
+
+#endif /* _SSDFS_THREAD_INFO_H */
diff --git a/fs/ssdfs/version.h b/fs/ssdfs/version.h
new file mode 100644
index 000000000000..8c94b1653bf6
--- /dev/null
+++ b/fs/ssdfs/version.h
@@ -0,0 +1,9 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ */
+#ifndef _SSDFS_VERSION_H
+#define _SSDFS_VERSION_H
+
+#define SSDFS_VERSION "SSDFS v.5.47"
+
+#endif /* _SSDFS_VERSION_H */
diff --git a/include/trace/events/ssdfs.h b/include/trace/events/ssdfs.h
new file mode 100644
index 000000000000..492beb6e7c30
--- /dev/null
+++ b/include/trace/events/ssdfs.h
@@ -0,0 +1,256 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/trace/events/ssdfs.h - definition of tracepoints.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ssdfs
+
+#if !defined(_TRACE_SSDFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SSDFS_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(ssdfs__inode,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(umode_t, mode)
+		__field(loff_t,	size)
+		__field(unsigned int, nlink)
+		__field(blkcnt_t, blocks)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->mode	= inode->i_mode;
+		__entry->nlink	= inode->i_nlink;
+		__entry->size	= inode->i_size;
+		__entry->blocks	= inode->i_blocks;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, i_mode = 0x%hx, "
+		"i_size = %lld, i_nlink = %u, i_blocks = %llu",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->mode,
+		__entry->size,
+		(unsigned int)__entry->nlink,
+		(unsigned long long)__entry->blocks)
+);
+
+DECLARE_EVENT_CLASS(ssdfs__inode_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->ret	= ret;
+	),
+
+	TP_printk("dev = (%d,%d), ino = %lu, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_new,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_inode_new_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_request,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_inode_evict,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_iget,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_iget_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+TRACE_EVENT(ssdfs_sync_fs,
+
+	TP_PROTO(struct super_block *sb, int wait),
+
+	TP_ARGS(sb, wait),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(int,	wait)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= sb->s_dev;
+		__entry->wait	= wait;
+	),
+
+	TP_printk("dev = (%d,%d), wait = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->wait)
+);
+
+TRACE_EVENT(ssdfs_sync_fs_exit,
+
+	TP_PROTO(struct super_block *sb, int wait, int ret),
+
+	TP_ARGS(sb, wait, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(int,	wait)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= sb->s_dev;
+		__entry->wait	= wait;
+		__entry->ret	= ret;
+	),
+
+	TP_printk("dev = (%d,%d), wait = %d, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->wait,
+		__entry->ret)
+);
+
+DEFINE_EVENT(ssdfs__inode, ssdfs_sync_file_enter,
+
+	TP_PROTO(struct inode *inode),
+
+	TP_ARGS(inode)
+);
+
+TRACE_EVENT(ssdfs_sync_file_exit,
+
+	TP_PROTO(struct file *file, int datasync, int ret),
+
+	TP_ARGS(file, datasync, ret),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(ino_t,	parent)
+		__field(int,	datasync)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		struct dentry *dentry = file->f_path.dentry;
+		struct inode *inode = dentry->d_inode;
+
+		__entry->dev		= inode->i_sb->s_dev;
+		__entry->ino		= inode->i_ino;
+		__entry->parent		= dentry->d_parent->d_inode->i_ino;
+		__entry->datasync	= datasync;
+		__entry->ret		= ret;
+	),
+
	TP_printk("dev = (%d,%d), ino = %lu, parent = %lu, "
+		"datasync = %d, ret = %d",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		(unsigned long)__entry->parent,
+		__entry->datasync,
+		__entry->ret)
+);
+
+TRACE_EVENT(ssdfs_unlink_enter,
+
+	TP_PROTO(struct inode *dir, struct dentry *dentry),
+
+	TP_ARGS(dir, dentry),
+
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t,	size)
+		__field(blkcnt_t, blocks)
+		__string(name,	dentry->d_name.name)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= dir->i_sb->s_dev;
+		__entry->ino	= dir->i_ino;
+		__entry->size	= dir->i_size;
+		__entry->blocks	= dir->i_blocks;
+		__assign_str(name);
+	),
+
+	TP_printk("dev = (%d,%d), dir ino = %lu, i_size = %lld, "
+		"i_blocks = %llu, name = %s",
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		(unsigned long)__entry->ino,
+		__entry->size,
+		(unsigned long long)__entry->blocks,
+		__get_str(name))
+);
+
+DEFINE_EVENT(ssdfs__inode_exit, ssdfs_unlink_exit,
+
+	TP_PROTO(struct inode *inode, int ret),
+
+	TP_ARGS(inode, ret)
+);
+
+#endif /* _TRACE_SSDFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/ssdfs_fs.h b/include/uapi/linux/ssdfs_fs.h
new file mode 100644
index 000000000000..74c8a1be8051
--- /dev/null
+++ b/include/uapi/linux/ssdfs_fs.h
@@ -0,0 +1,126 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * include/uapi/linux/ssdfs_fs.h - SSDFS common declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _UAPI_LINUX_SSDFS_H
+#define _UAPI_LINUX_SSDFS_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* SSDFS magic signatures */
+#define SSDFS_SUPER_MAGIC			0x53734466	/* SsDf */
+#define SSDFS_SEGMENT_HDR_MAGIC			0x5348		/* SH */
+#define SSDFS_LOG_FOOTER_MAGIC			0x4C46		/* LF */
+#define SSDFS_PARTIAL_LOG_HDR_MAGIC		0x5048		/* PH */
+#define SSDFS_PADDING_HDR_MAGIC			0x5044		/* PD */
+#define SSDFS_BLK_BMAP_MAGIC			0x424D		/* BM */
+#define SSDFS_FRAGMENT_DESC_MAGIC		0x66		/* f */
+#define SSDFS_CHAIN_HDR_MAGIC			0x63		/* c */
+#define SSDFS_PHYS_OFF_TABLE_MAGIC		0x504F5448	/* POTH */
+#define SSDFS_BLK2OFF_TABLE_HDR_MAGIC		0x5474		/* Tt */
+#define SSDFS_SEGBMAP_HDR_MAGIC			0x534D		/* SM */
+#define SSDFS_INODE_MAGIC			0x6469		/* di */
+#define SSDFS_PEB_TABLE_MAGIC			0x5074		/* Pt */
+#define SSDFS_LEB_TABLE_MAGIC			0x4C74		/* Lt */
+#define SSDFS_MAPTBL_CACHE_MAGIC		0x4D63		/* Mc */
+#define SSDFS_MAPTBL_CACHE_PEB_STATE_MAGIC	0x4D635053	/* McPS */
+#define SSDFS_INODES_BTREE_MAGIC		0x496E4274	/* InBt */
+#define SSDFS_INODES_BNODE_MAGIC		0x494E		/* IN */
+#define SSDFS_DENTRIES_BTREE_MAGIC		0x44654274	/* DeBt */
+#define SSDFS_DENTRIES_BNODE_MAGIC		0x444E		/* DN */
+#define SSDFS_EXTENTS_BTREE_MAGIC		0x45784274	/* ExBt */
+#define SSDFS_SHARED_EXTENTS_BTREE_MAGIC	0x53454274	/* SEBt */
+#define SSDFS_EXTENTS_BNODE_MAGIC		0x454E		/* EN */
+#define SSDFS_XATTR_BTREE_MAGIC			0x45414274	/* EABt */
+#define SSDFS_SHARED_XATTR_BTREE_MAGIC		0x53454174	/* SEAt */
+#define SSDFS_XATTR_BNODE_MAGIC			0x414E		/* AN */
+#define SSDFS_SHARED_DICT_BTREE_MAGIC		0x53446963	/* SDic */
+#define SSDFS_DICTIONARY_BNODE_MAGIC		0x534E		/* SN */
+#define SSDFS_SNAPSHOTS_BTREE_MAGIC		0x536E4274	/* SnBt */
+#define SSDFS_SNAPSHOTS_BNODE_MAGIC		0x736E		/* sn */
+#define SSDFS_SNAPSHOT_RULES_MAGIC		0x536E5275	/* SnRu */
+#define SSDFS_SNAPSHOT_RECORD_MAGIC		0x5372		/* Sr */
+#define SSDFS_PEB2TIME_RECORD_MAGIC		0x5072		/* Pr */
+#define SSDFS_DIFF_BLOB_MAGIC			0x4466		/* Df */
+#define SSDFS_INVEXT_BTREE_MAGIC		0x49784274	/* IxBt */
+#define SSDFS_INVEXT_BNODE_MAGIC		0x4958		/* IX */
+
+/* SSDFS padding blob */
+#define SSDFS_PADDING_BLOB		0x50414444494E4730	/* PADDING0 */
+
+/* SSDFS revision */
+#define SSDFS_MAJOR_REVISION		1
+#define SSDFS_MINOR_REVISION		20
+
+/* SSDFS constants */
+#define SSDFS_MAX_NAME_LEN		255
+#define SSDFS_UUID_SIZE			16
+#define SSDFS_VOLUME_LABEL_MAX		16
+#define SSDFS_MAX_SNAP_RULE_NAME_LEN	16
+#define SSDFS_MAX_SNAPSHOT_NAME_LEN	12
+
+#define SSDFS_RESERVED_VBR_SIZE		1024	/* Volume Boot Record size */
+#define SSDFS_DEFAULT_SEG_SIZE		8388608
+
+/*
+ * File system states
+ */
+#define SSDFS_MOUNTED_FS		0x0000  /* Mounted FS state */
+#define SSDFS_VALID_FS			0x0001  /* Unmounted cleanly */
+#define SSDFS_ERROR_FS			0x0002  /* Errors detected */
+#define SSDFS_RESIZE_FS			0x0004	/* Resize required */
+#define SSDFS_LAST_KNOWN_FS_STATE	SSDFS_RESIZE_FS
+
+/*
+ * Behaviour when detecting errors
+ */
+#define SSDFS_ERRORS_CONTINUE		1	/* Continue execution */
+#define SSDFS_ERRORS_RO			2	/* Remount fs read-only */
+#define SSDFS_ERRORS_PANIC		3	/* Panic */
+#define SSDFS_ERRORS_DEFAULT		SSDFS_ERRORS_CONTINUE
+#define SSDFS_LAST_KNOWN_FS_ERROR	SSDFS_ERRORS_PANIC
+
+/* Reserved inode id */
+#define SSDFS_INVALID_EXTENTS_BTREE_INO		5
+#define SSDFS_SNAPSHOTS_BTREE_INO		6
+#define SSDFS_TESTING_INO			7
+#define SSDFS_SHARED_DICT_BTREE_INO		8
+#define SSDFS_INODES_BTREE_INO			9
+#define SSDFS_SHARED_EXTENTS_BTREE_INO		10
+#define SSDFS_SHARED_XATTR_BTREE_INO		11
+#define SSDFS_MAPTBL_INO			12
+#define SSDFS_SEG_TREE_INO			13
+#define SSDFS_SEG_BMAP_INO			14
+#define SSDFS_PEB_CACHE_INO			15
+#define SSDFS_ROOT_INO				16
+
+#define SSDFS_LINK_MAX				INT_MAX
+
+#define SSDFS_CUR_SEG_DEFAULT_ID		3
+#define SSDFS_LOG_PAGES_DEFAULT			32
+#define SSDFS_CREATE_THREADS_DEFAULT		1
+
+#define SSDFS_INITIAL_SNAPSHOT_SEG_ID		0
+#define SSDFS_INITIAL_SNAPSHOT_SEG_LEB_ID	0
+#define SSDFS_INITIAL_SNAPSHOT_SEG_PEB_ID	0
+
+#endif /* _UAPI_LINUX_SSDFS_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 03/79] ssdfs: add key file system's function declarations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 01/79] ssdfs: introduce SSDFS on-disk layout Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 02/79] ssdfs: add key file system declarations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 04/79] ssdfs: implement raw device operations Viacheslav Dubeyko
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

This patch adds key file system function declarations
and inline function implementations.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/ssdfs.h        |  502 +++++++
 fs/ssdfs/ssdfs_inline.h | 3037 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 3539 insertions(+)
 create mode 100644 fs/ssdfs/ssdfs.h
 create mode 100644 fs/ssdfs/ssdfs_inline.h

diff --git a/fs/ssdfs/ssdfs.h b/fs/ssdfs/ssdfs.h
new file mode 100644
index 000000000000..9ad50f5f9458
--- /dev/null
+++ b/fs/ssdfs/ssdfs.h
@@ -0,0 +1,502 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs.h - in-core declarations.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_H
+#define _SSDFS_H
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kobject.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/crc32.h>
+#include <linux/pagemap.h>
+#include <linux/fs_parser.h>
+#include <linux/fs_context.h>
+#include <linux/ssdfs_fs.h>
+
+#include "ssdfs_constants.h"
+#include "ssdfs_thread_info.h"
+#include "ssdfs_inode_info.h"
+#include "snapshot.h"
+#include "snapshot_requests_queue.h"
+#include "snapshot_rules.h"
+#include "ssdfs_fs_info.h"
+#include "ssdfs_inline.h"
+#include "fingerprint_array.h"
+
+/*
+ * struct ssdfs_value_pair - value/position pair
+ * @value: some value
+ * @pos: position of value
+ */
+struct ssdfs_value_pair {
+	int value;
+	int pos;
+};
+
+/*
+ * struct ssdfs_min_max_pair - minimum and maximum values pair
+ * @min: minimum value/position pair
+ * @max: maximum value/position pair
+ */
+struct ssdfs_min_max_pair {
+	struct ssdfs_value_pair min;
+	struct ssdfs_value_pair max;
+};
+
+/*
+ * struct ssdfs_block_bmap_range - block bitmap items range
+ * @start: begin item
+ * @len: count of items in the range
+ */
+struct ssdfs_block_bmap_range {
+	u32 start;
+	u32 len;
+};
+
+/*
+ * struct ssdfs_blk2off_range - extent of logical blocks
+ * @start_lblk: start logical block number
+ * @len: count of logical blocks in extent
+ */
+struct ssdfs_blk2off_range {
+	u16 start_lblk;
+	u16 len;
+};
+
+struct ssdfs_mount_context {
+	unsigned long s_mount_opts;
+};
+
+struct ssdfs_peb_info;
+struct ssdfs_peb_container;
+struct ssdfs_segment_info;
+struct ssdfs_peb_blk_bmap;
+
+/* btree_node.c */
+void ssdfs_zero_btree_node_obj_cache_ptr(void);
+int ssdfs_init_btree_node_obj_cache(void);
+void ssdfs_shrink_btree_node_obj_cache(void);
+void ssdfs_destroy_btree_node_obj_cache(void);
+
+/* btree_search.c */
+void ssdfs_zero_btree_search_obj_cache_ptr(void);
+int ssdfs_init_btree_search_obj_cache(void);
+void ssdfs_shrink_btree_search_obj_cache(void);
+void ssdfs_destroy_btree_search_obj_cache(void);
+
+/* compression.c */
+int ssdfs_compressors_init(void);
+void ssdfs_free_workspaces(void);
+void ssdfs_compressors_exit(void);
+
+/* dev_bdev.c */
+struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev,
+				 unsigned int nr_iovecs,
+				 unsigned int op,
+				 gfp_t gfp_mask);
+void ssdfs_bdev_bio_put(struct bio *bio);
+int ssdfs_bdev_bio_add_folio(struct bio *bio, struct folio *folio,
+			    unsigned int offset);
+int ssdfs_bdev_read_block(struct super_block *sb, struct folio *folio,
+			  loff_t offset);
+int ssdfs_bdev_read_blocks(struct super_block *sb, struct folio_batch *batch,
+			   loff_t offset);
+int ssdfs_bdev_read(struct super_block *sb, u32 block_size, loff_t offset,
+		    size_t len, void *buf);
+int ssdfs_bdev_can_write_block(struct super_block *sb, u32 block_size,
+				loff_t offset, bool need_check);
+int ssdfs_bdev_write_block(struct super_block *sb, loff_t offset,
+			   struct folio *folio);
+int ssdfs_bdev_write_blocks(struct super_block *sb, loff_t offset,
+			    struct folio_batch *batch);
+
+/* dev_zns.c */
+u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset);
+u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset);
+u64 ssdfs_zns_zone_write_pointer(struct super_block *sb, loff_t offset);
+
+/* dir.c */
+int ssdfs_inode_by_name(struct inode *dir,
+			const struct qstr *child,
+			ino_t *ino);
+int ssdfs_create(struct mnt_idmap *idmap,
+		 struct inode *dir, struct dentry *dentry,
+		 umode_t mode, bool excl);
+
+/* file.c */
+int ssdfs_allocate_inline_file_buffer(struct inode *inode);
+void ssdfs_destroy_inline_file_buffer(struct inode *inode);
+int ssdfs_fsync(struct file *file, loff_t start, loff_t end, int datasync);
+
+/* fs_error.c */
+extern __printf(5, 6)
+void ssdfs_fs_error(struct super_block *sb, const char *file,
+		    const char *function, unsigned int line,
+		    const char *fmt, ...);
+int ssdfs_set_folio_dirty(struct folio *folio);
+int __ssdfs_clear_dirty_folio(struct folio *folio);
+int ssdfs_clear_dirty_folio(struct folio *folio);
+void ssdfs_clear_dirty_folios(struct address_space *mapping);
+
+/* global_fsck.c */
+int ssdfs_start_global_fsck_thread(struct ssdfs_fs_info *fsi);
+int ssdfs_stop_global_fsck_thread(struct ssdfs_fs_info *fsi);
+
+/* inode.c */
+bool is_raw_inode_checksum_correct(struct ssdfs_fs_info *fsi,
+				   void *buf, size_t size);
+struct inode *ssdfs_iget(struct super_block *sb, ino_t ino);
+struct inode *ssdfs_new_inode(struct mnt_idmap *idmap,
+			      struct inode *dir, umode_t mode,
+			      const struct qstr *qstr);
+int ssdfs_getattr(struct mnt_idmap *idmap,
+		  const struct path *path, struct kstat *stat,
+		  u32 request_mask, unsigned int query_flags);
+int ssdfs_setattr(struct mnt_idmap *idmap,
+		  struct dentry *dentry, struct iattr *attr);
+void ssdfs_evict_inode(struct inode *inode);
+int ssdfs_write_inode(struct inode *inode, struct writeback_control *wbc);
+int ssdfs_statfs(struct dentry *dentry, struct kstatfs *buf);
+void ssdfs_set_inode_flags(struct inode *inode);
+
+/* inodes_tree.c */
+void ssdfs_zero_free_ino_desc_cache_ptr(void);
+int ssdfs_init_free_ino_desc_cache(void);
+void ssdfs_shrink_free_ino_desc_cache(void);
+void ssdfs_destroy_free_ino_desc_cache(void);
+
+/* ioctl.c */
+long ssdfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+
+/* log_footer.c */
+bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer);
+bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size);
+bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi,
+					   void *buf,
+					   struct ssdfs_log_footer *footer,
+					   u64 dev_size);
+int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi,
+				    u64 peb_id, u32 block_size, u32 bytes_off,
+				    void *buf, bool silent,
+				    u32 *log_pages);
+int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi,
+			   void *buf,
+			   struct ssdfs_log_footer *footer,
+			   bool silent);
+int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr,
+				  u64 peb_id, u32 block_size, u32 bytes_off,
+				  void *buf, bool silent);
+int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi,
+					__le64 *array,
+					size_t size);
+int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi,
+						u16 fs_state,
+						__le64 *cur_segs,
+						size_t size,
+						u64 last_log_time,
+						u64 last_log_cno,
+						struct ssdfs_volume_state *vs);
+int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi,
+					u32 block_size,
+					u32 log_pages,
+					u32 log_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_log_footer *footer);
+
+/* offset_translation_table.c */
+void ssdfs_zero_blk2off_frag_obj_cache_ptr(void);
+int ssdfs_init_blk2off_frag_obj_cache(void);
+void ssdfs_shrink_blk2off_frag_obj_cache(void);
+void ssdfs_destroy_blk2off_frag_obj_cache(void);
+
+/* options.c */
+int ssdfs_parse_param(struct fs_context *fc, struct fs_parameter *param);
+void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi);
+int ssdfs_show_options(struct seq_file *seq, struct dentry *root);
+
+/* peb_migration_scheme.c */
+int ssdfs_peb_start_migration(struct ssdfs_peb_container *pebc);
+bool is_peb_under_migration(struct ssdfs_peb_container *pebc);
+bool is_pebs_relation_alive(struct ssdfs_peb_container *pebc);
+bool has_peb_migration_done(struct ssdfs_peb_container *pebc);
+bool should_migration_be_finished(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_finish_migration(struct ssdfs_peb_container *pebc);
+bool has_ssdfs_source_peb_valid_blocks(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_prepare_range_migration(struct ssdfs_peb_container *pebc,
+				      u32 range_len, int blk_type);
+int ssdfs_peb_migrate_valid_blocks_range(struct ssdfs_segment_info *si,
+					 struct ssdfs_peb_container *pebc,
+					 struct ssdfs_peb_blk_bmap *peb_blkbmap,
+					 struct ssdfs_block_bmap_range *range);
+
+/* readwrite.c */
+int ssdfs_read_folio_from_volume(struct ssdfs_fs_info *fsi,
+				 u64 peb_id, u32 bytes_offset,
+				 struct folio *folio);
+int ssdfs_read_folio_batch_from_volume(struct ssdfs_fs_info *fsi,
+					u64 peb_id, u32 bytes_offset,
+					struct folio_batch *batch);
+int ssdfs_aligned_read_buffer(struct ssdfs_fs_info *fsi,
+			      u64 peb_id, u32 block_size, u32 bytes_off,
+			      void *buf, size_t size,
+			      size_t *read_bytes);
+int ssdfs_unaligned_read_buffer(struct ssdfs_fs_info *fsi,
+				u64 peb_id, u32 block_size, u32 bytes_off,
+				void *buf, size_t size);
+int ssdfs_can_write_sb_log(struct super_block *sb,
+			   struct ssdfs_peb_extent *sb_log);
+int ssdfs_unaligned_read_folio_batch(struct folio_batch *batch,
+				     u32 offset, u32 size,
+				     void *buf);
+int ssdfs_unaligned_write_folio_batch(struct ssdfs_fs_info *fsi,
+				      struct folio_batch *batch,
+				      u32 offset, u32 size,
+				      void *buf);
+int ssdfs_unaligned_read_folio_vector(struct ssdfs_fs_info *fsi,
+				      struct ssdfs_folio_vector *vec,
+				      u32 offset, u32 size,
+				      void *buf);
+int ssdfs_unaligned_write_folio_vector(struct ssdfs_fs_info *fsi,
+					struct ssdfs_folio_vector *vec,
+					u32 offset, u32 size,
+					void *buf);
+
+/* recovery.c */
+int ssdfs_init_sb_info(struct ssdfs_fs_info *fsi,
+			struct ssdfs_sb_info *sbi);
+void ssdfs_destruct_sb_info(struct ssdfs_sb_info *sbi);
+void ssdfs_backup_sb_info(struct ssdfs_fs_info *fsi);
+void ssdfs_restore_sb_info(struct ssdfs_fs_info *fsi);
+int ssdfs_init_sb_snap_info(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_sb_snapshot_seg_info *sb_snapi);
+void ssdfs_destruct_sb_snap_info(struct ssdfs_sb_snapshot_seg_info *sb_snapi);
+int ssdfs_gather_superblock_info(struct ssdfs_fs_info *fsi, int silent);
+
+/* segment.c */
+void ssdfs_zero_seg_obj_cache_ptr(void);
+int ssdfs_init_seg_obj_cache(void);
+void ssdfs_shrink_seg_obj_cache(void);
+void ssdfs_destroy_seg_obj_cache(void);
+int ssdfs_segment_get_used_data_pages(struct ssdfs_segment_info *si);
+
+/* super.c */
+void ssdfs_destroy_btree_of_inode(struct inode *inode);
+void ssdfs_destroy_and_decrement_btree_of_inode(struct inode *inode);
+
+/* sysfs.c */
+int ssdfs_sysfs_init(void);
+void ssdfs_sysfs_exit(void);
+int ssdfs_sysfs_create_device_group(struct super_block *sb);
+void ssdfs_sysfs_delete_device_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_seg_group(struct ssdfs_segment_info *si);
+void ssdfs_sysfs_delete_seg_group(struct ssdfs_segment_info *si);
+int ssdfs_sysfs_create_peb_group(struct ssdfs_peb_container *pebc);
+void ssdfs_sysfs_delete_peb_group(struct ssdfs_peb_container *pebc);
+int ssdfs_sysfs_create_maptbl_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_maptbl_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_segbmap_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_segbmap_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_inodes_tree_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_inodes_tree_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_snapshots_tree_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_snapshots_tree_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_shared_dict_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_shared_dict_group(struct ssdfs_fs_info *fsi);
+int ssdfs_sysfs_create_invextree_group(struct ssdfs_fs_info *fsi);
+void ssdfs_sysfs_delete_invextree_group(struct ssdfs_fs_info *fsi);
+
+/* tunefs.c */
+bool IS_TUNEFS_REQUESTED(struct ssdfs_tunefs_request_copy *request);
+bool IS_OPTION_ENABLE_REQUESTED(struct ssdfs_tunefs_option *option);
+bool IS_OPTION_DISABLE_REQUESTED(struct ssdfs_tunefs_option *option);
+bool IS_VOLUME_LABEL_NEED2CHANGE(struct ssdfs_tunefs_volume_label_option *option);
+void ssdfs_tunefs_get_current_volume_config(struct ssdfs_fs_info *fsi,
+				struct ssdfs_current_volume_config *config);
+int ssdfs_tunefs_check_requested_volume_config(struct ssdfs_fs_info *fsi,
+					struct ssdfs_tunefs_options *options);
+void ssdfs_tunefs_get_new_config_request(struct ssdfs_fs_info *fsi,
+				struct ssdfs_tunefs_config_request *new_config);
+void ssdfs_tunefs_save_new_config_request(struct ssdfs_fs_info *fsi,
+					struct ssdfs_tunefs_options *options);
+
+/* volume_header.c */
+bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr);
+bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic);
+bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size);
+bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size);
+bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi,
+					struct ssdfs_volume_header *vh,
+					u64 dev_size);
+int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_header *hdr,
+				bool silent);
+int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi,
+					u64 peb_id, u32 block_size,
+					u32 pages_off,
+					void *buf, bool silent);
+int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_partial_log_header *hdr,
+				   bool silent);
+void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh);
+int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi,
+					   struct ssdfs_volume_header *vh);
+int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi,
+					    u64 seg_id,
+					    u64 leb_id,
+					    u64 peb_id,
+					    u64 relation_peb_id,
+					    u32 log_pages,
+					    u16 seg_type,
+					    u32 seg_flags,
+					    u64 last_log_time,
+					    u64 last_log_cno,
+					    struct ssdfs_segment_header *hdr);
+int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi,
+					int sequence_id,
+					u64 seg_id,
+					u64 leb_id,
+					u64 peb_id,
+					u64 relation_peb_id,
+					u32 log_pages,
+					u16 seg_type,
+					u32 pl_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_partial_log_header *hdr);
+
+/* memory leaks checker */
+void ssdfs_acl_memory_leaks_init(void);
+void ssdfs_acl_check_memory_leaks(void);
+void ssdfs_block_bmap_memory_leaks_init(void);
+void ssdfs_block_bmap_check_memory_leaks(void);
+void ssdfs_blk2off_memory_leaks_init(void);
+void ssdfs_blk2off_check_memory_leaks(void);
+void ssdfs_btree_memory_leaks_init(void);
+void ssdfs_btree_check_memory_leaks(void);
+void ssdfs_btree_hierarchy_memory_leaks_init(void);
+void ssdfs_btree_hierarchy_check_memory_leaks(void);
+void ssdfs_btree_node_memory_leaks_init(void);
+void ssdfs_btree_node_check_memory_leaks(void);
+void ssdfs_btree_search_memory_leaks_init(void);
+void ssdfs_btree_search_check_memory_leaks(void);
+void ssdfs_lzo_memory_leaks_init(void);
+void ssdfs_lzo_check_memory_leaks(void);
+void ssdfs_zlib_memory_leaks_init(void);
+void ssdfs_zlib_check_memory_leaks(void);
+void ssdfs_compr_memory_leaks_init(void);
+void ssdfs_compr_check_memory_leaks(void);
+void ssdfs_cur_seg_memory_leaks_init(void);
+void ssdfs_cur_seg_check_memory_leaks(void);
+void ssdfs_dentries_memory_leaks_init(void);
+void ssdfs_dentries_check_memory_leaks(void);
+void ssdfs_dev_bdev_memory_leaks_init(void);
+void ssdfs_dev_bdev_check_memory_leaks(void);
+void ssdfs_dev_zns_memory_leaks_init(void);
+void ssdfs_dev_zns_check_memory_leaks(void);
+void ssdfs_dev_mtd_memory_leaks_init(void);
+void ssdfs_dev_mtd_check_memory_leaks(void);
+void ssdfs_dir_memory_leaks_init(void);
+void ssdfs_dir_check_memory_leaks(void);
+void ssdfs_diff_memory_leaks_init(void);
+void ssdfs_diff_check_memory_leaks(void);
+void ssdfs_dynamic_array_memory_leaks_init(void);
+void ssdfs_dynamic_array_check_memory_leaks(void);
+void ssdfs_ext_queue_memory_leaks_init(void);
+void ssdfs_ext_queue_check_memory_leaks(void);
+void ssdfs_ext_tree_memory_leaks_init(void);
+void ssdfs_ext_tree_check_memory_leaks(void);
+void ssdfs_farray_memory_leaks_init(void);
+void ssdfs_farray_check_memory_leaks(void);
+void ssdfs_folio_vector_memory_leaks_init(void);
+void ssdfs_folio_vector_check_memory_leaks(void);
+#ifdef CONFIG_SSDFS_PEB_DEDUPLICATION
+void ssdfs_fingerprint_array_memory_leaks_init(void);
+void ssdfs_fingerprint_array_check_memory_leaks(void);
+#endif /* CONFIG_SSDFS_PEB_DEDUPLICATION */
+void ssdfs_file_memory_leaks_init(void);
+void ssdfs_file_check_memory_leaks(void);
+void ssdfs_fs_error_memory_leaks_init(void);
+void ssdfs_fs_error_check_memory_leaks(void);
+void ssdfs_flush_memory_leaks_init(void);
+void ssdfs_flush_check_memory_leaks(void);
+void ssdfs_gc_memory_leaks_init(void);
+void ssdfs_gc_check_memory_leaks(void);
+void ssdfs_global_fsck_memory_leaks_init(void);
+void ssdfs_global_fsck_check_memory_leaks(void);
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+void ssdfs_fsck_memory_leaks_init(void);
+void ssdfs_fsck_check_memory_leaks(void);
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+void ssdfs_inode_memory_leaks_init(void);
+void ssdfs_inode_check_memory_leaks(void);
+void ssdfs_ino_tree_memory_leaks_init(void);
+void ssdfs_ino_tree_check_memory_leaks(void);
+void ssdfs_invext_tree_memory_leaks_init(void);
+void ssdfs_invext_tree_check_memory_leaks(void);
+void ssdfs_parray_memory_leaks_init(void);
+void ssdfs_parray_check_memory_leaks(void);
+void ssdfs_page_vector_memory_leaks_init(void);
+void ssdfs_page_vector_check_memory_leaks(void);
+void ssdfs_map_queue_memory_leaks_init(void);
+void ssdfs_map_queue_check_memory_leaks(void);
+void ssdfs_map_tbl_memory_leaks_init(void);
+void ssdfs_map_tbl_check_memory_leaks(void);
+void ssdfs_map_cache_memory_leaks_init(void);
+void ssdfs_map_cache_check_memory_leaks(void);
+void ssdfs_map_thread_memory_leaks_init(void);
+void ssdfs_map_thread_check_memory_leaks(void);
+void ssdfs_migration_memory_leaks_init(void);
+void ssdfs_migration_check_memory_leaks(void);
+void ssdfs_peb_memory_leaks_init(void);
+void ssdfs_peb_check_memory_leaks(void);
+void ssdfs_read_memory_leaks_init(void);
+void ssdfs_read_check_memory_leaks(void);
+void ssdfs_recovery_memory_leaks_init(void);
+void ssdfs_recovery_check_memory_leaks(void);
+void ssdfs_req_queue_memory_leaks_init(void);
+void ssdfs_req_queue_check_memory_leaks(void);
+void ssdfs_seg_obj_memory_leaks_init(void);
+void ssdfs_seg_obj_check_memory_leaks(void);
+void ssdfs_seg_bmap_memory_leaks_init(void);
+void ssdfs_seg_bmap_check_memory_leaks(void);
+void ssdfs_seg_blk_memory_leaks_init(void);
+void ssdfs_seg_blk_check_memory_leaks(void);
+void ssdfs_seg_tree_memory_leaks_init(void);
+void ssdfs_seg_tree_check_memory_leaks(void);
+void ssdfs_seq_arr_memory_leaks_init(void);
+void ssdfs_seq_arr_check_memory_leaks(void);
+void ssdfs_dict_memory_leaks_init(void);
+void ssdfs_dict_check_memory_leaks(void);
+void ssdfs_shextree_memory_leaks_init(void);
+void ssdfs_shextree_check_memory_leaks(void);
+void ssdfs_snap_reqs_queue_memory_leaks_init(void);
+void ssdfs_snap_reqs_queue_check_memory_leaks(void);
+void ssdfs_snap_rules_list_memory_leaks_init(void);
+void ssdfs_snap_rules_list_check_memory_leaks(void);
+void ssdfs_snap_tree_memory_leaks_init(void);
+void ssdfs_snap_tree_check_memory_leaks(void);
+void ssdfs_xattr_memory_leaks_init(void);
+void ssdfs_xattr_check_memory_leaks(void);
+
+#endif /* _SSDFS_H */
diff --git a/fs/ssdfs/ssdfs_inline.h b/fs/ssdfs/ssdfs_inline.h
new file mode 100644
index 000000000000..3ea62e5390d6
--- /dev/null
+++ b/fs/ssdfs/ssdfs_inline.h
@@ -0,0 +1,3037 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ssdfs_inline.h - inline functions and macros.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_INLINE_H
+#define _SSDFS_INLINE_H
+
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#define SSDFS_CRIT(fmt, ...) \
+	pr_crit_ratelimited("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#define SSDFS_ERR(fmt, ...) \
+	pr_err_ratelimited("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#define SSDFS_ERR_DBG(fmt, ...) \
+	pr_err("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#define SSDFS_WARN(fmt, ...) \
+	do { \
+		pr_warn_ratelimited("pid %d:%s:%d %s(): " fmt, \
+			current->pid, __FILE__, __LINE__, \
+			__func__, ##__VA_ARGS__); \
+		dump_stack(); \
+	} while (0)
+
+#define SSDFS_WARN_DBG(fmt, ...) \
+	do { \
+		pr_warn("pid %d:%s:%d %s(): " fmt, \
+			current->pid, __FILE__, __LINE__, \
+			__func__, ##__VA_ARGS__); \
+		dump_stack(); \
+	} while (0)
+
+#define SSDFS_NOTICE(fmt, ...) \
+	pr_notice(fmt, ##__VA_ARGS__)
+
+#define SSDFS_INFO(fmt, ...) \
+	pr_info(fmt, ##__VA_ARGS__)
+
+#ifdef CONFIG_SSDFS_DEBUG
+
+#define SSDFS_DBG(fmt, ...) \
+	pr_debug("pid %d:%s:%d %s(): " fmt, \
+		 current->pid, __FILE__, __LINE__, __func__, ##__VA_ARGS__)
+
+#else /* CONFIG_SSDFS_DEBUG */
+
+#define SSDFS_DBG(fmt, ...) \
+	no_printk(KERN_DEBUG fmt, ##__VA_ARGS__)
+
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+extern atomic64_t ssdfs_allocated_folios;
+extern atomic64_t ssdfs_memory_leaks;
+
+extern atomic64_t ssdfs_locked_folios;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+static inline
+void ssdfs_memory_leaks_increment(void *kaddr)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_inc(&ssdfs_memory_leaks);
+
+	SSDFS_DBG("memory %p, allocation count %lld\n",
+		  kaddr,
+		  atomic64_read(&ssdfs_memory_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_memory_leaks_decrement(void *kaddr)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_memory_leaks);
+
+	SSDFS_DBG("memory %p, allocation count %lld\n",
+		  kaddr,
+		  atomic64_read(&ssdfs_memory_leaks));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void *ssdfs_kmalloc(size_t size, gfp_t flags)
+{
+	void *kaddr;
+	unsigned int nofs_flags;
+
+	nofs_flags = memalloc_nofs_save();
+	kaddr = kmalloc(size, flags);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kzalloc(size_t size, gfp_t flags)
+{
+	void *kaddr;
+	unsigned int nofs_flags;
+
+	nofs_flags = memalloc_nofs_save();
+	kaddr = kzalloc(size, flags);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kvzalloc(size_t size, gfp_t flags)
+{
+	void *kaddr;
+	unsigned int nofs_flags;
+
+	nofs_flags = memalloc_nofs_save();
+	kaddr = kvzalloc(size, flags);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void *ssdfs_kcalloc(size_t n, size_t size, gfp_t flags)
+{
+	void *kaddr;
+	unsigned int nofs_flags;
+
+	nofs_flags = memalloc_nofs_save();
+	kaddr = kcalloc(n, size, flags);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (kaddr)
+		ssdfs_memory_leaks_increment(kaddr);
+
+	return kaddr;
+}
+
+static inline
+void ssdfs_kfree(void *kaddr)
+{
+	if (kaddr) {
+		ssdfs_memory_leaks_decrement(kaddr);
+		kfree(kaddr);
+	}
+}
+
+static inline
+void ssdfs_kvfree(void *kaddr)
+{
+	if (kaddr) {
+		ssdfs_memory_leaks_decrement(kaddr);
+		kvfree(kaddr);
+	}
+}
+
+static inline
+void ssdfs_folio_get(struct folio *folio)
+{
+	folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d, flags %#lx\n",
+		  folio, folio_ref_count(folio), folio->flags.f);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_folio_put(struct folio *folio)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+
+	if (folio_ref_count(folio) < 1) {
+		SSDFS_WARN("folio %p, count %d\n",
+			   folio, folio_ref_count(folio));
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_put(folio);
+}
+
+static inline
+void ssdfs_folio_lock(struct folio *folio)
+{
+	folio_lock(folio);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_locked_folios) < 0) {
+		SSDFS_WARN("ssdfs_locked_folios %lld\n",
+			   atomic64_read(&ssdfs_locked_folios));
+	}
+
+	atomic64_inc(&ssdfs_locked_folios);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_account_locked_folio(struct folio *folio)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (!folio)
+		return;
+
+	if (!folio_test_locked(folio)) {
+		SSDFS_WARN("folio %p, folio_index %llu\n",
+			   folio, (u64)folio->index);
+	}
+
+	if (atomic64_read(&ssdfs_locked_folios) < 0) {
+		SSDFS_WARN("ssdfs_locked_folios %lld\n",
+			   atomic64_read(&ssdfs_locked_folios));
+	}
+
+	atomic64_inc(&ssdfs_locked_folios);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_folio_unlock(struct folio *folio)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (!folio_test_locked(folio)) {
+		SSDFS_WARN("folio %p, folio_index %llu\n",
+			   folio, (u64)folio->index);
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	folio_unlock(folio);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_locked_folios);
+
+	if (atomic64_read(&ssdfs_locked_folios) < 0) {
+		SSDFS_WARN("ssdfs_locked_folios %lld\n",
+			   atomic64_read(&ssdfs_locked_folios));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_folio_start_writeback(struct ssdfs_fs_info *fsi,
+				 u64 seg_id, u64 logical_offset,
+				 struct folio *folio)
+{
+	folio_start_writeback(folio);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&fsi->ssdfs_writeback_folios) < 0) {
+		SSDFS_WARN("ssdfs_writeback_folios %lld\n",
+			   atomic64_read(&fsi->ssdfs_writeback_folios));
+	}
+
+	atomic64_inc(&fsi->ssdfs_writeback_folios);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (folio->mapping && folio->mapping->host) {
+		SSDFS_DBG("ino %llu, folio_index %lu, "
+			   "seg_id %llu, logical_offset %llu, "
+			   "ssdfs_writeback_folios %lld\n",
+			   (u64)folio->mapping->host->i_ino,
+			   folio->index,
+			   seg_id, logical_offset,
+			   atomic64_read(&fsi->ssdfs_writeback_folios));
+	} else {
+		SSDFS_DBG("seg_id %llu, logical_offset %llu, "
+			  "folio_index %lu, ssdfs_writeback_folios %lld\n",
+			  seg_id, logical_offset,
+			  folio->index,
+			  atomic64_read(&fsi->ssdfs_writeback_folios));
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_folio_end_writeback(struct ssdfs_fs_info *fsi,
+				u64 seg_id, u64 logical_offset,
+				struct folio *folio)
+{
+	folio_end_writeback(folio);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&fsi->ssdfs_writeback_folios);
+
+	if (atomic64_read(&fsi->ssdfs_writeback_folios) < 0) {
+		SSDFS_WARN("ssdfs_writeback_folios %lld\n",
+			   atomic64_read(&fsi->ssdfs_writeback_folios));
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (folio->mapping && folio->mapping->host) {
+		SSDFS_DBG("ino %llu, folio_index %lu, "
+			  "seg_id %llu, logical_offset %llu, "
+			  "ssdfs_writeback_folios %lld\n",
+			  (u64)folio->mapping->host->i_ino,
+			  folio->index,
+			  seg_id, logical_offset,
+			  atomic64_read(&fsi->ssdfs_writeback_folios));
+	} else {
+		SSDFS_DBG("seg_id %llu, logical_offset %llu, "
+			  "folio_index %lu, ssdfs_writeback_folios %lld\n",
+			  seg_id, logical_offset,
+			  folio->index,
+			  atomic64_read(&fsi->ssdfs_writeback_folios));
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+struct folio *ssdfs_folio_alloc(gfp_t gfp_mask, unsigned int order)
+{
+	struct folio *folio;
+	unsigned int nofs_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("mask %#x, order %u\n",
+		  gfp_mask, order);
+
+	if (order > get_order(SSDFS_128KB)) {
+		SSDFS_WARN("invalid order %u\n",
+			   order);
+		return ERR_PTR(-ERANGE);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	nofs_flags = memalloc_nofs_save();
+	folio = folio_alloc(gfp_mask, order);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (unlikely(!folio)) {
+		SSDFS_WARN("unable to allocate folio\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d, "
+		  "flags %#lx, folio_index %lu\n",
+		  folio, folio_ref_count(folio),
+		  folio->flags.f, folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_inc(&ssdfs_allocated_folios);
+
+	SSDFS_DBG("folio %p, allocated_folios %lld\n",
+		  folio, atomic64_read(&ssdfs_allocated_folios));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	return folio;
+}
+
+static inline
+void ssdfs_folio_account(struct folio *folio)
+{
+	return;
+}
+
+static inline
+void ssdfs_folio_forget(struct folio *folio)
+{
+	return;
+}
+
+/*
+ * ssdfs_add_batch_folio() - add folio into batch
+ * @batch: folio batch
+ * @order: allocation order of the new folio
+ *
+ * This function allocates a folio and adds it into the batch.
+ *
+ * RETURN:
+ * [success] - pointer to the added folio.
+ * [failure] - error code:
+ *
+ * %-ENOMEM     - fail to allocate memory.
+ * %-E2BIG      - batch is full.
+ */
+static inline
+struct folio *ssdfs_add_batch_folio(struct folio_batch *batch,
+				    unsigned int order)
+{
+	struct folio *folio;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_space(batch) == 0) {
+		SSDFS_ERR("batch has no space\n");
+		return ERR_PTR(-E2BIG);
+	}
+
+	folio = ssdfs_folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
+	if (IS_ERR_OR_NULL(folio)) {
+		err = (folio == NULL ? -ENOMEM : PTR_ERR(folio));
+		SSDFS_ERR("unable to allocate folio\n");
+		return ERR_PTR(err);
+	}
+
+	folio_batch_add(batch, folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("batch %p, batch count %u\n",
+		  batch, folio_batch_count(batch));
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return folio;
+}
+
+static inline
+void ssdfs_folio_free(struct folio *folio)
+{
+	if (!folio)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (folio_test_locked(folio)) {
+		SSDFS_WARN("folio %p is still locked\n",
+			   folio);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* decrease reference counter */
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d, "
+		  "flags %#lx, folio_index %lu\n",
+		  folio, folio_ref_count(folio),
+		  folio->flags.f, folio->index);
+
+	if (folio_ref_count(folio) <= 0 ||
+	    folio_ref_count(folio) >= 2) {
+		SSDFS_WARN("folio %p, count %d\n",
+			   folio, folio_ref_count(folio));
+		BUG();
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* free folio */
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_dec(&ssdfs_allocated_folios);
+
+	SSDFS_DBG("allocated_folios %lld\n",
+		  atomic64_read(&ssdfs_allocated_folios));
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_folio_batch_release(struct folio_batch *batch)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("batch %p\n", batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!batch)
+		return;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("batch count %u\n", folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		struct folio *folio = batch->folios[i];
+
+		if (!folio)
+			continue;
+
+		ssdfs_folio_free(folio);
+
+		batch->folios[i] = NULL;
+	}
+
+	folio_batch_reinit(batch);
+}
+
+#define SSDFS_MEMORY_LEAKS_CHECKER_FNS(name)				\
+static inline								\
+void ssdfs_##name##_cache_leaks_increment(void *kaddr)			\
+{									\
+	atomic64_inc(&ssdfs_##name##_cache_leaks);			\
+	SSDFS_DBG("memory %p, allocation count %lld\n",			\
+		  kaddr,						\
+		  atomic64_read(&ssdfs_##name##_cache_leaks));		\
+	ssdfs_memory_leaks_increment(kaddr);				\
+}									\
+static inline								\
+void ssdfs_##name##_cache_leaks_decrement(void *kaddr)			\
+{									\
+	atomic64_dec(&ssdfs_##name##_cache_leaks);			\
+	SSDFS_DBG("memory %p, allocation count %lld\n",			\
+		  kaddr,						\
+		  atomic64_read(&ssdfs_##name##_cache_leaks));		\
+	ssdfs_memory_leaks_decrement(kaddr);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kmalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kzalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags)			\
+{									\
+	void *kaddr = ssdfs_kvzalloc(size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void *ssdfs_##name##_kcalloc(size_t n, size_t size, gfp_t flags)	\
+{									\
+	void *kaddr = ssdfs_kcalloc(n, size, flags);			\
+	if (kaddr) {							\
+		atomic64_inc(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	return kaddr;							\
+}									\
+static inline								\
+void ssdfs_##name##_kfree(void *kaddr)					\
+{									\
+	if (kaddr) {							\
+		atomic64_dec(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	ssdfs_kfree(kaddr);						\
+}									\
+static inline								\
+void ssdfs_##name##_kvfree(void *kaddr)					\
+{									\
+	if (kaddr) {							\
+		atomic64_dec(&ssdfs_##name##_memory_leaks);		\
+		SSDFS_DBG("memory %p, allocation count %lld\n",		\
+			  kaddr,					\
+			  atomic64_read(&ssdfs_##name##_memory_leaks));	\
+	}								\
+	ssdfs_kvfree(kaddr);						\
+}									\
+static inline								\
+struct folio *ssdfs_##name##_alloc_folio(gfp_t gfp_mask,		\
+					 unsigned int order)		\
+{									\
+	struct folio *folio;						\
+	folio = ssdfs_folio_alloc(gfp_mask, order);			\
+	if (!IS_ERR_OR_NULL(folio)) {					\
+		atomic64_inc(&ssdfs_##name##_folio_leaks);		\
+		SSDFS_DBG("folio %p, allocated_folios %lld\n",		\
+			  folio,					\
+			  atomic64_read(&ssdfs_##name##_folio_leaks));	\
+	}								\
+	return folio;							\
+}									\
+static inline								\
+void ssdfs_##name##_account_folio(struct folio *folio)			\
+{									\
+	if (folio) {							\
+		atomic64_inc(&ssdfs_##name##_folio_leaks);		\
+		SSDFS_DBG("folio %p, allocated_folios %lld\n",		\
+			  folio,					\
+			  atomic64_read(&ssdfs_##name##_folio_leaks));	\
+	}								\
+}									\
+static inline								\
+void ssdfs_##name##_forget_folio(struct folio *folio)			\
+{									\
+	if (folio) {							\
+		atomic64_dec(&ssdfs_##name##_folio_leaks);		\
+		SSDFS_DBG("folio %p, allocated_folios %lld\n",		\
+			  folio,					\
+			  atomic64_read(&ssdfs_##name##_folio_leaks));	\
+	}								\
+}									\
+static inline								\
+struct folio *ssdfs_##name##_add_batch_folio(struct folio_batch *batch,	\
+					     unsigned int order)	\
+{									\
+	struct folio *folio;						\
+	folio = ssdfs_add_batch_folio(batch, order);			\
+	if (!IS_ERR_OR_NULL(folio)) {					\
+		atomic64_inc(&ssdfs_##name##_folio_leaks);		\
+		SSDFS_DBG("folio %p, allocated_folios %lld\n",		\
+			  folio,					\
+			  atomic64_read(&ssdfs_##name##_folio_leaks));	\
+	}								\
+	return folio;							\
+}									\
+static inline								\
+void ssdfs_##name##_free_folio(struct folio *folio)			\
+{									\
+	if (folio) {							\
+		atomic64_dec(&ssdfs_##name##_folio_leaks);		\
+		SSDFS_DBG("folio %p, allocated_folios %lld\n",		\
+			  folio,					\
+			  atomic64_read(&ssdfs_##name##_folio_leaks));	\
+	}								\
+	ssdfs_folio_free(folio);					\
+}									\
+static inline								\
+void ssdfs_##name##_folio_batch_release(struct folio_batch *batch)	\
+{									\
+	int i;								\
+	if (batch) {							\
+		for (i = 0; i < folio_batch_count(batch); i++) {	\
+			struct folio *folio = batch->folios[i];		\
+			if (!folio)					\
+				continue;				\
+			atomic64_dec(&ssdfs_##name##_folio_leaks);	\
+			SSDFS_DBG("folio %p, allocated_folios %lld\n",	\
+			    folio,					\
+			    atomic64_read(&ssdfs_##name##_folio_leaks));\
+		}							\
+	}								\
+	ssdfs_folio_batch_release(batch);				\
+}									\
+
+#define SSDFS_MEMORY_ALLOCATOR_FNS(name)				\
+static inline								\
+void ssdfs_##name##_cache_leaks_increment(void *kaddr)			\
+{									\
+	ssdfs_memory_leaks_increment(kaddr);				\
+}									\
+static inline								\
+void ssdfs_##name##_cache_leaks_decrement(void *kaddr)			\
+{									\
+	ssdfs_memory_leaks_decrement(kaddr);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kmalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kmalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kzalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kzalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kvzalloc(size_t size, gfp_t flags)			\
+{									\
+	return ssdfs_kvzalloc(size, flags);				\
+}									\
+static inline								\
+void *ssdfs_##name##_kcalloc(size_t n, size_t size, gfp_t flags)	\
+{									\
+	return ssdfs_kcalloc(n, size, flags);				\
+}									\
+static inline								\
+void ssdfs_##name##_kfree(void *kaddr)					\
+{									\
+	ssdfs_kfree(kaddr);						\
+}									\
+static inline								\
+void ssdfs_##name##_kvfree(void *kaddr)					\
+{									\
+	ssdfs_kvfree(kaddr);						\
+}									\
+static inline								\
+struct folio *ssdfs_##name##_alloc_folio(gfp_t gfp_mask,		\
+					 unsigned int order)		\
+{									\
+	return ssdfs_folio_alloc(gfp_mask, order);			\
+}									\
+static inline								\
+void ssdfs_##name##_account_folio(struct folio *folio)			\
+{									\
+	ssdfs_folio_account(folio);					\
+}									\
+static inline								\
+void ssdfs_##name##_forget_folio(struct folio *folio)			\
+{									\
+	ssdfs_folio_forget(folio);					\
+}									\
+static inline								\
+struct folio *ssdfs_##name##_add_batch_folio(struct folio_batch *batch,	\
+					     unsigned int order)	\
+{									\
+	return ssdfs_add_batch_folio(batch, order);			\
+}									\
+static inline								\
+void ssdfs_##name##_free_folio(struct folio *folio)			\
+{									\
+	ssdfs_folio_free(folio);					\
+}									\
+static inline								\
+void ssdfs_##name##_folio_batch_release(struct folio_batch *batch)	\
+{									\
+	ssdfs_folio_batch_release(batch);				\
+}									\
+
+static inline
+__le32 ssdfs_crc32_le(void *data, size_t len)
+{
+	return cpu_to_le32(crc32(~0, data, len));
+}
+
+static inline
+int ssdfs_calculate_csum(struct ssdfs_metadata_check *check,
+			  void *buf, size_t buf_size)
+{
+	u16 bytes;
+	u16 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!check || !buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bytes = le16_to_cpu(check->bytes);
+	flags = le16_to_cpu(check->flags);
+
+	if (bytes > buf_size) {
+		SSDFS_ERR("corrupted size %u of checked data\n", bytes);
+		return -EINVAL;
+	}
+
+	if (flags & SSDFS_CRC32) {
+		check->csum = 0;
+		check->csum = ssdfs_crc32_le(buf, bytes);
+	} else {
+		SSDFS_WARN("unknown flags set %#x\n", flags);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+bool is_csum_valid(struct ssdfs_metadata_check *check,
+		   void *buf, size_t buf_size)
+{
+	__le32 old_csum;
+	__le32 calc_csum;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	old_csum = check->csum;
+
+	err = ssdfs_calculate_csum(check, buf, buf_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate checksum\n");
+		return false;
+	}
+
+	calc_csum = check->csum;
+	check->csum = old_csum;
+
+	if (old_csum != calc_csum) {
+		SSDFS_ERR("old_csum %#x != calc_csum %#x\n",
+			  __le32_to_cpu(old_csum),
+			  __le32_to_cpu(calc_csum));
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     buf, buf_size);
+		return false;
+	}
+
+	return true;
+}
+
+static inline
+bool is_ssdfs_magic_valid(struct ssdfs_signature *magic)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!magic);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (le32_to_cpu(magic->common) != SSDFS_SUPER_MAGIC)
+		return false;
+	if (magic->version.major > SSDFS_MAJOR_REVISION ||
+	    magic->version.minor > SSDFS_MINOR_REVISION) {
+		SSDFS_INFO("Volume has unsupported %u.%u version. "
+			   "Driver expects %u.%u version.\n",
+			   magic->version.major,
+			   magic->version.minor,
+			   SSDFS_MAJOR_REVISION,
+			   SSDFS_MINOR_REVISION);
+		return false;
+	}
+
+	return true;
+}
+
+#define SSDFS_SEG_HDR(ptr) \
+	((struct ssdfs_segment_header *)(ptr))
+#define SSDFS_LF(ptr) \
+	((struct ssdfs_log_footer *)(ptr))
+#define SSDFS_VH(ptr) \
+	((struct ssdfs_volume_header *)(ptr))
+#define SSDFS_VS(ptr) \
+	((struct ssdfs_volume_state *)(ptr))
+#define SSDFS_PLH(ptr) \
+	((struct ssdfs_partial_log_header *)(ptr))
+
+/*
+ * Flags for mount options.
+ */
+#define SSDFS_MOUNT_COMPR_MODE_NONE		(1 << 0)
+#define SSDFS_MOUNT_COMPR_MODE_ZLIB		(1 << 1)
+#define SSDFS_MOUNT_COMPR_MODE_LZO		(1 << 2)
+#define SSDFS_MOUNT_ERRORS_CONT			(1 << 3)
+#define SSDFS_MOUNT_ERRORS_RO			(1 << 4)
+#define SSDFS_MOUNT_ERRORS_PANIC		(1 << 5)
+#define SSDFS_MOUNT_IGNORE_FS_STATE		(1 << 6)
+
+#define ssdfs_clear_opt(o, opt)		((o) &= ~SSDFS_MOUNT_##opt)
+#define ssdfs_set_opt(o, opt)		((o) |= SSDFS_MOUNT_##opt)
+#define ssdfs_test_opt(o, opt)		((o) & SSDFS_MOUNT_##opt)
+
+#define SSDFS_LOG_FOOTER_OFF(seg_hdr) ({ \
+	u32 offset; \
+	int index; \
+	struct ssdfs_metadata_descriptor *desc; \
+	index = SSDFS_LOG_FOOTER_INDEX; \
+	desc = &SSDFS_SEG_HDR(seg_hdr)->desc_array[index]; \
+	offset = le32_to_cpu(desc->offset); \
+	offset; \
+})
+
+#define SSDFS_LOG_PAGES(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->log_pages))
+#define SSDFS_SEG_TYPE(seg_hdr) \
+	(le16_to_cpu(SSDFS_SEG_HDR(seg_hdr)->seg_type))
+
+#define SSDFS_MAIN_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].peb_id))
+#define SSDFS_COPY_SB_PEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].peb_id))
+#define SSDFS_MAIN_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_MAIN_SB_SEG].leb_id))
+#define SSDFS_COPY_SB_LEB(vh, type) \
+	(le64_to_cpu(SSDFS_VH(vh)->sb_pebs[type][SSDFS_COPY_SB_SEG].leb_id))
+
+#define SSDFS_SEG_CNO(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->cno))
+
+static inline
+u64 ssdfs_current_timestamp(void)
+{
+	struct timespec64 cur_time;
+
+	ktime_get_coarse_real_ts64(&cur_time);
+
+	return (u64)timespec64_to_ns(&cur_time);
+}
+
+static inline
+void ssdfs_init_boot_vs_mount_timediff(struct ssdfs_fs_info *fsi)
+{
+	struct timespec64 uptime;
+
+	ktime_get_boottime_ts64(&uptime);
+	fsi->boot_vs_mount_timediff = timespec64_to_ns(&uptime);
+}
+
+static inline
+u64 ssdfs_current_cno(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct timespec64 uptime;
+	u64 boot_vs_mount_timediff;
+	u64 fs_mount_cno;
+
+	spin_lock(&fsi->volume_state_lock);
+	boot_vs_mount_timediff = fsi->boot_vs_mount_timediff;
+	fs_mount_cno = fsi->fs_mount_cno;
+	spin_unlock(&fsi->volume_state_lock);
+
+	ktime_get_boottime_ts64(&uptime);
+	return fs_mount_cno +
+		timespec64_to_ns(&uptime) -
+		boot_vs_mount_timediff;
+}
+
+#define SSDFS_MAPTBL_CACHE_HDR(ptr) \
+	((struct ssdfs_maptbl_cache_header *)(ptr))
+
+#define SSDFS_SEG_HDR_MAGIC(vh) \
+	(le16_to_cpu(SSDFS_VH(vh)->magic.key))
+#define SSDFS_SEG_TIME(seg_hdr) \
+	(le64_to_cpu(SSDFS_SEG_HDR(seg_hdr)->timestamp))
+
+#define SSDFS_VH_CNO(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_cno))
+#define SSDFS_VH_TIME(vh) \
+	(le64_to_cpu(SSDFS_VH(vh)->create_timestamp))
+
+#define SSDFS_VS_CNO(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->cno))
+#define SSDFS_VS_TIME(vs) \
+	(le64_to_cpu(SSDFS_VS(vs)->timestamp))
+
+#define SSDFS_POFFTH(ptr) \
+	((struct ssdfs_phys_offset_table_header *)(ptr))
+#define SSDFS_PHYSOFFD(ptr) \
+	((struct ssdfs_phys_offset_descriptor *)(ptr))
+
+/*
+ * struct ssdfs_offset2folio - folio descriptor for offset
+ * @block_size: logical block size in bytes
+ * @offset: offset in bytes
+ * @folio_index: folio index
+ * @folio_offset: folio offset in bytes
+ * @page_in_folio: page index in folio
+ * @page_offset: page offset from folio's beginning in bytes
+ * @offset_inside_page: offset inside of page in bytes
+ */
+struct ssdfs_offset2folio {
+	u32 block_size;
+	u64 offset;
+	u32 folio_index;
+	u64 folio_offset;
+	u32 page_in_folio;
+	u32 page_offset;
+	u32 offset_inside_page;
+};
+
+/*
+ * struct ssdfs_smart_folio - smart memory folio
+ * @ptr: memory folio pointer
+ * @desc: offset to folio descriptor
+ */
+struct ssdfs_smart_folio {
+	struct folio *ptr;
+	struct ssdfs_offset2folio desc;
+};
+
+/*
+ * IS_SSDFS_OFF2FOLIO_VALID() - check offset to folio descriptor
+ */
+static inline
+bool IS_SSDFS_OFF2FOLIO_VALID(struct ssdfs_offset2folio *desc)
+{
+	u64 calculated;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (desc->block_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected logical block size %u\n",
+			  desc->block_size);
+		return false;
+	}
+
+	/* block_size is a power of two (checked above); mask instead of
+	 * a 64-bit modulo, which is unavailable on 32-bit targets
+	 */
+	if (desc->folio_offset & (desc->block_size - 1)) {
+		SSDFS_ERR("unaligned folio offset: "
+			  "folio_offset %llu, block_size %u\n",
+			  desc->folio_offset,
+			  desc->block_size);
+		return false;
+	}
+
+	calculated = (u64)desc->folio_index * desc->block_size;
+	if (calculated != desc->folio_offset) {
+		SSDFS_ERR("invalid folio index: "
+			  "folio_index %u, block_size %u, "
+			  "folio_offset %llu\n",
+			  desc->folio_index,
+			  desc->block_size,
+			  desc->folio_offset);
+		return false;
+	}
+
+	if (desc->page_offset % PAGE_SIZE) {
+		SSDFS_ERR("unaligned page offset: "
+			  "page_offset %u, page_size %lu\n",
+			  desc->page_offset,
+			  PAGE_SIZE);
+		return false;
+	}
+
+	calculated = (u64)desc->page_in_folio << PAGE_SHIFT;
+	if (calculated != desc->page_offset) {
+		SSDFS_ERR("invalid page in folio index: "
+			  "page_index %u, page_offset %u\n",
+			  desc->page_in_folio,
+			  desc->page_offset);
+		return false;
+	}
+
+	calculated = desc->folio_offset;
+	calculated += desc->page_offset;
+	calculated += desc->offset_inside_page;
+	if (calculated != desc->offset) {
+		SSDFS_ERR("invalid offset: "
+			  "offset %llu, folio_offset %llu, "
+			  "page_offset %u, offset_inside_page %u\n",
+			  desc->offset,
+			  desc->folio_offset,
+			  desc->page_offset,
+			  desc->offset_inside_page);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * SSDFS_PAGE_OFFSET_IN_FOLIO() - calculate page offset in folio
+ * @folio_size: size of folio in bytes
+ * @offset: offset in bytes
+ */
+static inline
+u32 SSDFS_PAGE_OFFSET_IN_FOLIO(u32 folio_size, u64 offset)
+{
+	u64 folio_offset;
+	u64 index;
+	u64 page_offset;
+
+	index = div_u64(offset, folio_size);
+	folio_offset = index * folio_size;
+
+	index = (offset - folio_offset) >> PAGE_SHIFT;
+	page_offset = index << PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(page_offset >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u32)page_offset;
+}
+
+/*
+ * SSDFS_OFF2FOLIO() - convert offset to folio
+ * @block_size: size of block in bytes
+ * @offset: offset in bytes
+ * @desc: offset to folio descriptor [out]
+ */
+static inline
+int SSDFS_OFF2FOLIO(u32 block_size, u64 offset,
+		    struct ssdfs_offset2folio *desc)
+{
+	u64 index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc);
+	BUG_ON(offset >= U64_MAX);
+
+	switch (block_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected logical block size %u\n",
+			  block_size);
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->block_size = block_size;
+	desc->offset = offset;
+
+	desc->folio_index = div_u64(desc->offset, desc->block_size);
+	desc->folio_offset = (u64)desc->folio_index * desc->block_size;
+
+	index = (desc->offset - desc->folio_offset) >> PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(index >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->page_in_folio = (u32)index;
+
+	index <<= PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(index >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->page_offset = (u32)index;
+
+	desc->offset_inside_page = offset % PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("block_size %u, offset %llu, "
+		  "folio_index %u, folio_offset %llu, "
+		  "page_in_folio %u, page_offset %u, "
+		  "offset_inside_page %u\n",
+		  desc->block_size, desc->offset,
+		  desc->folio_index, desc->folio_offset,
+		  desc->page_in_folio, desc->page_offset,
+		  desc->offset_inside_page);
+
+	if (!IS_SSDFS_OFF2FOLIO_VALID(desc)) {
+		SSDFS_ERR("invalid descriptor\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+#define SSDFS_BLKBMP_HDR(ptr) \
+	((struct ssdfs_block_bitmap_header *)(ptr))
+#define SSDFS_SBMP_FRAG_HDR(ptr) \
+	((struct ssdfs_segbmap_fragment_header *)(ptr))
+#define SSDFS_BTN(ptr) \
+	((struct ssdfs_btree_node *)(ptr))
+
+static inline
+bool can_be_merged_into_extent(struct folio *folio1, struct folio *folio2)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(folio1->mapping->host->i_sb);
+	ino_t ino1 = folio1->mapping->host->i_ino;
+	ino_t ino2 = folio2->mapping->host->i_ino;
+	int pages_per_folio = fsi->pagesize >> PAGE_SHIFT;
+	pgoff_t index1 = folio1->index;
+	pgoff_t index2 = folio2->index;
+	pgoff_t diff_index;
+	pgoff_t expected_diff;
+	bool has_identical_type;
+	bool has_identical_ino;
+	bool has_adjacent_index;
+
+	has_identical_type = (folio_test_checked(folio1) &&
+					folio_test_checked(folio2)) ||
+				(!folio_test_checked(folio1) &&
+					!folio_test_checked(folio2));
+	has_identical_ino = ino1 == ino2;
+
+	if (index1 >= index2) {
+		diff_index = index1 - index2;
+		expected_diff = folio_nr_pages(folio2);
+	} else {
+		diff_index = index2 - index1;
+		expected_diff = folio_nr_pages(folio1);
+	}
+
+	has_adjacent_index = diff_index == expected_diff ||
+					diff_index == pages_per_folio;
+
+	return has_identical_type && has_identical_ino && has_adjacent_index;
+}
+
+static inline
+bool need_add_block(struct folio *folio)
+{
+	return folio_test_checked(folio);
+}
+
+static inline
+bool is_diff_folio(struct folio *folio)
+{
+	return folio_test_checked(folio);
+}
+
+static inline
+void set_folio_new(struct folio *folio)
+{
+	folio_set_checked(folio);
+}
+
+static inline
+void clear_folio_new(struct folio *folio)
+{
+	folio_clear_checked(folio);
+}
+
+static inline
+void ssdfs_set_folio_private(struct folio *folio,
+			     unsigned long private)
+{
+	folio_attach_private(folio, (void *)private);
+}
+
+static inline
+void ssdfs_clear_folio_private(struct folio *folio)
+{
+	folio_detach_private(folio);
+}
+
+static inline
+int ssdfs_memcpy(void *dst, u32 dst_off, u32 dst_size,
+		 const void *src, u32 src_off, u32 src_size,
+		 u32 copy_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_WARN("fail to copy: "
+			   "src_off %u, copy_size %u, src_size %u\n",
+			   src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_WARN("fail to copy: "
+			   "dst_off %u, copy_size %u, dst_size %u\n",
+			   dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst, dst_off, dst_size,
+		  src, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memcpy((u8 *)dst + dst_off, (u8 *)src + src_off, copy_size);
+	return 0;
+}
+
+static inline
+int ssdfs_iter_copy(void *dst_kaddr, u32 dst_offset,
+		    void *src_kaddr, u32 src_offset,
+		    u32 copy_size, u32 *copied_bytes)
+{
+	u32 src_offset_in_page;
+	u32 dst_offset_in_page;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!copied_bytes);
+	BUG_ON(copy_size == 0);
+
+	SSDFS_DBG("src_kaddr %p, src_offset %u, "
+		  "dst_kaddr %p, dst_offset %u, "
+		  "copy_size %u\n",
+		  src_kaddr, src_offset,
+		  dst_kaddr, dst_offset,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	src_offset_in_page = src_offset % PAGE_SIZE;
+	*copied_bytes = PAGE_SIZE - src_offset_in_page;
+
+	dst_offset_in_page = dst_offset % PAGE_SIZE;
+	*copied_bytes = min_t(u32, *copied_bytes,
+				   PAGE_SIZE - dst_offset_in_page);
+
+	*copied_bytes = min_t(u32, *copied_bytes, copy_size);
+
+	err = ssdfs_memcpy(dst_kaddr, dst_offset_in_page, PAGE_SIZE,
+			   src_kaddr, src_offset_in_page, PAGE_SIZE,
+			   *copied_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: "
+			  "src_kaddr %p, src_offset_in_page %u, "
+			  "dst_kaddr %p, dst_offset_in_page %u, "
+			  "copied_bytes %u, err %d\n",
+			  src_kaddr, src_offset_in_page,
+			  dst_kaddr, dst_offset_in_page,
+			  *copied_bytes, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_iter_copy_from_folio(void *dst_kaddr, u32 dst_offset, u32 dst_size,
+				void *src_kaddr, u32 src_offset,
+				u32 copy_size, u32 *copied_bytes)
+{
+	u32 src_offset_in_page;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!copied_bytes);
+	BUG_ON(copy_size == 0);
+
+	SSDFS_DBG("src_kaddr %p, src_offset %u, "
+		  "dst_kaddr %p, dst_offset %u, dst_size %u, "
+		  "copy_size %u\n",
+		  src_kaddr, src_offset,
+		  dst_kaddr, dst_offset, dst_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	src_offset_in_page = src_offset % PAGE_SIZE;
+	*copied_bytes = PAGE_SIZE - src_offset_in_page;
+	*copied_bytes = min_t(u32, *copied_bytes, copy_size);
+
+	err = ssdfs_memcpy(dst_kaddr, dst_offset, dst_size,
+			   src_kaddr, src_offset_in_page, PAGE_SIZE,
+			   *copied_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: "
+			  "src_kaddr %p, src_offset_in_page %u, "
+			  "dst_kaddr %p, dst_offset %u, "
+			  "copied_bytes %u, err %d\n",
+			  src_kaddr, src_offset_in_page,
+			  dst_kaddr, dst_offset,
+			  *copied_bytes, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_iter_copy_to_folio(void *dst_kaddr, u32 dst_offset,
+			     void *src_kaddr, u32 src_offset, u32 src_size,
+			     u32 copy_size, u32 *copied_bytes)
+{
+	u32 dst_offset_in_page;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!copied_bytes);
+	BUG_ON(copy_size == 0);
+
+	SSDFS_DBG("src_kaddr %p, src_offset %u, "
+		  "dst_kaddr %p, dst_offset %u, "
+		  "copy_size %u\n",
+		  src_kaddr, src_offset,
+		  dst_kaddr, dst_offset,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_offset_in_page = dst_offset % PAGE_SIZE;
+	*copied_bytes = PAGE_SIZE - dst_offset_in_page;
+	*copied_bytes = min_t(u32, *copied_bytes, copy_size);
+
+	err = ssdfs_memcpy(dst_kaddr, dst_offset_in_page, PAGE_SIZE,
+			   src_kaddr, src_offset, src_size,
+			   *copied_bytes);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: "
+			  "src_kaddr %p, src_offset %u, src_size %u, "
+			  "dst_kaddr %p, dst_offset_in_page %u, "
+			  "copied_bytes %u, err %d\n",
+			  src_kaddr, src_offset, src_size,
+			  dst_kaddr, dst_offset_in_page,
+			  *copied_bytes, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+int __ssdfs_memcpy_folio(struct folio *dst_folio, u32 dst_off, u32 dst_size,
+			 struct folio *src_folio, u32 src_off, u32 src_size,
+			 u32 copy_size)
+{
+	void *src_kaddr;
+	void *dst_kaddr;
+	u32 src_page, dst_page;
+	u32 copied_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio || !src_folio);
+
+	switch (dst_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected dst_size %u\n",
+			  dst_size);
+		return -EINVAL;
+	}
+
+	switch (src_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected src_size %u\n",
+			  src_size);
+		return -EINVAL;
+	}
+
+	if (dst_size > folio_size(dst_folio) ||
+	    copy_size > folio_size(dst_folio)) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_size %u, copy_size %u, folio_size %zu\n",
+			  dst_size, copy_size, folio_size(dst_folio));
+		return -ERANGE;
+	}
+
+	if (src_size > folio_size(src_folio) ||
+	    copy_size > folio_size(src_folio)) {
+		SSDFS_ERR("fail to copy: "
+			  "src_size %u, copy_size %u, folio_size %zu\n",
+			  src_size, copy_size, folio_size(src_folio));
+		return -ERANGE;
+	}
+
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst_folio %p, dst_off %u, dst_size %u, "
+		  "src_folio %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst_folio, dst_off, dst_size,
+		  src_folio, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (copy_size == 0) {
+		SSDFS_ERR("copy_size == 0\n");
+		return -ERANGE;
+	}
+
+	while (copied_bytes < copy_size) {
+		u32 src_iter_offset;
+		u32 dst_iter_offset;
+		u32 iter_bytes;
+
+		src_iter_offset = src_off + copied_bytes;
+		src_page = src_iter_offset >> PAGE_SHIFT;
+
+		dst_iter_offset = dst_off + copied_bytes;
+		dst_page = dst_iter_offset >> PAGE_SHIFT;
+
+		src_kaddr = kmap_local_folio(src_folio, src_page * PAGE_SIZE);
+		dst_kaddr = kmap_local_folio(dst_folio, dst_page * PAGE_SIZE);
+		err = ssdfs_iter_copy(dst_kaddr, dst_iter_offset,
+				      src_kaddr, src_iter_offset,
+				      copy_size - copied_bytes,
+				      &iter_bytes);
+		kunmap_local(dst_kaddr);
+		kunmap_local(src_kaddr);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy folio: "
+				  "src_page %u, src_iter_offset %u, "
+				  "dst_page %u, dst_iter_offset %u, "
+				  "iter_bytes %u, err %d\n",
+				  src_page, src_iter_offset,
+				  dst_page, dst_iter_offset,
+				  iter_bytes, err);
+			return err;
+		}
+
+		copied_bytes += iter_bytes;
+	}
+
+	if (copied_bytes != copy_size) {
+		SSDFS_ERR("copied_bytes %u != copy_size %u\n",
+			  copied_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	flush_dcache_folio(dst_folio);
+
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_folio(struct ssdfs_smart_folio *dst_folio,
+			struct ssdfs_smart_folio *src_folio,
+			u32 copy_size)
+{
+	u32 dst_off;
+	u32 src_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio || !src_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_off = dst_folio->desc.page_offset +
+			dst_folio->desc.offset_inside_page;
+	src_off = src_folio->desc.page_offset +
+			src_folio->desc.offset_inside_page;
+
+	return __ssdfs_memcpy_folio(dst_folio->ptr,
+				    dst_off, dst_folio->desc.block_size,
+				    src_folio->ptr,
+				    src_off, src_folio->desc.block_size,
+				    copy_size);
+}
+
+static inline
+int __ssdfs_memcpy_from_folio(void *dst, u32 dst_off, u32 dst_size,
+			      struct folio *folio, u32 src_off, u32 src_size,
+			      u32 copy_size)
+{
+	void *src_kaddr;
+	u32 src_page;
+	u32 copied_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	switch (src_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected src_size %u\n",
+			  src_size);
+		return -EINVAL;
+	}
+
+	if (src_size > folio_size(folio) ||
+	    copy_size > folio_size(folio)) {
+		SSDFS_ERR("fail to copy: "
+			  "src_size %u, copy_size %u, folio_size %zu\n",
+			  src_size, copy_size, folio_size(folio));
+		return -ERANGE;
+	}
+
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "folio %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  dst, dst_off, dst_size,
+		  folio, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (copy_size == 0) {
+		SSDFS_ERR("copy_size == 0\n");
+		return -ERANGE;
+	}
+
+	while (copied_bytes < copy_size) {
+		u32 src_iter_offset;
+		u32 dst_iter_offset;
+		u32 iter_bytes;
+
+		src_iter_offset = src_off + copied_bytes;
+		src_page = src_iter_offset >> PAGE_SHIFT;
+
+		dst_iter_offset = dst_off + copied_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("src_off %u, src_iter_offset %u, src_page %u, "
+			  "dst_off %u, dst_iter_offset %u\n",
+			  src_off, src_iter_offset, src_page,
+			  dst_off, dst_iter_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		src_kaddr = kmap_local_folio(folio, src_page * PAGE_SIZE);
+		err = ssdfs_iter_copy_from_folio(dst, dst_iter_offset, dst_size,
+						 src_kaddr, src_iter_offset,
+						 copy_size - copied_bytes,
+						 &iter_bytes);
+		kunmap_local(src_kaddr);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy folio: "
+				  "src_page %u, src_iter_offset %u, "
+				  "dst_iter_offset %u, "
+				  "iter_bytes %u, err %d\n",
+				  src_page, src_iter_offset,
+				  dst_iter_offset,
+				  iter_bytes, err);
+			return err;
+		}
+
+		copied_bytes += iter_bytes;
+	}
+
+	if (copied_bytes != copy_size) {
+		SSDFS_ERR("copied_bytes %u != copy_size %u\n",
+			  copied_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_from_folio(void *dst, u32 dst_off, u32 dst_size,
+			    struct ssdfs_smart_folio *src_folio,
+			    u32 copy_size)
+{
+	u32 src_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst || !src_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	src_off = src_folio->desc.page_offset +
+			src_folio->desc.offset_inside_page;
+
+	return __ssdfs_memcpy_from_folio(dst, dst_off, dst_size,
+					 src_folio->ptr,
+					 src_off, folio_size(src_folio->ptr),
+					 copy_size);
+}
+
+static inline
+int __ssdfs_memcpy_to_folio(struct folio *folio, u32 dst_off, u32 dst_size,
+			    void *src, u32 src_off, u32 src_size,
+			    u32 copy_size)
+{
+	void *dst_kaddr;
+	u32 dst_page;
+	u32 copied_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	switch (dst_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected dst_size %u\n",
+			  dst_size);
+		return -EINVAL;
+	}
+
+	if (dst_size > folio_size(folio) ||
+	    copy_size > folio_size(folio)) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_size %u, copy_size %u, folio_size %zu\n",
+			  dst_size, copy_size, folio_size(folio));
+		return -ERANGE;
+	}
+
+	if ((src_off + copy_size) > src_size) {
+		SSDFS_ERR("fail to copy: "
+			  "src_off %u, copy_size %u, src_size %u\n",
+			  src_off, copy_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + copy_size) > dst_size) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_off %u, copy_size %u, dst_size %u\n",
+			  dst_off, copy_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("folio %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "copy_size %u\n",
+		  folio, dst_off, dst_size,
+		  src, src_off, src_size,
+		  copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (copy_size == 0) {
+		SSDFS_ERR("copy_size == 0\n");
+		return -ERANGE;
+	}
+
+	while (copied_bytes < copy_size) {
+		u32 src_iter_offset;
+		u32 dst_iter_offset;
+		u32 iter_bytes;
+
+		src_iter_offset = src_off + copied_bytes;
+
+		dst_iter_offset = dst_off + copied_bytes;
+		dst_page = dst_iter_offset >> PAGE_SHIFT;
+
+		dst_kaddr = kmap_local_folio(folio, dst_page * PAGE_SIZE);
+		err = ssdfs_iter_copy_to_folio(dst_kaddr, dst_iter_offset,
+						src, src_iter_offset, src_size,
+						copy_size - copied_bytes,
+						&iter_bytes);
+		kunmap_local(dst_kaddr);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy folio: "
+				  "src_iter_offset %u, "
+				  "dst_page %u, dst_iter_offset %u, "
+				  "iter_bytes %u, err %d\n",
+				  src_iter_offset,
+				  dst_page, dst_iter_offset,
+				  iter_bytes, err);
+			return err;
+		}
+
+		copied_bytes += iter_bytes;
+	}
+
+	if (copied_bytes != copy_size) {
+		SSDFS_ERR("copied_bytes %u != copy_size %u\n",
+			  copied_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	flush_dcache_folio(folio);
+
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_to_folio(struct ssdfs_smart_folio *dst_folio,
+			  void *src, u32 src_off, u32 src_size,
+			  u32 copy_size)
+{
+	u32 dst_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_off = dst_folio->desc.page_offset +
+			dst_folio->desc.offset_inside_page;
+
+	return __ssdfs_memcpy_to_folio(dst_folio->ptr,
+					dst_off, dst_folio->desc.block_size,
+					src, src_off, src_size,
+					copy_size);
+}
+
+static inline
+int ssdfs_memcpy_to_batch(struct folio_batch *batch, u32 dst_off,
+			  void *src, u32 src_off, u32 src_size,
+			  u32 copy_size)
+{
+	struct folio *folio = NULL;
+	int index;
+	u32 batch_size;
+	u32 offset;
+	u32 processed_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch || !src);
+
+	SSDFS_DBG("dst_off %u, src_off %u, "
+		  "src_size %u, copy_size %u\n",
+		  dst_off, src_off, src_size, copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch_size = folio_batch_count(batch);
+	offset = 0;
+	for (index = 0; index < batch_size; index++) {
+		folio = batch->folios[index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(folio);
+
+		/* offset is the exclusive end of this folio */
+		if (dst_off < offset)
+			break;
+	}
+
+	if (!folio) {
+		SSDFS_ERR("fail to find folio: "
+			  "dst_off %u\n",
+			  dst_off);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(index >= folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (processed_bytes < copy_size) {
+		u32 offset_inside_folio;
+		u32 dst_size;
+		u32 copied_bytes = 0;
+
+		if (index >= folio_batch_count(batch)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop copy operation: "
+				  "index %d, batch_size %u\n",
+				  index,
+				  folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+		}
+
+		folio = batch->folios[index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset_inside_folio = dst_off + processed_bytes;
+		offset_inside_folio %= folio_size(folio);
+		dst_size = folio_size(folio) - offset_inside_folio;
+
+		copied_bytes = min_t(u32, src_size, dst_size);
+		copied_bytes = min_t(u32, copied_bytes,
+					copy_size - processed_bytes);
+
+		err = __ssdfs_memcpy_to_folio(folio,
+						offset_inside_folio,
+						folio_size(folio),
+						src,
+						src_off + processed_bytes,
+						src_size,
+						copied_bytes);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "offset_inside_folio %u, "
+				  "folio_size %zu, "
+				  "copied_bytes %u, err %d\n",
+				  offset_inside_folio,
+				  folio_size(folio),
+				  copied_bytes,
+				  err);
+			return err;
+		}
+
+		processed_bytes += copied_bytes;
+
+		index++;
+	}
+
+	if (processed_bytes < copy_size) {
+		SSDFS_ERR("fail to copy: "
+			  "processed_bytes %u < copy_size %u\n",
+			  processed_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_from_batch(void *dst, u32 dst_off, u32 dst_size,
+			    struct folio_batch *batch, u32 src_off,
+			    u32 copy_size)
+{
+	struct folio *folio = NULL;
+	int index;
+	u32 batch_size;
+	u32 offset;
+	u32 processed_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst || !batch);
+
+	SSDFS_DBG("dst_off %u, src_off %u, "
+		  "dst_size %u, copy_size %u\n",
+		  dst_off, src_off, dst_size, copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch_size = folio_batch_count(batch);
+	offset = 0;
+	for (index = 0; index < batch_size; index++) {
+		folio = batch->folios[index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(folio);
+
+		if (src_off <= offset)
+			break;
+	}
+
+	if (!folio) {
+		SSDFS_ERR("fail to find folio: "
+			  "src_off %u\n",
+			  src_off);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("index %d, batch_size %u, "
+		  "offset %u, src_off %u, "
+		  "folio_size %zu\n",
+		  index, batch_size,
+		  offset, src_off,
+		  folio_size(folio));
+
+	BUG_ON(index >= folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (processed_bytes < copy_size) {
+		u32 offset_inside_folio;
+		u32 src_size;
+		u32 copied_bytes = 0;
+
+		if (index >= folio_batch_count(batch)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop copy operation: "
+				  "index %d, batch_size %u\n",
+				  index,
+				  folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+		}
+
+		folio = batch->folios[index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset_inside_folio = src_off + processed_bytes;
+		offset_inside_folio %= folio_size(folio);
+		src_size = folio_size(folio) - offset_inside_folio;
+
+		copied_bytes = min_t(u32, src_size, dst_size);
+		copied_bytes = min_t(u32, copied_bytes,
+					copy_size - processed_bytes);
+
+		err = __ssdfs_memcpy_from_folio(dst,
+						dst_off + processed_bytes,
+						dst_size,
+						folio,
+						offset_inside_folio,
+						folio_size(folio),
+						copied_bytes);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "offset_inside_folio %u, "
+				  "folio_size %zu, "
+				  "copied_bytes %u, err %d\n",
+				  offset_inside_folio,
+				  folio_size(folio),
+				  copied_bytes,
+				  err);
+			return err;
+		}
+
+		processed_bytes += copied_bytes;
+
+		index++;
+	}
+
+	if (processed_bytes < copy_size) {
+		SSDFS_ERR("fail to copy: "
+			  "processed_bytes %u < copy_size %u\n",
+			  processed_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_memcpy_batch2batch(struct folio_batch *dst_batch, u32 dst_off,
+			     struct folio_batch *src_batch, u32 src_off,
+			     u32 copy_size)
+{
+	struct folio *src_folio = NULL;
+	struct folio *dst_folio = NULL;
+	int src_index, dst_index;
+	u32 batch_size;
+	u32 offset;
+	u32 processed_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_batch || !src_batch);
+
+	SSDFS_DBG("dst_off %u, src_off %u, "
+		  "copy_size %u\n",
+		  dst_off, src_off, copy_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch_size = folio_batch_count(src_batch);
+	offset = 0;
+	for (src_index = 0; src_index < batch_size; src_index++) {
+		src_folio = src_batch->folios[src_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!src_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(src_folio);
+
+		if (src_off <= offset)
+			break;
+	}
+
+	if (!src_folio) {
+		SSDFS_ERR("fail to find source folio: "
+			  "src_off %u\n",
+			  src_off);
+		return -ERANGE;
+	}
+
+	batch_size = folio_batch_count(dst_batch);
+	offset = 0;
+	for (dst_index = 0; dst_index < batch_size; dst_index++) {
+		dst_folio = dst_batch->folios[dst_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(dst_folio);
+
+		if (dst_off <= offset)
+			break;
+	}
+
+	if (!dst_folio) {
+		SSDFS_ERR("fail to find destination folio: "
+			  "dst_off %u\n",
+			  dst_off);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(src_index >= folio_batch_count(src_batch));
+	BUG_ON(dst_index >= folio_batch_count(dst_batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (processed_bytes < copy_size) {
+		u32 src_offset_inside_folio;
+		u32 dst_offset_inside_folio;
+		u32 src_size;
+		u32 dst_size;
+		u32 copied_bytes = 0;
+
+		if (src_index >= folio_batch_count(src_batch) ||
+		    dst_index >= folio_batch_count(dst_batch)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop copy operation: "
+				  "src_index %d, src_batch_size %u, "
+				  "dst_index %d, dst_batch_size %u\n",
+				  src_index,
+				  folio_batch_count(src_batch),
+				  dst_index,
+				  folio_batch_count(dst_batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+		}
+
+		src_folio = src_batch->folios[src_index];
+		dst_folio = dst_batch->folios[dst_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!src_folio);
+		BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		src_offset_inside_folio = src_off + processed_bytes;
+		src_offset_inside_folio %= folio_size(src_folio);
+		src_size = folio_size(src_folio) - src_offset_inside_folio;
+
+		dst_offset_inside_folio = dst_off + processed_bytes;
+		dst_offset_inside_folio %= folio_size(dst_folio);
+		dst_size = folio_size(dst_folio) - dst_offset_inside_folio;
+
+		copied_bytes = min_t(u32, src_size, dst_size);
+		copied_bytes = min_t(u32, copied_bytes,
+					copy_size - processed_bytes);
+
+		err = __ssdfs_memcpy_folio(dst_folio,
+					   dst_offset_inside_folio,
+					   folio_size(dst_folio),
+					   src_folio,
+					   src_offset_inside_folio,
+					   folio_size(src_folio),
+					   copied_bytes);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "src_offset_inside_folio %u, "
+				  "src_folio_size %zu, "
+				  "dst_offset_inside_folio %u, "
+				  "dst_folio_size %zu, "
+				  "copied_bytes %u, err %d\n",
+				  src_offset_inside_folio,
+				  folio_size(src_folio),
+				  dst_offset_inside_folio,
+				  folio_size(dst_folio),
+				  copied_bytes,
+				  err);
+			return err;
+		}
+
+		processed_bytes += copied_bytes;
+
+		src_index++;
+		dst_index++;
+	}
+
+	if (processed_bytes < copy_size) {
+		SSDFS_ERR("fail to copy: "
+			  "processed_bytes %u < copy_size %u\n",
+			  processed_bytes, copy_size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int ssdfs_memmove(void *dst, u32 dst_off, u32 dst_size,
+		  const void *src, u32 src_off, u32 src_size,
+		  u32 move_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	if ((src_off + move_size) > src_size) {
+		SSDFS_ERR("fail to move: "
+			  "src_off %u, move_size %u, src_size %u\n",
+			  src_off, move_size, src_size);
+		return -ERANGE;
+	}
+
+	if ((dst_off + move_size) > dst_size) {
+		SSDFS_ERR("fail to move: "
+			  "dst_off %u, move_size %u, dst_size %u\n",
+			  dst_off, move_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("dst %p, dst_off %u, dst_size %u, "
+		  "src %p, src_off %u, src_size %u, "
+		  "move_size %u\n",
+		  dst, dst_off, dst_size,
+		  src, src_off, src_size,
+		  move_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memmove((u8 *)dst + dst_off, (const u8 *)src + src_off, move_size);
+	return 0;
+}
+
+static inline
+int ssdfs_memmove_folio(struct ssdfs_smart_folio *dst_folio,
+			struct ssdfs_smart_folio *src_folio,
+			u32 move_size)
+{
+	void *kaddr;
+	u64 src_offset, dst_offset;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio || !src_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (src_folio->desc.folio_index == dst_folio->desc.folio_index &&
+	    src_folio->desc.page_in_folio == dst_folio->desc.page_in_folio) {
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!src_folio->ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		src_offset = src_folio->desc.offset_inside_page;
+		dst_offset = dst_folio->desc.offset_inside_page;
+
+		kaddr = kmap_local_folio(src_folio->ptr,
+					 src_folio->desc.page_offset);
+		err = ssdfs_memmove(kaddr, dst_offset, PAGE_SIZE,
+				    kaddr, src_offset, PAGE_SIZE,
+				    move_size);
+		flush_dcache_folio(src_folio->ptr);
+		kunmap_local(kaddr);
+	} else {
+		err = ssdfs_memcpy_folio(dst_folio, src_folio, move_size);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static inline
+int __ssdfs_memmove_folio(struct folio *dst_ptr, u32 dst_off, u32 dst_size,
+			  struct folio *src_ptr, u32 src_off, u32 src_size,
+			  u32 move_size)
+{
+	struct ssdfs_smart_folio src_folio, dst_folio;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_ptr || !src_ptr);
+
+	SSDFS_DBG("src_off %u, src_size %u, "
+		  "dst_off %u, dst_size %u\n",
+		  src_off, src_size,
+		  dst_off, dst_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = SSDFS_OFF2FOLIO(folio_size(src_ptr), src_off, &src_folio.desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert offset into folio: "
+			  "offset %u, err %d\n",
+			  src_off, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!IS_SSDFS_OFF2FOLIO_VALID(&src_folio.desc));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	src_folio.ptr = src_ptr;
+
+	err = SSDFS_OFF2FOLIO(folio_size(dst_ptr), dst_off, &dst_folio.desc);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert offset into folio: "
+			  "offset %u, err %d\n",
+			  dst_off, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!IS_SSDFS_OFF2FOLIO_VALID(&dst_folio.desc));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_folio.ptr = dst_ptr;
+
+	return ssdfs_memmove_folio(&dst_folio, &src_folio, move_size);
+}
+
+static inline
+int ssdfs_memmove_inside_batch(struct folio_batch *batch,
+				u32 dst_off, u32 src_off,
+				u32 move_size)
+{
+	struct folio *src_folio = NULL;
+	struct folio *dst_folio = NULL;
+	int src_index, dst_index;
+	u32 batch_size;
+	u32 offset;
+	u32 processed_bytes = 0;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+
+	SSDFS_DBG("dst_off %u, src_off %u, move_size %u\n",
+		  dst_off, src_off, move_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch_size = folio_batch_count(batch);
+
+	offset = 0;
+	for (src_index = 0; src_index < batch_size; src_index++) {
+		src_folio = batch->folios[src_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!src_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(src_folio);
+
+		if (src_off <= offset)
+			break;
+	}
+
+	if (!src_folio) {
+		SSDFS_ERR("fail to find source folio: "
+			  "src_off %u\n",
+			  src_off);
+		return -ERANGE;
+	}
+
+	offset = 0;
+	for (dst_index = 0; dst_index < batch_size; dst_index++) {
+		dst_folio = batch->folios[dst_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(dst_folio);
+
+		if (dst_off <= offset)
+			break;
+	}
+
+	if (!dst_folio) {
+		SSDFS_ERR("fail to find destination folio: "
+			  "dst_off %u\n",
+			  dst_off);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("src_index %d, dst_index %d, batch_size %u\n",
+		  src_index, dst_index,
+		  folio_batch_count(batch));
+
+	BUG_ON(src_index >= folio_batch_count(batch));
+	BUG_ON(dst_index >= folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (processed_bytes < move_size) {
+		u32 src_offset_inside_folio;
+		u32 dst_offset_inside_folio;
+		u32 src_size;
+		u32 dst_size;
+		u32 copied_bytes = 0;
+
+		if (src_index >= folio_batch_count(batch) ||
+		    dst_index >= folio_batch_count(batch)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop copy operation: "
+				  "src_index %d, dst_index %d, "
+				  "batch_size %u\n",
+				  src_index, dst_index,
+				  folio_batch_count(batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+			break;
+		}
+
+		src_folio = batch->folios[src_index];
+		dst_folio = batch->folios[dst_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!src_folio);
+		BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		src_offset_inside_folio = src_off + processed_bytes;
+		src_offset_inside_folio %= folio_size(src_folio);
+		src_size = folio_size(src_folio) - src_offset_inside_folio;
+
+		dst_offset_inside_folio = dst_off + processed_bytes;
+		dst_offset_inside_folio %= folio_size(dst_folio);
+		dst_size = folio_size(dst_folio) - dst_offset_inside_folio;
+
+		copied_bytes = min_t(u32, src_size, dst_size);
+		copied_bytes = min_t(u32, copied_bytes,
+					move_size - processed_bytes);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("src_off %u, dst_off %u, processed_bytes %u, "
+			  "src_offset_inside_folio %u, src_size %u, "
+			  "dst_offset_inside_folio %u, dst_size %u, "
+			  "move_size %u, copied_bytes %u\n",
+			  src_off, dst_off, processed_bytes,
+			  src_offset_inside_folio, src_size,
+			  dst_offset_inside_folio, dst_size,
+			  move_size, copied_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (src_index == dst_index) {
+			err = __ssdfs_memmove_folio(dst_folio,
+						    dst_offset_inside_folio,
+						    folio_size(dst_folio),
+						    src_folio,
+						    src_offset_inside_folio,
+						    folio_size(src_folio),
+						    copied_bytes);
+		} else {
+			err = __ssdfs_memcpy_folio(dst_folio,
+						   dst_offset_inside_folio,
+						   folio_size(dst_folio),
+						   src_folio,
+						   src_offset_inside_folio,
+						   folio_size(src_folio),
+						   copied_bytes);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: "
+				  "src_offset_inside_folio %u, "
+				  "src_folio_size %zu, "
+				  "dst_offset_inside_folio %u, "
+				  "dst_folio_size %zu, "
+				  "copied_bytes %u, err %d\n",
+				  src_offset_inside_folio,
+				  folio_size(src_folio),
+				  dst_offset_inside_folio,
+				  folio_size(dst_folio),
+				  copied_bytes,
+				  err);
+			return err;
+		}
+
+		processed_bytes += copied_bytes;
+
+		src_index++;
+		dst_index++;
+	}
+
+	if (processed_bytes < move_size) {
+		SSDFS_ERR("fail to move: "
+			  "processed_bytes %u < move_size %u\n",
+			  processed_bytes, move_size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static inline
+int __ssdfs_memset_folio(struct folio *folio, u32 dst_off, u32 dst_size,
+			 int value, u32 set_size)
+{
+	void *dst_kaddr;
+	u32 dst_page;
+	u32 processed_bytes = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	switch (dst_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+	case SSDFS_64KB:
+	case SSDFS_128KB:
+		/* expected block size */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected dst_size %u\n",
+			  dst_size);
+		return -EINVAL;
+	}
+
+	if (dst_size > folio_size(folio) ||
+	    set_size > folio_size(folio)) {
+		SSDFS_ERR("fail to copy: "
+			  "dst_size %u, set_size %u, folio_size %zu\n",
+			  dst_size, set_size, folio_size(folio));
+		return -ERANGE;
+	}
+
+	if ((dst_off + set_size) > dst_size) {
+		SSDFS_WARN("fail to memset: "
+			   "dst_off %u, set_size %u, dst_size %u\n",
+			   dst_off, set_size, dst_size);
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("folio %p, dst_off %u, dst_size %u, "
+		  "value %#x, set_size %u\n",
+		  folio, dst_off, dst_size,
+		  value, set_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (set_size == 0) {
+		SSDFS_ERR("set_size == 0\n");
+		return -ERANGE;
+	}
+
+	while (processed_bytes < set_size) {
+		u32 dst_iter_offset;
+		u32 iter_bytes;
+
+		dst_iter_offset = dst_off + processed_bytes;
+		dst_page = dst_iter_offset >> PAGE_SHIFT;
+		dst_iter_offset = dst_iter_offset % PAGE_SIZE;
+
+		iter_bytes = min_t(u32, PAGE_SIZE - dst_iter_offset,
+				   set_size - processed_bytes);
+
+		dst_kaddr = kmap_local_folio(folio, dst_page * PAGE_SIZE);
+		memset((u8 *)dst_kaddr + dst_iter_offset,
+			value, iter_bytes);
+		kunmap_local(dst_kaddr);
+
+		processed_bytes += iter_bytes;
+	}
+
+	if (processed_bytes != set_size) {
+		SSDFS_ERR("processed_bytes %u != set_size %u\n",
+			  processed_bytes, set_size);
+		return -ERANGE;
+	}
+
+	flush_dcache_folio(folio);
+
+	return 0;
+}
+
+static inline
+int ssdfs_memset_folio(struct ssdfs_smart_folio *dst_folio,
+			int value, u32 set_size)
+{
+	u32 dst_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_off = dst_folio->desc.page_offset +
+			dst_folio->desc.offset_inside_page;
+
+	return __ssdfs_memset_folio(dst_folio->ptr,
+				    dst_off, dst_folio->desc.block_size,
+				    value, set_size);
+}
+
+static inline
+int __ssdfs_memzero_folio(struct folio *folio, u32 dst_off, u32 dst_size,
+			  u32 set_size)
+{
+	return __ssdfs_memset_folio(folio, dst_off, dst_size,
+				    0, set_size);
+}
+
+static inline
+int ssdfs_memzero_folio(struct ssdfs_smart_folio *dst_folio,
+			u32 set_size)
+{
+	u32 dst_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dst_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dst_off = dst_folio->desc.page_offset +
+				dst_folio->desc.offset_inside_page;
+
+	return __ssdfs_memzero_folio(dst_folio->ptr,
+				     dst_off, dst_folio->desc.block_size,
+				     set_size);
+}
+
+static inline
+u32 SSDFS_MEM_PAGES_PER_FOLIO(struct folio *folio)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u32)folio_size(folio) >> PAGE_SHIFT;
+}
+
+static inline
+u32 SSDFS_MEM_PAGES_PER_LOGICAL_BLOCK(struct ssdfs_fs_info *fsi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return fsi->pagesize >> PAGE_SHIFT;
+}
+
+static inline
+bool is_ssdfs_file_inline(struct ssdfs_inode_info *ii)
+{
+	return atomic_read(&ii->private_flags) & SSDFS_INODE_HAS_INLINE_FILE;
+}
+
+static inline
+size_t ssdfs_inode_inline_file_capacity(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	size_t raw_inode_size;
+	size_t metadata_len;
+
+	raw_inode_size = ii->raw_inode_size;
+	metadata_len = offsetof(struct ssdfs_inode, internal);
+
+	if (raw_inode_size <= metadata_len) {
+		SSDFS_ERR("corrupted raw inode: "
+			  "raw_inode_size %zu, metadata_len %zu\n",
+			  raw_inode_size, metadata_len);
+		return 0;
+	}
+
+	return raw_inode_size - metadata_len;
+}
+
+/*
+ * __ssdfs_generate_name_hash() - generate a name's hash
+ * @name: pointer to the name string
+ * @len: length of the name
+ * @inline_name_max_len: max length of inline name
+ */
+static inline
+u64 __ssdfs_generate_name_hash(const char *name, size_t len,
+				size_t inline_name_max_len)
+{
+	u32 hash32_lo, hash32_hi;
+	size_t copy_len;
+	u64 name_hash;
+	u32 diff = 0;
+	u8 symbol1, symbol2;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!name);
+
+	SSDFS_DBG("name %s, len %zu, inline_name_max_len %zu\n",
+		  name, len, inline_name_max_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len == 0) {
+		SSDFS_ERR("invalid len %zu\n", len);
+		return U64_MAX;
+	}
+
+	copy_len = min_t(size_t, len, inline_name_max_len);
+	hash32_lo = full_name_hash(NULL, name, copy_len);
+
+	if (len <= inline_name_max_len) {
+		hash32_hi = len;
+
+		for (i = 1; i < len; i++) {
+			symbol1 = (u8)name[i - 1];
+			symbol2 = (u8)name[i];
+			diff = 0;
+
+			if (symbol1 > symbol2)
+				diff = symbol1 - symbol2;
+			else
+				diff = symbol2 - symbol1;
+
+			hash32_hi += diff * symbol1;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("hash32_hi %x, symbol1 %x, "
+				  "symbol2 %x, index %d, diff %u\n",
+				  hash32_hi, symbol1, symbol2,
+				  i, diff);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	} else {
+		hash32_hi = full_name_hash(NULL,
+					   name + inline_name_max_len,
+					   len - copy_len);
+	}
+
+	name_hash = SSDFS_NAME_HASH(hash32_lo, hash32_hi);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("name %s, len %zu, name_hash %llx\n",
+		  name, len, name_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return name_hash;
+}
+
+/*
+ * __is_ssdfs_segment_header_magic_valid() - check segment header's magic
+ * @magic: pointer to the magic value
+ */
+static inline
+bool __is_ssdfs_segment_header_magic_valid(struct ssdfs_signature *magic)
+{
+	return le16_to_cpu(magic->key) == SSDFS_SEGMENT_HDR_MAGIC;
+}
+
+/*
+ * is_ssdfs_segment_header_magic_valid() - check segment header's magic
+ * @hdr: segment header
+ */
+static inline
+bool is_ssdfs_segment_header_magic_valid(struct ssdfs_segment_header *hdr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hdr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return __is_ssdfs_segment_header_magic_valid(&hdr->volume_hdr.magic);
+}
+
+/*
+ * is_ssdfs_partial_log_header_magic_valid() - check partial log header's magic
+ * @magic: pointer to the magic value
+ */
+static inline
+bool is_ssdfs_partial_log_header_magic_valid(struct ssdfs_signature *magic)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!magic);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return le16_to_cpu(magic->key) == SSDFS_PARTIAL_LOG_HDR_MAGIC;
+}
+
+/*
+ * is_ssdfs_volume_header_csum_valid() - check volume header checksum
+ * @vh_buf: volume header buffer
+ * @buf_size: size of buffer in bytes
+ */
+static inline
+bool is_ssdfs_volume_header_csum_valid(void *vh_buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!vh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_VH(vh_buf)->check, vh_buf, buf_size);
+}
+
+/*
+ * is_ssdfs_partial_log_header_csum_valid() - check partial log header checksum
+ * @plh_buf: partial log header buffer
+ * @buf_size: size of buffer in bytes
+ */
+static inline
+bool is_ssdfs_partial_log_header_csum_valid(void *plh_buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!plh_buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_PLH(plh_buf)->check, plh_buf, buf_size);
+}
+
+/*
+ * __is_ssdfs_log_footer_magic_valid() - check log footer's magic
+ * @magic: pointer to the magic value
+ */
+static inline
+bool __is_ssdfs_log_footer_magic_valid(struct ssdfs_signature *magic)
+{
+	return le16_to_cpu(magic->key) == SSDFS_LOG_FOOTER_MAGIC;
+}
+
+/*
+ * is_ssdfs_log_footer_magic_valid() - check log footer's magic
+ * @footer: log footer
+ */
+static inline
+bool is_ssdfs_log_footer_magic_valid(struct ssdfs_log_footer *footer)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!footer);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return __is_ssdfs_log_footer_magic_valid(&footer->volume_state.magic);
+}
+
+/*
+ * is_ssdfs_log_footer_csum_valid() - check log footer's checksum
+ * @buf: buffer with log footer
+ * @buf_size: size of buffer in bytes
+ */
+static inline
+bool is_ssdfs_log_footer_csum_valid(void *buf, size_t buf_size)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_csum_valid(&SSDFS_LF(buf)->volume_state.check, buf, buf_size);
+}
+
+/*
+ * is_ssdfs_uuid_and_fs_ctime_actual() - check that UUID and create time are equal
+ * @fsi: shared file system info
+ * @buf: logical block's buffer
+ */
+static inline
+bool is_ssdfs_uuid_and_fs_ctime_actual(struct ssdfs_fs_info *fsi,
+					const void *buf)
+{
+	struct ssdfs_volume_header *vh;
+	struct ssdfs_signature *magic;
+	struct ssdfs_segment_header *seg_hdr = NULL;
+	struct ssdfs_partial_log_header *pl_hdr = NULL;
+	struct ssdfs_log_footer *footer = NULL;
+	__le8 *uuid = NULL;
+	u64 create_time = U64_MAX;
+	bool uuid_is_equal = false;
+	bool fs_ctime_is_equal = false;
+
+	vh = SSDFS_VH(buf);
+	magic = (struct ssdfs_signature *)buf;
+
+	if (is_ssdfs_magic_valid(&vh->magic)) {
+		if (__is_ssdfs_segment_header_magic_valid(magic)) {
+			seg_hdr = SSDFS_SEG_HDR(buf);
+			uuid = seg_hdr->volume_hdr.uuid;
+			create_time =
+				le64_to_cpu(seg_hdr->volume_hdr.create_time);
+		} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+			pl_hdr = SSDFS_PLH(buf);
+			uuid = pl_hdr->uuid;
+			create_time = le64_to_cpu(pl_hdr->volume_create_time);
+		} else if (__is_ssdfs_log_footer_magic_valid(magic)) {
+			footer = SSDFS_LF(buf);
+			uuid = footer->volume_state.uuid;
+			create_time = U64_MAX;
+		} else
+			goto finish_check;
+
+		spin_lock(&fsi->volume_state_lock);
+		uuid_is_equal = is_uuids_identical((u8 *)uuid, fsi->fs_uuid);
+		spin_unlock(&fsi->volume_state_lock);
+
+		if (create_time != U64_MAX)
+			fs_ctime_is_equal = create_time == fsi->fs_ctime;
+		else
+			fs_ctime_is_equal = true;
+	}
+
+finish_check:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("uuid_is_equal %#x, fs_ctime_is_equal %#x\n",
+		  uuid_is_equal, fs_ctime_is_equal);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return uuid_is_equal && fs_ctime_is_equal;
+}
+
+/*
+ * ssdfs_compare_fs_ctime() - compare FS creation times
+ * @fsi: shared file system info
+ * @buf: log header buffer
+ */
+static inline
+int ssdfs_compare_fs_ctime(struct ssdfs_fs_info *fsi,
+			   const void *buf)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_segment_header *seg_hdr = NULL;
+	struct ssdfs_partial_log_header *pl_hdr = NULL;
+	u64 create_time = U64_MAX;
+
+	magic = (struct ssdfs_signature *)buf;
+	BUG_ON(!is_ssdfs_magic_valid(magic));
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		seg_hdr = SSDFS_SEG_HDR(buf);
+		create_time = le64_to_cpu(seg_hdr->volume_hdr.create_time);
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+		create_time = le64_to_cpu(pl_hdr->volume_create_time);
+	} else
+		BUG();
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi->fs_ctime %llu, create_time %llu\n",
+		  fsi->fs_ctime, create_time);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->fs_ctime < create_time)
+		return 1;
+	else if (fsi->fs_ctime > create_time)
+		return -1;
+	else
+		return 0;
+}
+
+static inline
+void ssdfs_increase_volume_free_pages(struct ssdfs_fs_info *fsi,
+				      u64 new_free_pages)
+{
+	u64 free_pages;
+	u64 volume_capacity;
+
+	mutex_lock(&fsi->resize_mutex);
+	volume_capacity = fsi->nsegs * fsi->pages_per_seg;
+	mutex_unlock(&fsi->resize_mutex);
+
+	spin_lock(&fsi->volume_state_lock);
+	free_pages = fsi->free_pages;
+	if ((fsi->free_pages + new_free_pages) < volume_capacity)
+		fsi->free_pages += new_free_pages;
+	else
+		fsi->free_pages = volume_capacity;
+	new_free_pages = fsi->free_pages;
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("free_pages %llu, new_free_pages %llu, "
+		  "volume_capacity %llu\n",
+		  free_pages, new_free_pages,
+		  volume_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+#define SSDFS_LOG_FOOTER_OFF(seg_hdr)({ \
+	u32 offset; \
+	int index; \
+	struct ssdfs_metadata_descriptor *desc; \
+	index = SSDFS_LOG_FOOTER_INDEX; \
+	desc = &SSDFS_SEG_HDR(seg_hdr)->desc_array[index]; \
+	offset = le32_to_cpu(desc->offset); \
+	offset; \
+})
+
+#define SSDFS_WAITED_TOO_LONG_MSECS		(SSDFS_DEFAULT_TIMEOUT / 2)
+
+static inline
+void ssdfs_check_jiffies_left_till_timeout(unsigned long value)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	unsigned int msecs;
+
+	msecs = jiffies_to_msecs(SSDFS_DEFAULT_TIMEOUT - value);
+	if (msecs >= SSDFS_WAITED_TOO_LONG_MSECS)
+		SSDFS_WARN("function waited %u msecs\n", msecs);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+#define SSDFS_WAIT_COMPLETION(end)({ \
+	unsigned long res; \
+	int err = 0; \
+	res = wait_for_completion_timeout(end, SSDFS_DEFAULT_TIMEOUT); \
+	if (res == 0) { \
+		err = -ERANGE; \
+	} else { \
+		ssdfs_check_jiffies_left_till_timeout(res); \
+	} \
+	err; \
+})
+
+#define SSDFS_FSI(ptr) \
+	((struct ssdfs_fs_info *)(ptr))
+#define SSDFS_BLKT(ptr) \
+	((struct ssdfs_area_block_table *)(ptr))
+#define SSDFS_FRAGD(ptr) \
+	((struct ssdfs_fragment_desc *)(ptr))
+#define SSDFS_BLKD(ptr) \
+	((struct ssdfs_block_descriptor *)(ptr))
+#define SSDFS_BLKSTOFF(ptr) \
+	((struct ssdfs_blk_state_offset *)(ptr))
+#define SSDFS_STNODE_HDR(ptr) \
+	((struct ssdfs_segment_tree_node_header *)(ptr))
+#define SSDFS_SNRU_HDR(ptr) \
+	((struct ssdfs_snapshot_rules_header *)(ptr))
+#define SSDFS_SNRU_INFO(ptr) \
+	((struct ssdfs_snapshot_rule_info *)(ptr))
+#define SSDFS_RAW_FORK(ptr) \
+	((struct ssdfs_raw_fork *)(ptr))
+
+#define SSDFS_LEB2SEG(fsi, leb) \
+	((u64)ssdfs_get_seg_id_for_leb_id(fsi, leb))
+
+#endif /* _SSDFS_INLINE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 04/79] ssdfs: implement raw device operations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (2 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 03/79] ssdfs: add key file system's function declarations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 06/79] ssdfs: implement super operations Viacheslav Dubeyko
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

Implement raw device operations:
(1) device_name: get device name
(2) device_size: get device size in bytes
(3) open_zone: open zone
(4) reopen_zone: reopen closed zone
(5) close_zone: close zone
(6) read: read from device
(7) read_block: read logical block
(8) read_blocks: read sequence of logical blocks
(9) can_write_block: check whether a logical block can be written
(10) write_block: write logical block to device
(11) write_blocks: write sequence of logical blocks to device
(12) erase: erase the whole erase block
(13) trim: support for background erase operation
(14) sync: synchronize page cache with device
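The fourteen operations above form a raw device abstraction that each backend (block device, MTD, ZNS) implements. A minimal sketch of that pattern is a struct of function pointers with one instance per backend; the `demo_*` names below are hypothetical illustrations, not the actual SSDFS definitions:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical ops table sketching the raw device abstraction;
 * the real SSDFS structure and member list may differ. */
struct demo_device_ops {
	const char *name;
	int (*open_zone)(uint64_t offset);
	int (*close_zone)(uint64_t offset);
};

/* A conventional block device has no zones, so the zone
 * operations report -EOPNOTSUPP, as ssdfs_bdev_open_zone()
 * does in dev_bdev.c below. */
static int demo_bdev_open_zone(uint64_t offset)
{
	(void)offset;
	return -EOPNOTSUPP;
}

static int demo_bdev_close_zone(uint64_t offset)
{
	(void)offset;
	return -EOPNOTSUPP;
}

static const struct demo_device_ops demo_bdev_ops = {
	.name = "bdev",
	.open_zone = demo_bdev_open_zone,
	.close_zone = demo_bdev_close_zone,
};
```

A mount-time probe would then select the instance matching the underlying device and dispatch through the pointers, so common code never has to branch on the device type.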

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/dev_bdev.c | 1065 ++++++++++++++++++++++++++++++++++
 fs/ssdfs/dev_mtd.c  |  650 +++++++++++++++++++++
 fs/ssdfs/dev_zns.c  | 1344 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 3059 insertions(+)
 create mode 100644 fs/ssdfs/dev_bdev.c
 create mode 100644 fs/ssdfs/dev_mtd.c
 create mode 100644 fs/ssdfs/dev_zns.c

diff --git a/fs/ssdfs/dev_bdev.c b/fs/ssdfs/dev_bdev.c
new file mode 100644
index 000000000000..13da78eadd12
--- /dev/null
+++ b/fs/ssdfs/dev_bdev.c
@@ -0,0 +1,1065 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_bdev.c - Block device access code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_bdev_folio_leaks;
+atomic64_t ssdfs_dev_bdev_memory_leaks;
+atomic64_t ssdfs_dev_bdev_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_bdev_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_bdev_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_bdev_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_bdev_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_bdev_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_bdev_kfree(void *kaddr)
+ * struct folio *ssdfs_dev_bdev_alloc_folio(gfp_t gfp_mask,
+ *                                          unsigned int order)
+ * struct folio *ssdfs_dev_bdev_add_batch_folio(struct folio_batch *batch,
+ *                                              unsigned int order)
+ * void ssdfs_dev_bdev_free_folio(struct folio *folio)
+ * void ssdfs_dev_bdev_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_bdev)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_bdev)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_bdev_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_bdev_folio_leaks, 0);
+	atomic64_set(&ssdfs_dev_bdev_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_bdev_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_bdev_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_bdev_folio_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_dev_bdev_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_bdev_memory_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_bdev_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_bdev_cache_leaks) != 0) {
+		SSDFS_ERR("BLOCK DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_bdev_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(wq);
+
+/*
+ * ssdfs_bdev_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_bdev_device_name(struct super_block *sb)
+{
+	return sb->s_id;
+}
+
+/*
+ * ssdfs_bdev_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_bdev_device_size(struct super_block *sb)
+{
+	return i_size_read(sb->s_bdev->bd_mapping->host);
+}
+
+static int ssdfs_bdev_open_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_bdev_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_bdev_close_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * ssdfs_bdev_bio_alloc() - allocate bio object
+ * @bdev: block device
+ * @nr_iovecs: number of items in biovec
+ * @op: direction of I/O
+ * @gfp_mask: mask of creation flags
+ */
+struct bio *ssdfs_bdev_bio_alloc(struct block_device *bdev,
+				 unsigned int nr_iovecs,
+				 unsigned int op,
+				 gfp_t gfp_mask)
+{
+	struct bio *bio;
+
+	bio = bio_alloc(bdev, nr_iovecs, op, gfp_mask);
+	if (!bio) {
+		SSDFS_ERR("fail to allocate bio\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return bio;
+}
+
+/*
+ * ssdfs_bdev_bio_put() - free bio object
+ */
+void ssdfs_bdev_bio_put(struct bio *bio)
+{
+	if (!bio)
+		return;
+
+	bio_put(bio);
+}
+
+/*
+ * ssdfs_bdev_bio_add_folio() - add folio into bio
+ * @bio: pointer on bio object
+ * @folio: memory folio
+ * @offset: vec entry offset
+ */
+int ssdfs_bdev_bio_add_folio(struct bio *bio, struct folio *folio,
+			     unsigned int offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!bio || !folio);
+
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!bio_add_folio(bio, folio, folio_size(folio), offset)) {
+		SSDFS_ERR("fail to add folio: "
+			  "offset %u, size %zu\n",
+			  offset, folio_size(folio));
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_sync_folio_request() - submit folio request
+ * @sb: superblock object
+ * @folio: memory folio
+ * @offset: offset in bytes from the partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_bdev_sync_folio_request(struct super_block *sb,
+					 struct folio *folio,
+					 loff_t offset,
+					 unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	loff_t folio_index;
+	sector_t sector;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index = div_u64(offset, folio_size(folio));
+	sector = (sector_t)(((u64)folio_index * folio_size(folio)) >>
+								SECTOR_SHIFT);
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOIO);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = sector;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_bio_add_folio(bio, folio, 0);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add folio into bio: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_folio_request;
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_folio_request;
+	}
+
+finish_sync_folio_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
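The offset-to-sector mapping above can be sketched in isolation: the byte offset is rounded down to the start of its containing folio, and that folio boundary is then expressed in 512-byte sectors. `folio_start_sector()` below is a hypothetical helper illustrating the arithmetic, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical mirror of the math in ssdfs_bdev_sync_folio_request():
 * round the byte offset down to the start of its folio, then express
 * that folio boundary as a 512-byte sector number.
 */
static uint64_t folio_start_sector(uint64_t offset, uint64_t folio_size)
{
	uint64_t folio_index = offset / folio_size;

	return (folio_index * folio_size) >> SECTOR_SHIFT;
}
```

So an unaligned offset is silently aligned down to its folio: with 4 KiB folios, offset 5000 maps to the sector of folio 1, i.e. sector 8.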
+
+/*
+ * ssdfs_bdev_sync_batch_request() - submit folio batch request
+ * @sb: superblock object
+ * @batch: folio batch
+ * @offset: offset in bytes from the partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_bdev_sync_batch_request(struct super_block *sb,
+					 struct folio_batch *batch,
+					 loff_t offset,
+					 unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	loff_t folio_index;
+	sector_t sector;
+	u32 block_size;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+
+	SSDFS_DBG("offset %llu, op %#x, op_flags %#x\n",
+		  offset, op, op_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty folio batch\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch->folios[0]);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	block_size = folio_size(batch->folios[0]);
+
+	folio_index = div_u64(offset, block_size);
+	sector = (pgoff_t)(((u64)folio_index * block_size) >> SECTOR_SHIFT);
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, folio_batch_count(batch),
+				   op, GFP_NOIO);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = sector;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		struct folio *folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_bdev_bio_add_folio(bio, folio, 0);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add folio %d into bio: "
+				  "err %d\n",
+				  i, err);
+			goto finish_sync_batch_request;
+		}
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_batch_request;
+	}
+
+finish_sync_batch_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read_block() - read logical block from the volume
+ * @sb: superblock object
+ * @folio: memory folio
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the memory folio.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_read_block(struct super_block *sb, struct folio *folio,
+			  loff_t offset)
+{
+	int err;
+
+	err = ssdfs_bdev_sync_folio_request(sb, folio, offset,
+					    REQ_OP_READ, REQ_SYNC);
+	if (err) {
+		folio_clear_uptodate(folio);
+	} else {
+		folio_mark_uptodate(folio);
+		flush_dcache_folio(folio);
+	}
+
+	ssdfs_folio_unlock(folio);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read_blocks() - read logical blocks from the volume
+ * @sb: superblock object
+ * @batch: folio batch
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the folio batch.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_read_blocks(struct super_block *sb, struct folio_batch *batch,
+			   loff_t offset)
+{
+	int i;
+	int err = 0;
+
+	err = ssdfs_bdev_sync_batch_request(sb, batch, offset,
+					    REQ_OP_READ, REQ_RAHEAD);
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		struct folio *folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err) {
+			folio_clear_uptodate(folio);
+		} else {
+			folio_mark_uptodate(folio);
+			flush_dcache_folio(folio);
+		}
+
+		ssdfs_folio_unlock(folio);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_read_batch() - read from volume into buffer
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ * @read_bytes: pointer to the number of read bytes [out]
+ *
+ * This function tries to read @len bytes of data
+ * at @offset from the partition's beginning
+ * into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_bdev_read_batch(struct super_block *sb,
+				 u32 block_size,
+				 loff_t offset, size_t len,
+				 void *buf, size_t *read_bytes)
+{
+	struct folio_batch batch;
+	struct folio *folio;
+	loff_t folio_start, folio_end;
+	u32 folios_count;
+	u32 read_len;
+	loff_t cur_offset = offset;
+	u32 offset_inside_folio;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, block_size %u, offset %llu, len %zu, buf %p\n",
+		  sb, block_size, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*read_bytes = 0;
+
+	folio_start = div_u64(offset, block_size);
+	folio_end = div_u64(offset + len + block_size - 1, block_size);
+	folios_count = (u32)(folio_end - folio_start);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu, len %zu, block_size %u, "
+		  "folio_start %llu, folio_end %llu, folios_count %u\n",
+		  (unsigned long long)offset, len, block_size,
+		  (unsigned long long)folio_start,
+		  (unsigned long long)folio_end,
+		  folios_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folios_count > SSDFS_EXTENT_LEN_MAX) {
+		SSDFS_WARN("folios_count %u > batch_capacity %u, "
+			   "offset %llu, len %zu, block_size %u, "
+			   "folio_start %llu, folio_end %llu\n",
+			   folios_count, SSDFS_EXTENT_LEN_MAX,
+			   (unsigned long long)offset, len, block_size,
+			   (unsigned long long)folio_start,
+			   (unsigned long long)folio_end);
+		return -ERANGE;
+	}
+
+	folio_batch_init(&batch);
+
+	for (i = 0; i < folios_count; i++) {
+		folio = ssdfs_dev_bdev_alloc_folio(GFP_KERNEL | __GFP_ZERO,
+						   get_order(block_size));
+		if (IS_ERR_OR_NULL(folio)) {
+			err = (folio == NULL ? -ENOMEM : PTR_ERR(folio));
+			SSDFS_ERR("unable to allocate memory folio\n");
+			goto finish_bdev_read_batch;
+		}
+
+		ssdfs_folio_get(folio);
+		ssdfs_folio_lock(folio);
+		folio_batch_add(&batch, folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	err = ssdfs_bdev_sync_batch_request(sb, &batch, offset,
+					    REQ_OP_READ, REQ_SYNC);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read folio batch: err %d\n",
+			  err);
+		goto finish_bdev_read_batch;
+	}
+
+	for (i = 0; i < folio_batch_count(&batch); i++) {
+		folio = batch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (*read_bytes >= len) {
+			err = -ERANGE;
+			SSDFS_ERR("read_bytes %zu >= len %zu\n",
+				  *read_bytes, len);
+			goto finish_bdev_read_batch;
+		}
+
+		div_u64_rem(cur_offset, block_size, &offset_inside_folio);
+		read_len = min_t(size_t, (size_t)(block_size -
+							offset_inside_folio),
+					 (size_t)(len - *read_bytes));
+
+		err = __ssdfs_memcpy_from_folio(buf, *read_bytes,
+						len,
+						folio, offset_inside_folio,
+						block_size,
+						read_len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			goto finish_bdev_read_batch;
+		}
+
+		*read_bytes += read_len;
+		cur_offset += read_len;
+	}
+
+finish_bdev_read_batch:
+	for (i = folio_batch_count(&batch) - 1; i >= 0; i--) {
+		folio = batch.folios[i];
+
+		if (folio) {
+			ssdfs_folio_unlock(folio);
+			ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, count %d\n",
+				  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_dev_bdev_free_folio(folio);
+			batch.folios[i] = NULL;
+		}
+	}
+
+	folio_batch_reinit(&batch);
+
+	if (*read_bytes != len) {
+		err = -EIO;
+		SSDFS_ERR("read_bytes (%zu) != len (%zu)\n",
+			  *read_bytes, len);
+	}
+
+	return err;
+}
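The range math above widens an arbitrary byte range to a whole number of covering folios: the start is rounded down, the end boundary rounded up. A hypothetical stand-alone version of that computation (not part of the patch) makes the round-up behavior easy to check:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the range math in ssdfs_bdev_read_batch():
 * the first folio is found by rounding @offset down, the end boundary
 * by rounding @offset + @len up, so the batch covers the whole byte
 * range even when it is not block-aligned.
 */
static uint32_t covering_folios_count(uint64_t offset, uint64_t len,
				      uint32_t block_size)
{
	uint64_t folio_start = offset / block_size;
	uint64_t folio_end = (offset + len + block_size - 1) / block_size;

	return (uint32_t)(folio_end - folio_start);
}
```

Note that an aligned 4 KiB read needs one 4 KiB folio, while the same length starting at offset 100 straddles two.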
+
+/*
+ * ssdfs_bdev_read() - read from volume into buffer
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read @len bytes of data
+ * at @offset from the partition's beginning
+ * into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_read(struct super_block *sb, u32 block_size,
+		    loff_t offset, size_t len, void *buf)
+{
+	size_t read_bytes = 0;
+	loff_t cur_offset = offset;
+	u8 *ptr = (u8 *)buf;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, block_size %u, offset %llu, len %zu, buf %p\n",
+		  sb, block_size, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (len == 0) {
+		SSDFS_WARN("len is zero\n");
+		return 0;
+	}
+
+	while (read_bytes < len) {
+		size_t iter_read;
+
+		err = ssdfs_bdev_read_batch(sb, block_size,
+					    cur_offset,
+					    len - read_bytes,
+					    ptr,
+					    &iter_read);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read batch: "
+				  "block_size %u, cur_offset %llu, "
+				  "read_bytes %zu, err %d\n",
+				  block_size, cur_offset,
+				  read_bytes, err);
+			return err;
+		}
+
+		cur_offset += iter_read;
+		ptr += iter_read;
+		read_bytes += iter_read;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_can_write_block() - check that logical block can be written
+ * @sb: superblock object
+ * @block_size: size of block in bytes
+ * @offset: offset in bytes from the partition's beginning
+ * @need_check: whether the block content should be checked
+ *
+ * This function checks that logical block can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_can_write_block(struct super_block *sb, u32 block_size,
+				loff_t offset, bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_signature *magic;
+	void *buf;
+	bool is_ssdfs_log_found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, block_size %u, need_check %d\n",
+		  sb, (unsigned long long)offset,
+		  block_size, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	buf = ssdfs_dev_bdev_kzalloc(block_size, GFP_KERNEL);
+	if (!buf) {
+		SSDFS_ERR("unable to allocate %u bytes\n", block_size);
+		return -ENOMEM;
+	}
+
+	err = ssdfs_bdev_read(sb, block_size, offset, block_size, buf);
+	if (err)
+		goto free_buf;
+
+	if (memchr_inv(buf, 0xff, block_size)) {
+		if (memchr_inv(buf, 0x00, block_size)) {
+			magic = (struct ssdfs_signature *)buf;
+
+			is_ssdfs_log_found =
+				__is_ssdfs_segment_header_magic_valid(magic) ||
+				is_ssdfs_partial_log_header_magic_valid(magic) ||
+				__is_ssdfs_log_footer_magic_valid(magic);
+
+			if (is_ssdfs_log_found &&
+			    is_ssdfs_uuid_and_fs_ctime_actual(fsi, buf)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("area with offset %llu contains data\n",
+					  (unsigned long long)offset);
+
+				SSDFS_DBG("PAGE DUMP:\n");
+				print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+						     buf,
+						     block_size);
+				SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+				err = -EIO;
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("area with offset %llu contains data\n",
+					  (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+				err = -EIO;
+			}
+		}
+	}
+
+free_buf:
+	ssdfs_dev_bdev_kfree(buf);
+	return err;
+}
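The decision logic above can be summarized: a block is safe to write only if it is still in the NAND erased state (all 0xFF) or has been explicitly zeroed; any other pattern means live data could be overwritten. The sketch below is a hypothetical stand-in for the kernel's `memchr_inv()`-based checks, not the patch's implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte of @buf equals @value (what memchr_inv()
 * reports by returning NULL).
 */
static int all_bytes_are(const unsigned char *buf, size_t size,
			 unsigned char value)
{
	size_t i;

	for (i = 0; i < size; i++) {
		if (buf[i] != value)
			return 0;
	}

	return 1;
}

/* Hypothetical summary of ssdfs_bdev_can_write_block(): writable means
 * erased (all 0xFF) or zeroed (all 0x00).
 */
static int block_is_writable(const unsigned char *buf, size_t size)
{
	return all_bytes_are(buf, size, 0xff) ||
		all_bytes_are(buf, size, 0x00);
}
```

The real function additionally checks whether the non-blank content carries a valid SSDFS log signature before deciding how to report the conflict.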
+
+/*
+ * ssdfs_bdev_write_block() - write logical block to volume
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ * @folio: memory folio
+ *
+ * This function tries to write data from @folio
+ * at @offset from the partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_write_block(struct super_block *sb, loff_t offset,
+			   struct folio *folio)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, folio %p\n",
+		  sb, offset, folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+	BUG_ON((offset >= ssdfs_bdev_device_size(sb)) ||
+		(folio_size(folio) > (ssdfs_bdev_device_size(sb) - offset)));
+	div_u64_rem((u64)offset, (u64)folio_size(folio), &remainder);
+	BUG_ON(remainder);
+	BUG_ON(!folio_test_dirty(folio));
+	BUG_ON(folio_test_locked(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_folio_lock(folio);
+	atomic_inc(&fsi->pending_bios);
+
+	err = ssdfs_bdev_sync_folio_request(sb, folio, offset,
+					    REQ_OP_WRITE, REQ_SYNC);
+	if (err) {
+		SSDFS_ERR("failed to write (err %d): offset %llu\n",
+			  err, (unsigned long long)offset);
+	} else {
+		ssdfs_clear_dirty_folio(folio);
+		folio_mark_uptodate(folio);
+	}
+
+	ssdfs_folio_unlock(folio);
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&wq);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_write_blocks() - write batch on volume
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ * @batch: memory folios batch
+ *
+ * This function tries to write data from @batch
+ * at @offset from the partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+int ssdfs_bdev_write_blocks(struct super_block *sb, loff_t offset,
+			    struct folio_batch *batch)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct folio *folio;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, batch %p\n",
+		  sb, offset, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+	BUG_ON(offset >= ssdfs_bdev_device_size(sb));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty batch\n");
+		return 0;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	div_u64_rem((u64)offset, (u64)folio_size(batch->folios[0]), &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+
+		SSDFS_DBG("folio_index %lu, folio_size %zu, "
+			  "folio_dirty %#x\n",
+			  folio->index, folio_size(folio),
+			  folio_test_dirty(folio));
+
+		BUG_ON(!folio_test_dirty(folio));
+		BUG_ON(folio_test_locked(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+	}
+
+	atomic_inc(&fsi->pending_bios);
+
+	err = ssdfs_bdev_sync_batch_request(sb, batch, offset,
+					    REQ_OP_WRITE, REQ_SYNC);
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+		if (err) {
+			SSDFS_ERR("failed to write (err %d): "
+				  "folio_index %llu\n",
+				  err,
+				  (unsigned long long)folio->index);
+		} else {
+			ssdfs_clear_dirty_folio(folio);
+			folio_mark_uptodate(folio);
+		}
+
+		ssdfs_folio_unlock(folio);
+		ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&wq);
+
+	return err;
+}
+
+/*
+ * ssdfs_bdev_support_discard() - check that block device supports discard
+ */
+static inline bool ssdfs_bdev_support_discard(struct block_device *bdev)
+{
+	return bdev_max_discard_sectors(bdev) ||
+		bdev_is_zoned(bdev);
+}
+
+/*
+ * ssdfs_bdev_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_bdev_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 remainder;
+	sector_t start_sector;
+	sector_t sectors_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count is zero\n");
+		return -ERANGE;
+	}
+
+	start_sector = page_start << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+	sectors_count = pages_count << (PAGE_SHIFT - SSDFS_SECTOR_SHIFT);
+
+	err = -EOPNOTSUPP;
+	if (ssdfs_bdev_support_discard(sb->s_bdev)) {
+		err = blkdev_issue_secure_erase(sb->s_bdev,
+						start_sector, sectors_count,
+						GFP_NOFS);
+	}
+
+	if (err) {
+		err = blkdev_issue_zeroout(sb->s_bdev,
+					   start_sector, sectors_count,
+					   GFP_NOFS, 0);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to discard: "
+			  "start_sector %llu, sectors_count %llu, "
+			  "err %d\n",
+			  (unsigned long long)start_sector,
+			  (unsigned long long)sectors_count, err);
+		return err;
+	}
+
+	return 0;
+}
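Before issuing the erase, the function widens the byte range to whole pages and converts it to a start sector plus sector count. The helpers below are a hypothetical mirror of that arithmetic (assuming 4 KiB pages and 512-byte sectors), not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12	/* 4 KiB pages (assumed) */
#define DEMO_SECTOR_SHIFT	9	/* 512-byte sectors */

/* Hypothetical mirror of ssdfs_bdev_trim(): first sector of the page
 * that contains @offset.
 */
static uint64_t trim_start_sector(uint64_t offset)
{
	uint64_t page_start = offset >> DEMO_PAGE_SHIFT;

	return page_start << (DEMO_PAGE_SHIFT - DEMO_SECTOR_SHIFT);
}

/* Sector count covering the byte range, rounded up to whole pages,
 * as passed to blkdev_issue_secure_erase()/blkdev_issue_zeroout().
 */
static uint64_t trim_sectors_count(uint64_t offset, uint64_t len)
{
	uint64_t page_size = 1ULL << DEMO_PAGE_SHIFT;
	uint64_t page_start = offset >> DEMO_PAGE_SHIFT;
	uint64_t page_end = (offset + len + page_size - 1) >> DEMO_PAGE_SHIFT;

	return (page_end - page_start) << (DEMO_PAGE_SHIFT - DEMO_SECTOR_SHIFT);
}
```

For a 128 KiB erase block at offset 128 KiB this yields start sector 256 and a count of 256 sectors.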
+
+/*
+ * ssdfs_bdev_erase() - make erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to make erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_bdev_erase(struct super_block *sb, loff_t offset, size_t len)
+{
+	return ssdfs_bdev_trim(sb, offset, len);
+}
+
+/*
+ * ssdfs_bdev_peb_isbad() - check that PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function detects whether the PEB is bad.
+ */
+static int ssdfs_bdev_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to mark the PEB as bad.
+ */
+static int ssdfs_bdev_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_bdev_sync() - make sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_bdev_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %s\n", sb->s_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event(wq, atomic_read(&fsi->pending_bios) == 0);
+}
+
+const struct ssdfs_device_ops ssdfs_bdev_devops = {
+	.device_name		= ssdfs_bdev_device_name,
+	.device_size		= ssdfs_bdev_device_size,
+	.open_zone		= ssdfs_bdev_open_zone,
+	.reopen_zone		= ssdfs_bdev_reopen_zone,
+	.close_zone		= ssdfs_bdev_close_zone,
+	.read			= ssdfs_bdev_read,
+	.read_block		= ssdfs_bdev_read_block,
+	.read_blocks		= ssdfs_bdev_read_blocks,
+	.can_write_block	= ssdfs_bdev_can_write_block,
+	.write_block		= ssdfs_bdev_write_block,
+	.write_blocks		= ssdfs_bdev_write_blocks,
+	.erase			= ssdfs_bdev_erase,
+	.trim			= ssdfs_bdev_trim,
+	.peb_isbad		= ssdfs_bdev_peb_isbad,
+	.mark_peb_bad		= ssdfs_bdev_mark_peb_bad,
+	.sync			= ssdfs_bdev_sync,
+};
diff --git a/fs/ssdfs/dev_mtd.c b/fs/ssdfs/dev_mtd.c
new file mode 100644
index 000000000000..ccb79c7f81bf
--- /dev/null
+++ b/fs/ssdfs/dev_mtd.c
@@ -0,0 +1,650 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_mtd.c - MTD device access code.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/super.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_mtd_folio_leaks;
+atomic64_t ssdfs_dev_mtd_memory_leaks;
+atomic64_t ssdfs_dev_mtd_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_mtd_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_mtd_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_mtd_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_mtd_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_mtd_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_mtd_kfree(void *kaddr)
+ * struct folio *ssdfs_dev_mtd_alloc_folio(gfp_t gfp_mask,
+ *                                         unsigned int order)
+ * struct folio *ssdfs_dev_mtd_add_batch_folio(struct folio_batch *batch,
+ *                                             unsigned int order)
+ * void ssdfs_dev_mtd_free_folio(struct folio *folio)
+ * void ssdfs_dev_mtd_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_mtd)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_mtd)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_mtd_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_mtd_folio_leaks, 0);
+	atomic64_set(&ssdfs_dev_mtd_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_mtd_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_mtd_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_mtd_folio_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_dev_mtd_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_mtd_memory_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_mtd_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_mtd_cache_leaks) != 0) {
+		SSDFS_ERR("MTD DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_mtd_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_mtd_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_mtd_device_name(struct super_block *sb)
+{
+	return sb->s_mtd->name;
+}
+
+/*
+ * ssdfs_mtd_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_mtd_device_size(struct super_block *sb)
+{
+	return SSDFS_FS_I(sb)->mtd->size;
+}
+
+static int ssdfs_mtd_open_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_mtd_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+static int ssdfs_mtd_close_zone(struct super_block *sb, loff_t offset)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * ssdfs_mtd_read() - read from volume into buffer
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from the partition's beginning
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read @len bytes of data
+ * at @offset from the partition's beginning
+ * into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_read(struct super_block *sb, u32 block_size,
+			  loff_t offset, size_t len, void *buf)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct mtd_info *mtd = fsi->mtd;
+	loff_t folio_index;
+	size_t retlen;
+	int ret;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, block_size %u, offset %llu, len %zu, buf %p\n",
+		  sb, block_size, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index = div_u64(offset, block_size);
+	offset = folio_index * block_size;
+
+	ret = mtd_read(mtd, offset, len, &retlen, buf);
+	if (ret) {
+		SSDFS_ERR("failed to read (err %d): offset %llu, len %zu\n",
+			  ret, (unsigned long long)offset, len);
+		return ret;
+	}
+
+	if (retlen != len) {
+		SSDFS_ERR("retlen (%zu) != len (%zu)\n", retlen, len);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_read_block() - read block from the volume
+ * @sb: superblock object
+ * @folio: memory folio
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the memory folio.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_read_block(struct super_block *sb, struct folio *folio,
+				loff_t offset)
+{
+	void *kaddr;
+	u32 processed_bytes = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, folio %p, folio_index %llu\n",
+		  sb, (unsigned long long)offset, folio,
+		  (unsigned long long)folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	while (processed_bytes < folio_size(folio)) {
+		kaddr = kmap_local_folio(folio, processed_bytes);
+		err = ssdfs_mtd_read(sb, PAGE_SIZE,
+				     offset + processed_bytes,
+				     PAGE_SIZE, kaddr);
+		kunmap_local(kaddr);
+
+		if (err) {
+			folio_clear_uptodate(folio);
+			break;
+		}
+
+		processed_bytes += PAGE_SIZE;
+	}
+
+	if (!err) {
+		folio_mark_uptodate(folio);
+		flush_dcache_folio(folio);
+	sector = (sector_t)(((u64)folio_index * block_size) >> SECTOR_SHIFT);
+
+	ssdfs_folio_unlock(folio);
+
+	return err;
+}
+
+/*
+ * ssdfs_mtd_read_blocks() - read logical blocks from the volume
+ * @sb: superblock object
+ * @batch: memory folios batch
+ * @offset: offset in bytes from the partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from the partition's beginning into the memory folios.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_read_blocks(struct super_block *sb,
+				 struct folio_batch *batch,
+				 loff_t offset)
+{
+	struct folio *folio;
+	loff_t cur_offset = offset;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, batch %p\n",
+		  sb, (unsigned long long)offset, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty folio batch\n");
+		return 0;
+	}
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_mtd_read_block(sb, folio, cur_offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read block: "
+				  "cur_offset %llu, err %d\n",
+				  cur_offset, err);
+			return err;
+		}
+
+		cur_offset += folio_size(folio);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_can_write_block() - check that logical block can be written
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from partition's begin
+ * @need_check: make check or not?
+ *
+ * This function checks that logical block can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_can_write_block(struct super_block *sb, u32 block_size,
+				     loff_t offset, bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_signature *magic;
+	void *buf;
+	bool is_ssdfs_log_found;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, block_size %u, need_check %d\n",
+		  sb, (unsigned long long)offset,
+		  block_size, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	buf = ssdfs_dev_mtd_kzalloc(block_size, GFP_KERNEL);
+	if (!buf) {
+		SSDFS_ERR("unable to allocate %d bytes\n", block_size);
+		return -ENOMEM;
+	}
+
+	err = ssdfs_mtd_read(sb, block_size, offset, block_size, buf);
+	if (err)
+		goto free_buf;
+
+	if (memchr_inv(buf, 0xff, block_size)) {
+		if (memchr_inv(buf, 0x00, block_size)) {
+			magic = (struct ssdfs_signature *)buf;
+
+			is_ssdfs_log_found =
+				__is_ssdfs_segment_header_magic_valid(magic) ||
+				is_ssdfs_partial_log_header_magic_valid(magic) ||
+				__is_ssdfs_log_footer_magic_valid(magic);
+
+			if (is_ssdfs_log_found &&
+			    is_ssdfs_uuid_and_fs_ctime_actual(fsi, buf)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("area with offset %llu contains data\n",
+					  (unsigned long long)offset);
+
+				SSDFS_DBG("PAGE DUMP:\n");
+				print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+						     buf,
+						     block_size);
+				SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+				err = -EIO;
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("area with offset %llu contains data\n",
+					  (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+				err = -EIO;
+			}
+		}
+	}
+
+free_buf:
+	ssdfs_dev_mtd_kfree(buf);
+	return err;
+}
+
+/*
+ * ssdfs_mtd_write_folio() - write logical block to volume
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @folio: memory folio
+ *
+ * This function tries to write data from @folio
+ * at @offset from partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_write_folio(struct super_block *sb, loff_t offset,
+				 struct folio *folio)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct mtd_info *mtd = fsi->mtd;
+	size_t retlen;
+	unsigned char *kaddr;
+	int ret;
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 remainder;
+#endif /* CONFIG_SSDFS_DEBUG */
+	u32 written_bytes = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, folio %p\n",
+		  sb, offset, folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+	BUG_ON((offset >= mtd->size) ||
+		(folio_size(folio) > (mtd->size - offset)));
+	div_u64_rem((u64)offset, (u64)folio_size(folio), &remainder);
+	BUG_ON(remainder);
+	BUG_ON(!folio_test_dirty(folio));
+	BUG_ON(folio_test_locked(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_folio_lock(folio);
+
+	while (written_bytes < folio_size(folio)) {
+		kaddr = kmap_local_folio(folio, written_bytes);
+		ret = mtd_write(mtd, offset + written_bytes, PAGE_SIZE,
+				&retlen, kaddr);
+		kunmap_local(kaddr);
+
+		if (ret || (retlen != PAGE_SIZE)) {
+			SSDFS_ERR("failed to write (err %d): offset %llu, "
+				  "len %zu, retlen %zu\n",
+				  ret, (unsigned long long)offset,
+				  PAGE_SIZE, retlen);
+			err = -EIO;
+			break;
+		}
+
+		written_bytes += PAGE_SIZE;
+	}
+
+	if (!err) {
+		ssdfs_clear_dirty_folio(folio);
+		folio_mark_uptodate(folio);
+	}
+
+	ssdfs_folio_unlock(folio);
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_mtd_write_blocks() - write logical blocks to volume
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @batch: memory folios batch
+ *
+ * This function tries to write data from @batch
+ * at @offset from partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_mtd_write_blocks(struct super_block *sb, loff_t offset,
+				  struct folio_batch *batch)
+{
+	struct folio *folio;
+	loff_t cur_offset = offset;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, batch %p\n",
+		  sb, offset, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty folio batch\n");
+		return 0;
+	}
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_mtd_write_folio(sb, cur_offset, folio);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to write block: "
+				  "cur_offset %llu, err %d\n",
+				  cur_offset, err);
+			return err;
+		}
+
+		cur_offset += folio_size(folio);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_erase() - perform erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to perform an erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ERANGE      - unaligned erase request.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_mtd_erase(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct mtd_info *mtd = SSDFS_FS_I(sb)->mtd;
+	struct erase_info ei;
+	u32 remainder;
+	int ret;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)mtd->erasesize, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)mtd->erasesize, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   mtd->erasesize, remainder);
+		return -ERANGE;
+	}
+
+	memset(&ei, 0, sizeof(ei));
+	ei.addr = offset;
+	ei.len = len;
+
+	ret = mtd_erase(mtd, &ei);
+	if (ret) {
+		SSDFS_ERR("failed to erase (err %d): offset %llu, len %zu\n",
+			  ret, (unsigned long long)offset, len);
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_mtd_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ * Currently, it is the same operation as foreground erase.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_mtd_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	return ssdfs_mtd_erase(sb, offset, len);
+}
+
+/*
+ * ssdfs_mtd_peb_isbad() - check that PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function checks whether the PEB is bad.
+ */
+static int ssdfs_mtd_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	return mtd_block_isbad(SSDFS_FS_I(sb)->mtd, offset);
+}
+
+/*
+ * ssdfs_mtd_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to mark PEB as bad.
+ */
+static int ssdfs_mtd_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	return mtd_block_markbad(SSDFS_FS_I(sb)->mtd, offset);
+}
+
+/*
+ * ssdfs_mtd_sync() - perform sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_mtd_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %d (\"%s\")\n",
+		  fsi->mtd->index, fsi->mtd->name);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mtd_sync(fsi->mtd);
+}
+
+const struct ssdfs_device_ops ssdfs_mtd_devops = {
+	.device_name		= ssdfs_mtd_device_name,
+	.device_size		= ssdfs_mtd_device_size,
+	.open_zone		= ssdfs_mtd_open_zone,
+	.reopen_zone		= ssdfs_mtd_reopen_zone,
+	.close_zone		= ssdfs_mtd_close_zone,
+	.read			= ssdfs_mtd_read,
+	.read_block		= ssdfs_mtd_read_block,
+	.read_blocks		= ssdfs_mtd_read_blocks,
+	.can_write_block	= ssdfs_mtd_can_write_block,
+	.write_block		= ssdfs_mtd_write_folio,
+	.write_blocks		= ssdfs_mtd_write_blocks,
+	.erase			= ssdfs_mtd_erase,
+	.trim			= ssdfs_mtd_trim,
+	.peb_isbad		= ssdfs_mtd_peb_isbad,
+	.mark_peb_bad		= ssdfs_mtd_mark_peb_bad,
+	.sync			= ssdfs_mtd_sync,
+};
diff --git a/fs/ssdfs/dev_zns.c b/fs/ssdfs/dev_zns.c
new file mode 100644
index 000000000000..f2afe0038f9b
--- /dev/null
+++ b/fs/ssdfs/dev_zns.c
@@ -0,0 +1,1344 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dev_zns.c - ZNS SSD support.
+ *
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * Copyright (c) 2022-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/pagevec.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dev_zns_folio_leaks;
+atomic64_t ssdfs_dev_zns_memory_leaks;
+atomic64_t ssdfs_dev_zns_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dev_zns_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dev_zns_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dev_zns_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_zns_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dev_zns_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dev_zns_kfree(void *kaddr)
+ * struct page *ssdfs_dev_zns_alloc_page(gfp_t gfp_mask)
+ * struct page *ssdfs_dev_zns_add_pagevec_page(struct pagevec *pvec)
+ * void ssdfs_dev_zns_free_page(struct page *page)
+ * void ssdfs_dev_zns_pagevec_release(struct pagevec *pvec)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dev_zns)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dev_zns)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dev_zns_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dev_zns_folio_leaks, 0);
+	atomic64_set(&ssdfs_dev_zns_memory_leaks, 0);
+	atomic64_set(&ssdfs_dev_zns_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dev_zns_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dev_zns_folio_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_dev_zns_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_zns_memory_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_zns_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dev_zns_cache_leaks) != 0) {
+		SSDFS_ERR("ZNS DEV: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dev_zns_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(zns_wq);
+
+/*
+ * ssdfs_zns_device_name() - get device name
+ * @sb: superblock object
+ */
+static const char *ssdfs_zns_device_name(struct super_block *sb)
+{
+	return sb->s_id;
+}
+
+/*
+ * ssdfs_zns_device_size() - get partition size in bytes
+ * @sb: superblock object
+ */
+static __u64 ssdfs_zns_device_size(struct super_block *sb)
+{
+	return i_size_read(sb->s_bdev->bd_mapping->host);
+}
+
+static int ssdfs_report_zone(struct blk_zone *zone,
+			     unsigned int index, void *data)
+{
+	ssdfs_memcpy(data, 0, sizeof(struct blk_zone),
+		     zone, 0, sizeof(struct blk_zone),
+		     sizeof(struct blk_zone));
+	return 0;
+}
+
+/*
+ * ssdfs_zns_open_zone() - open zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ */
+static int ssdfs_zns_open_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u32 open_zones;
+	unsigned int nofs_flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+	SSDFS_DBG("BEFORE: open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	nofs_flags = memalloc_nofs_save();
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN,
+				zone_sector, zone_size);
+	memalloc_nofs_restore(nofs_flags);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to open zone: "
+			  "zone_sector %llu, zone_size %llu, "
+			  "open_zones %d, max_open_zones %u, "
+			  "err %d\n",
+			  zone_sector, zone_size,
+			  atomic_read(&fsi->open_zones),
+			  fsi->max_open_zones, err);
+		return err;
+	}
+
+	open_zones = atomic_inc_return(&fsi->open_zones);
+	if (open_zones > fsi->max_open_zones) {
+		atomic_dec(&fsi->open_zones);
+
+		SSDFS_WARN("open zones limit reached: "
+			   "open_zones %u\n", open_zones);
+		return -EBUSY;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("AFTER: open_zones %d\n",
+		   atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_reopen_zone() - reopen closed zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ */
+static int ssdfs_zns_reopen_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	unsigned int nofs_flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (err != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		/* continue logic */
+		break;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	nofs_flags = memalloc_nofs_save();
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_OPEN,
+				zone_sector, zone_size);
+	memalloc_nofs_restore(nofs_flags);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to open zone: "
+			  "zone_sector %llu, zone_size %llu, "
+			  "err %d\n",
+			  zone_sector, zone_size,
+			  err);
+		return err;
+	}
+
+	err = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (err != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, err);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_close_zone() - close zone
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ */
+static int ssdfs_zns_close_zone(struct super_block *sb, loff_t offset)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u32 open_zones;
+	unsigned int nofs_flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	nofs_flags = memalloc_nofs_save();
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_FINISH,
+				zone_sector, zone_size);
+	memalloc_nofs_restore(nofs_flags);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to close zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  zone_sector, zone_size, err);
+		return err;
+	}
+
+	open_zones = atomic_dec_return(&fsi->open_zones);
+	if (open_zones > fsi->max_open_zones) {
+		SSDFS_WARN("open zones accounting is inconsistent: "
+			   "open_zones %u\n", open_zones);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_zone_size() - retrieve zone size
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to retrieve zone size.
+ */
+u64 ssdfs_zns_zone_size(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u64)zone.len << SECTOR_SHIFT;
+}
+
+/*
+ * ssdfs_zns_zone_capacity() - retrieve zone capacity
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to retrieve zone capacity.
+ */
+u64 ssdfs_zns_zone_capacity(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u64)zone.capacity << SECTOR_SHIFT;
+}
+
+/*
+ * ssdfs_zns_zone_write_pointer() - retrieve zone's write pointer
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to retrieve zone's write pointer.
+ */
+u64 ssdfs_zns_zone_write_pointer(struct super_block *sb, loff_t offset)
+{
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return U64_MAX;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (zone.wp >= (zone.start + zone.capacity)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full or finished: "
+			  "start %llu, len %llu, "
+			  "wp %llu, type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return U64_MAX;
+	}
+
+	return (u64)zone.wp << SECTOR_SHIFT;
+}
+
+/*
+ * ssdfs_zns_sync_folio_request() - submit folio request
+ * @sb: superblock object
+ * @folio: memory folio
+ * @zone_start: first sector of zone
+ * @offset: offset in bytes from partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_zns_sync_folio_request(struct super_block *sb,
+					struct folio *folio,
+					sector_t zone_start,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+#ifdef CONFIG_SSDFS_DEBUG
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	struct blk_zone zone;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+	op |= REQ_OP_ZONE_APPEND | REQ_IDLE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+
+	SSDFS_DBG("offset %llu, zone_start %llu, "
+		  "op %#x, op_flags %#x\n",
+		  offset, zone_start, op, op_flags);
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+
+		BUG_ON(zone_start != zone.start);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, 1, op, GFP_NOFS);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = zone_start;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_bio_add_folio(bio, folio, 0);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add folio into bio: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_folio_request;
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_folio_request;
+	}
+
+finish_sync_folio_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_sync_batch_request() - submit folio batch request
+ * @sb: superblock object
+ * @batch: folio batch
+ * @zone_start: first sector of zone
+ * @offset: offset in bytes from partition's beginning
+ * @op: direction of I/O
+ * @op_flags: request op flags
+ */
+static int ssdfs_zns_sync_batch_request(struct super_block *sb,
+					struct folio_batch *batch,
+					sector_t zone_start,
+					loff_t offset,
+					unsigned int op, int op_flags)
+{
+	struct bio *bio;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	struct blk_zone zone;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+	op |= REQ_OP_ZONE_APPEND | REQ_IDLE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+
+	SSDFS_DBG("offset %llu, zone_start %llu, "
+		  "op %#x, op_flags %#x\n",
+		  offset, zone_start, op, op_flags);
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+
+		BUG_ON(zone_start != zone.start);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty folio batch\n");
+		return 0;
+	}
+
+	bio = ssdfs_bdev_bio_alloc(sb->s_bdev, folio_batch_count(batch),
+				   op, GFP_NOFS);
+	if (IS_ERR_OR_NULL(bio)) {
+		err = !bio ? -ERANGE : PTR_ERR(bio);
+		SSDFS_ERR("fail to allocate bio: err %d\n",
+			  err);
+		return err;
+	}
+
+	bio->bi_iter.bi_sector = zone_start;
+	bio_set_dev(bio, sb->s_bdev);
+	bio->bi_opf = op | op_flags;
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		struct folio *folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_bdev_bio_add_folio(bio, folio, 0);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add folio %d into bio: "
+				  "err %d\n",
+				  i, err);
+			goto finish_sync_batch_request;
+		}
+	}
+
+	err = submit_bio_wait(bio);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process request: "
+			  "err %d\n",
+			  err);
+		goto finish_sync_batch_request;
+	}
+
+finish_sync_batch_request:
+	ssdfs_bdev_bio_put(bio);
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_read_block() - read logical block from the volume
+ * @sb: superblock object
+ * @folio: memory folio
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from partition's beginning into the memory folio.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_zns_read_block(struct super_block *sb, struct folio *folio,
+				loff_t offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_read_block(sb, folio, offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_read_blocks() - read logical blocks from the volume
+ * @sb: superblock object
+ * @batch: folio batch
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to read data at @offset
+ * from partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static
+int ssdfs_zns_read_blocks(struct super_block *sb, struct folio_batch *batch,
+			  loff_t offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu\n",
+		  sb, (unsigned long long)offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_read_blocks(sb, batch, offset);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_read() - read from volume into buffer
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from partition's beginning
+ * @len: size of buffer in bytes
+ * @buf: buffer
+ *
+ * This function tries to read data at @offset
+ * from partition's beginning, @len bytes in size,
+ * from the volume into @buf.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EIO         - I/O error.
+ */
+static
+int ssdfs_zns_read(struct super_block *sb, u32 block_size,
+		   loff_t offset, size_t len, void *buf)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, block_size %u, offset %llu, len %zu, buf %p\n",
+		  sb, block_size, (unsigned long long)offset, len, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_bdev_read(sb, block_size, offset, len, buf);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_can_write_block() - check whether logical block can be written
+ * @sb: superblock object
+ * @block_size: block size in bytes
+ * @offset: offset in bytes from partition's beginning
+ * @need_check: perform the check or not
+ *
+ * This function checks whether the logical block can be written.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-ENOMEM      - fail to allocate memory.
+ * %-EIO         - I/O error.
+ */
+static int ssdfs_zns_can_write_block(struct super_block *sb, u32 block_size,
+				     loff_t offset, bool need_check)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	sector_t zone_size = fsi->erasesize >> SECTOR_SHIFT;
+	u64 peb_id;
+	loff_t zone_offset;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, block_size %u, need_check %d\n",
+		  sb, (unsigned long long)offset,
+		  block_size, (int)need_check);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_check)
+		return 0;
+
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+		return res < 0 ? res : -EIO;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("zone before: start %llu, len %llu, wp %llu, "
+		  "type %#x, cond %#x, non_seq %#x, "
+		  "reset %#x, capacity %llu\n",
+		  zone.start, zone.len, zone.wp,
+		  zone.type, zone.cond, zone.non_seq,
+		  zone.reset, zone.capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (zone.type) {
+	case BLK_ZONE_TYPE_CONVENTIONAL:
+		return ssdfs_bdev_can_write_block(sb, block_size,
+						  offset, need_check);
+
+	default:
+		/*
+		 * BLK_ZONE_TYPE_SEQWRITE_REQ
+		 * BLK_ZONE_TYPE_SEQWRITE_PREF
+		 *
+		 * continue logic
+		 */
+		break;
+	}
+
+	switch (zone.cond) {
+	case BLK_ZONE_COND_NOT_WP:
+		return ssdfs_bdev_can_write_block(sb, block_size,
+						  offset, need_check);
+
+	case BLK_ZONE_COND_EMPTY:
+		/* can write */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is empty: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+
+	case BLK_ZONE_COND_CLOSED:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is closed: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		peb_id = offset / fsi->erasesize;
+		zone_offset = peb_id * fsi->erasesize;
+
+		err = ssdfs_zns_reopen_zone(sb, zone_offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to reopen zone: "
+				  "zone_offset %llu, zone_size %llu, "
+				  "err %d\n",
+				  zone_offset, zone_size, err);
+			return err;
+		}
+
+		return 0;
+
+	case BLK_ZONE_COND_READONLY:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is READ-ONLY: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_FULL:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is full: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	case BLK_ZONE_COND_OFFLINE:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("zone is offline: offset %llu\n",
+			  offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EIO;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	if (zone_sector < zone.wp) {
+		err = -EIO;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cannot be written: "
+			  "zone_sector %llu, zone.wp %llu\n",
+			  zone_sector, zone.wp);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone after: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_write_block() - write logical block to volume
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @folio: memory folio
+ *
+ * This function tries to write data from @folio
+ * at @offset from the partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static
+int ssdfs_zns_write_block(struct super_block *sb, loff_t offset,
+			  struct folio *folio)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	loff_t zone_start;
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	u32 remainder;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, folio %p\n",
+		  sb, offset, folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+	BUG_ON((offset >= ssdfs_zns_device_size(sb)) ||
+		(folio_size(folio) > (ssdfs_zns_device_size(sb) - offset)));
+	div_u64_rem((u64)offset, (u64)folio_size(folio), &remainder);
+	BUG_ON(remainder);
+	BUG_ON(!folio_test_dirty(folio));
+	BUG_ON(folio_test_locked(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_folio_lock(folio);
+	atomic_inc(&fsi->pending_bios);
+
+	zone_start = (offset / fsi->erasesize) * fsi->erasesize;
+	zone_start >>= SECTOR_SHIFT;
+
+	err = ssdfs_zns_sync_folio_request(sb, folio, zone_start, offset,
+					   REQ_OP_WRITE, REQ_SYNC);
+	if (err) {
+		SSDFS_ERR("failed to write (err %d): offset %llu\n",
+			  err, (unsigned long long)offset);
+	} else {
+		ssdfs_clear_dirty_folio(folio);
+		folio_mark_uptodate(folio);
+	}
+
+	ssdfs_folio_unlock(folio);
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&zns_wq);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_write_blocks() - write folio batch to volume
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @batch: folio batch
+ *
+ * This function tries to write data from @batch
+ * at @offset from the partition's beginning.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EIO         - I/O error.
+ */
+static
+int ssdfs_zns_write_blocks(struct super_block *sb, loff_t offset,
+			   struct folio_batch *batch)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct folio *folio;
+	loff_t zone_start;
+	int i;
+#ifdef CONFIG_SSDFS_DEBUG
+	struct blk_zone zone;
+	sector_t zone_sector = offset >> SECTOR_SHIFT;
+	u32 remainder;
+	int res;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, batch %p\n",
+		  sb, offset, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY) {
+		SSDFS_WARN("unable to write on RO file system\n");
+		return -EROFS;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!batch);
+	BUG_ON(offset >= ssdfs_zns_device_size(sb));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (folio_batch_count(batch) == 0) {
+		SSDFS_WARN("empty folio batch\n");
+		return 0;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	div_u64_rem((u64)offset, (u64)folio_size(batch->folios[0]), &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+		BUG_ON(!folio_test_dirty(folio));
+		BUG_ON(folio_test_locked(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+	}
+
+	atomic_inc(&fsi->pending_bios);
+
+	zone_start = (offset / fsi->erasesize) * fsi->erasesize;
+	zone_start >>= SECTOR_SHIFT;
+
+	err = ssdfs_zns_sync_batch_request(sb, batch, zone_start, offset,
+					   REQ_OP_WRITE, REQ_SYNC);
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+		if (err) {
+			SSDFS_ERR("failed to write (err %d): "
+				  "folio_index %llu\n",
+				  err,
+				  (unsigned long long)folio->index);
+		} else {
+			ssdfs_clear_dirty_folio(folio);
+			folio_mark_uptodate(folio);
+		}
+
+		ssdfs_folio_unlock(folio);
+		ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (atomic_dec_and_test(&fsi->pending_bios))
+		wake_up_all(&zns_wq);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	res = blkdev_report_zones(sb->s_bdev, zone_sector, 1,
+				  ssdfs_report_zone, &zone);
+	if (res != 1) {
+		SSDFS_ERR("fail to take report zone: "
+			  "zone_sector %llu, err %d\n",
+			  zone_sector, res);
+	} else {
+		SSDFS_DBG("zone: start %llu, len %llu, wp %llu, "
+			  "type %#x, cond %#x, non_seq %#x, "
+			  "reset %#x, capacity %llu\n",
+			  zone.start, zone.len, zone.wp,
+			  zone.type, zone.cond, zone.non_seq,
+			  zone.reset, zone.capacity);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * ssdfs_zns_trim() - initiate background erase operation
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ * @len: size in bytes
+ *
+ * This function tries to initiate background erase operation.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EROFS       - file system in RO mode.
+ * %-EFAULT      - erase operation error.
+ */
+static int ssdfs_zns_trim(struct super_block *sb, loff_t offset, size_t len)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u32 erase_size = fsi->erasesize;
+	loff_t page_start, page_end;
+	u32 pages_count;
+	u32 remainder;
+	sector_t start_sector;
+	sector_t sectors_count;
+	unsigned int nofs_flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, offset %llu, len %zu\n",
+		  sb, (unsigned long long)offset, len);
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+	div_u64_rem((u64)offset, (u64)erase_size, &remainder);
+	BUG_ON(remainder);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+	div_u64_rem((u64)len, (u64)erase_size, &remainder);
+	if (remainder) {
+		SSDFS_WARN("len %llu, erase_size %u, remainder %u\n",
+			   (unsigned long long)len,
+			   erase_size, remainder);
+		return -ERANGE;
+	}
+
+	page_start = offset >> PAGE_SHIFT;
+	page_end = (offset + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages_count = (u32)(page_end - page_start);
+
+	if (pages_count == 0) {
+		SSDFS_WARN("pages_count is zero\n");
+		return -ERANGE;
+	}
+
+	start_sector = offset >> SECTOR_SHIFT;
+	sectors_count = fsi->erasesize >> SECTOR_SHIFT;
+
+	nofs_flags = memalloc_nofs_save();
+	err = blkdev_zone_mgmt(sb->s_bdev, REQ_OP_ZONE_RESET,
+				start_sector, sectors_count);
+	memalloc_nofs_restore(nofs_flags);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to reset zone: "
+			  "zone_sector %llu, zone_size %llu, err %d\n",
+			  start_sector, sectors_count, err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_zns_peb_isbad() - check whether PEB is bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to detect whether the PEB is bad.
+ */
+static int ssdfs_zns_peb_isbad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_mark_peb_bad() - mark PEB as bad
+ * @sb: superblock object
+ * @offset: offset in bytes from partition's beginning
+ *
+ * This function tries to mark PEB as bad.
+ */
+static int ssdfs_zns_mark_peb_bad(struct super_block *sb, loff_t offset)
+{
+	/* do nothing */
+	return 0;
+}
+
+/*
+ * ssdfs_zns_sync() - make sync operation
+ * @sb: superblock object
+ */
+static void ssdfs_zns_sync(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("device %s\n", sb->s_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wait_event(zns_wq, atomic_read(&fsi->pending_bios) == 0);
+}
+
+const struct ssdfs_device_ops ssdfs_zns_devops = {
+	.device_name		= ssdfs_zns_device_name,
+	.device_size		= ssdfs_zns_device_size,
+	.open_zone		= ssdfs_zns_open_zone,
+	.reopen_zone		= ssdfs_zns_reopen_zone,
+	.close_zone		= ssdfs_zns_close_zone,
+	.read			= ssdfs_zns_read,
+	.read_block		= ssdfs_zns_read_block,
+	.read_blocks		= ssdfs_zns_read_blocks,
+	.can_write_block	= ssdfs_zns_can_write_block,
+	.write_block		= ssdfs_zns_write_block,
+	.write_blocks		= ssdfs_zns_write_blocks,
+	.erase			= ssdfs_zns_trim,
+	.trim			= ssdfs_zns_trim,
+	.peb_isbad		= ssdfs_zns_peb_isbad,
+	.mark_peb_bad		= ssdfs_zns_mark_peb_bad,
+	.sync			= ssdfs_zns_sync,
+};
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 06/79] ssdfs: implement super operations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (3 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 04/79] ssdfs: implement raw device operations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 07/79] ssdfs: implement commit superblock logic Viacheslav Dubeyko
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS has a specialized superblock segment (erase block) whose
goal is to keep the sequence of committed superblocks.
A superblock instance is stored on every successful mount
operation and during unmount. At first, the logic detects the
state of the current superblock segment. If the segment (erase
block) is completely full, then a new superblock segment is
reserved, and the new superblock instance is stored into the
sequence. SSDFS keeps a main and a backup copy of the current
superblock segment. Additionally, SSDFS tracks the previous,
current, next, and reserved superblock segments. SSDFS can use
two policies of superblock segment allocation: (1) reserve a new
segment for every new allocation, or (2) use only the set of
superblock segments that have been reserved by the mkfs tool.
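The rotation among previous, current, next, and reserved superblock
segments can be sketched as below. This is a minimal, hypothetical
illustration of one plausible reading of the mechanics; the enum and
function names are assumptions, not the actual SSDFS driver code.

```c
#include <stdint.h>

/* illustrative roles of the tracked superblock segments */
enum example_sb_seg {
	PREV_SB_SEG,
	CUR_SB_SEG,
	NEXT_SB_SEG,
	RESERVED_SB_SEG,
	SB_SEG_MAX
};

/*
 * When the current superblock segment becomes full, each role
 * shifts by one position and a freshly reserved segment takes
 * the reserved slot (hypothetical sketch).
 */
static void example_rotate_sb_segs(uint64_t segs[SB_SEG_MAX],
				   uint64_t new_reserved)
{
	segs[PREV_SB_SEG] = segs[CUR_SB_SEG];
	segs[CUR_SB_SEG] = segs[NEXT_SB_SEG];
	segs[NEXT_SB_SEG] = segs[RESERVED_SB_SEG];
	segs[RESERVED_SB_SEG] = new_reserved;
}
```

Under policy (2), new_reserved would be drawn only from the set of
segments pre-reserved by the mkfs tool rather than freshly allocated.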

Every commit operation stores a log into the superblock segment.
This log contains:
(1) segment header,
(2) payload (mapping table cache, for example),
(3) log footer.

The segment header can be considered static superblock info.
It contains metadata that never changes after volume creation
(logical block size, for example) or changes rarely (number of
segments in the volume, for example). The log footer can be
considered the dynamic part of the superblock because it
contains frequently updated metadata (for example, the root
node of the inodes b-tree).
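The log layout described above can be sketched as follows. This is a
minimal, hypothetical illustration; the struct names and fields are
assumptions, NOT the actual SSDFS on-disk format defined by this
patchset.

```c
/*
 * Hypothetical sketch of a superblock log: a static segment header,
 * a variable-size payload, and a dynamic log footer. All names and
 * fields are illustrative only.
 */
#include <stdint.h>
#include <stddef.h>

/* static part: fixed at volume creation or rarely changed */
struct example_seg_header {
	uint32_t magic;
	uint32_t logical_block_size;	/* set by mkfs, never changes */
	uint64_t segments_count;	/* changes rarely */
};

/* dynamic part: frequently updated metadata */
struct example_log_footer {
	uint64_t inodes_btree_root;	/* e.g. root of inodes b-tree */
	uint64_t timestamp;
};

/* one committed log: header, then payload bytes, then footer */
static inline size_t example_log_size(size_t payload_bytes)
{
	return sizeof(struct example_seg_header) + payload_bytes +
	       sizeof(struct example_log_footer);
}
```

The payload in the middle would carry content such as the mapping
table cache mentioned above.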

The patch implements the file system register/unregister logic.
The register logic includes cache creation/initialization,
compression support initialization, and sysfs subsystem
initialization. Conversely, the unregister logic destroys the
caches, the compression subsystem, and the sysfs entries.

Also, the patch implements basic mount/unmount logic.
The ssdfs_fill_super() implements the mount logic that includes:
(1) parse mount options,
(2) extract superblock info,
(3) create key in-core metadata structures (mapping table,
    segment bitmap, b-trees),
(4) create the root inode,
(5) start metadata structures' threads,
(6) commit the superblock on finish of the mount operation.

The ssdfs_put_super() implements the unmount logic:
(1) stop metadata threads,
(2) wait for unfinished user data requests,
(3) flush dirty metadata structures,
(4) commit the superblock,
(5) destroy in-core metadata structures.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/options.c |  170 ++++
 fs/ssdfs/super.c   | 2077 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 2247 insertions(+)
 create mode 100644 fs/ssdfs/options.c
 create mode 100644 fs/ssdfs/super.c

diff --git a/fs/ssdfs/options.c b/fs/ssdfs/options.c
new file mode 100644
index 000000000000..4a1d710d9350
--- /dev/null
+++ b/fs/ssdfs/options.c
@@ -0,0 +1,170 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/options.c - mount options parsing.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/parser.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/seq_file.h>
+#include <linux/pagevec.h>
+#include <linux/fs_parser.h>
+#include <linux/fs_context.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "segment_bitmap.h"
+
+/*
+ * SSDFS mount options.
+ *
+ * Opt_err: behavior if fs error is detected
+ * Opt_compr: change default compressor
+ * Opt_ignore_fs_state: ignore on-disk file system state during mount
+ */
+enum {
+	Opt_err,
+	Opt_compr,
+	Opt_ignore_fs_state,
+};
+
+static const struct constant_table ssdfs_param_err[] = {
+	{"panic",	SSDFS_MOUNT_ERRORS_PANIC},
+	{"remount-ro",	SSDFS_MOUNT_ERRORS_RO},
+	{"continue",	SSDFS_MOUNT_ERRORS_CONT},
+	{}
+};
+
+static const struct constant_table ssdfs_param_compr[] = {
+	{"none",	SSDFS_MOUNT_COMPR_MODE_NONE},
+#ifdef CONFIG_SSDFS_ZLIB
+	{"zlib",	SSDFS_MOUNT_COMPR_MODE_ZLIB},
+#endif
+#ifdef CONFIG_SSDFS_LZO
+	{"lzo",		SSDFS_MOUNT_COMPR_MODE_LZO},
+#endif
+	{}
+};
+
+static const struct constant_table ssdfs_param_fs_state[] = {
+	{"ignore",	SSDFS_MOUNT_IGNORE_FS_STATE},
+	{}
+};
+
+static const struct fs_parameter_spec ssdfs_fs_parameters[] = {
+	fsparam_enum	("errors", Opt_err, ssdfs_param_err),
+	fsparam_enum	("compr", Opt_compr, ssdfs_param_compr),
+	fsparam_enum	("fs_state", Opt_ignore_fs_state, ssdfs_param_fs_state),
+	{}
+};
+
+int ssdfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+	struct ssdfs_mount_context *ctx = fc->fs_private;
+	struct fs_parse_result result;
+	int opt;
+
+	opt = fs_parse(fc, ssdfs_fs_parameters, param, &result);
+	if (opt < 0)
+		return opt;
+
+	switch (opt) {
+	case Opt_err:
+		ssdfs_clear_opt(ctx->s_mount_opts, ERRORS_PANIC);
+		ssdfs_clear_opt(ctx->s_mount_opts, ERRORS_RO);
+		ssdfs_clear_opt(ctx->s_mount_opts, ERRORS_CONT);
+		ctx->s_mount_opts |= result.uint_32;
+		break;
+
+	case Opt_compr:
+		ssdfs_clear_opt(ctx->s_mount_opts, COMPR_MODE_NONE);
+		ssdfs_clear_opt(ctx->s_mount_opts, COMPR_MODE_ZLIB);
+		ssdfs_clear_opt(ctx->s_mount_opts, COMPR_MODE_LZO);
+		ctx->s_mount_opts |= result.uint_32;
+		break;
+
+	case Opt_ignore_fs_state:
+		ctx->s_mount_opts |= result.uint_32;
+		break;
+
+	default:
+		SSDFS_ERR("unrecognized mount option\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("DONE: parse options\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+void ssdfs_initialize_fs_errors_option(struct ssdfs_fs_info *fsi)
+{
+	if (fsi->fs_errors == SSDFS_ERRORS_PANIC)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC);
+	else if (fsi->fs_errors == SSDFS_ERRORS_RO)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_RO);
+	else if (fsi->fs_errors == SSDFS_ERRORS_CONTINUE)
+		ssdfs_set_opt(fsi->mount_opts, ERRORS_CONT);
+	else {
+		u16 def_behaviour = SSDFS_ERRORS_DEFAULT;
+
+		switch (def_behaviour) {
+		case SSDFS_ERRORS_PANIC:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_PANIC);
+			break;
+
+		case SSDFS_ERRORS_RO:
+			ssdfs_set_opt(fsi->mount_opts, ERRORS_RO);
+			break;
+		}
+	}
+}
+
+int ssdfs_show_options(struct seq_file *seq, struct dentry *root)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(root->d_sb);
+	const char *compress_type;
+
+	if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_ZLIB)) {
+		compress_type = "zlib";
+		seq_printf(seq, ",compress=%s", compress_type);
+	} else if (ssdfs_test_opt(fsi->mount_opts, COMPR_MODE_LZO)) {
+		compress_type = "lzo";
+		seq_printf(seq, ",compress=%s", compress_type);
+	}
+
+	if (ssdfs_test_opt(fsi->mount_opts, ERRORS_PANIC))
+		seq_puts(seq, ",errors=panic");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_RO))
+		seq_puts(seq, ",errors=remount-ro");
+	else if (ssdfs_test_opt(fsi->mount_opts, ERRORS_CONT))
+		seq_puts(seq, ",errors=continue");
+
+	if (ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE))
+		seq_puts(seq, ",fs_state=ignore");
+
+	return 0;
+}
diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c
new file mode 100644
index 000000000000..f430eee9aaf0
--- /dev/null
+++ b/fs/ssdfs/super.c
@@ -0,0 +1,2077 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/super.c - module and superblock management.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/super.h>
+#include <linux/exportfs.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+#include <linux/backing-dev.h>
+#include <linux/delay.h>
+#include <linux/fs_parser.h>
+#include <linux/fs_context.h>
+
+#include <kunit/visibility.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "version.h"
+#include "folio_array.h"
+#include "segment_bitmap.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "segment_tree.h"
+#include "current_segment.h"
+#include "peb_mapping_table.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "extents_queue.h"
+#include "btree.h"
+#include "inodes_tree.h"
+#include "shared_extents_tree.h"
+#include "shared_dictionary.h"
+#include "extents_tree.h"
+#include "dentries_tree.h"
+#include "xattr_tree.h"
+#include "xattr.h"
+#include "acl.h"
+#include "snapshots_tree.h"
+#include "invalidated_extents_tree.h"
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_allocated_folios;
+EXPORT_SYMBOL_IF_KUNIT(ssdfs_allocated_folios);
+
+atomic64_t ssdfs_memory_leaks;
+atomic64_t ssdfs_super_folio_leaks;
+atomic64_t ssdfs_super_memory_leaks;
+atomic64_t ssdfs_super_cache_leaks;
+
+atomic64_t ssdfs_locked_folios;
+EXPORT_SYMBOL_IF_KUNIT(ssdfs_locked_folios);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_super_cache_leaks_increment(void *kaddr)
+ * void ssdfs_super_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_super_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_super_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_super_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_super_kfree(void *kaddr)
+ * struct folio *ssdfs_super_alloc_folio(gfp_t gfp_mask,
+ *                                       unsigned int order)
+ * struct folio *ssdfs_super_add_batch_folio(struct folio_batch *batch,
+ *                                           unsigned int order)
+ * void ssdfs_super_free_folio(struct folio *folio)
+ * void ssdfs_super_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(super)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(super)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+static inline
+void ssdfs_super_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_super_folio_leaks, 0);
+	atomic64_set(&ssdfs_super_memory_leaks, 0);
+	atomic64_set(&ssdfs_super_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static inline
+void ssdfs_super_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_super_folio_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_super_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_super_memory_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_super_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_super_cache_leaks) != 0) {
+		SSDFS_ERR("SUPER: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_super_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+struct ssdfs_payload_content {
+	struct folio_batch batch;
+	u32 bytes_count;
+};
+
+struct ssdfs_sb_log_payload {
+	struct ssdfs_payload_content maptbl_cache;
+};
+
+static struct kmem_cache *ssdfs_inode_cachep;
+
+static int ssdfs_prepare_sb_log(struct super_block *sb,
+				struct ssdfs_peb_extent *last_sb_log);
+static int ssdfs_snapshot_sb_log_payload(struct super_block *sb,
+					 struct ssdfs_sb_log_payload *payload);
+static int ssdfs_commit_super(struct super_block *sb, u16 fs_state,
+				struct ssdfs_peb_extent *last_sb_log,
+				struct ssdfs_sb_log_payload *payload);
+static void ssdfs_put_super(struct super_block *sb);
+static void ssdfs_check_memory_leaks(void);
+
+static void init_once(void *foo)
+{
+	struct ssdfs_inode_info *ii = (struct ssdfs_inode_info *)foo;
+
+	inode_init_once(&ii->vfs_inode);
+}
+
+/*
+ * This method is called by the VFS alloc_inode() to allocate
+ * memory for struct inode and initialize it
+ */
+static inline
+struct inode *ssdfs_alloc_inode(struct super_block *sb)
+{
+	struct ssdfs_inode_info *ii;
+	unsigned int nofs_flags;
+
+	nofs_flags = memalloc_nofs_save();
+	ii = alloc_inode_sb(sb, ssdfs_inode_cachep, GFP_KERNEL);
+	memalloc_nofs_restore(nofs_flags);
+
+	if (!ii)
+		return NULL;
+
+	ssdfs_super_cache_leaks_increment(ii);
+
+	init_once((void *)ii);
+
+	atomic_set(&ii->private_flags, 0);
+	init_rwsem(&ii->lock);
+	ii->parent_ino = U64_MAX;
+	ii->flags = 0;
+	ii->name_hash = 0;
+	ii->name_len = 0;
+	ii->extents_tree = NULL;
+	ii->dentries_tree = NULL;
+	ii->xattrs_tree = NULL;
+	ii->inline_file = NULL;
+	memset(&ii->raw_inode, 0, sizeof(struct ssdfs_inode));
+
+	return &ii->vfs_inode;
+}
+
+void ssdfs_destroy_btree_of_inode(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ii->extents_tree) {
+		ssdfs_extents_tree_destroy(ii);
+		ii->extents_tree = NULL;
+	}
+
+	if (ii->dentries_tree) {
+		ssdfs_dentries_tree_destroy(ii);
+		ii->dentries_tree = NULL;
+	}
+
+	if (ii->xattrs_tree) {
+		ssdfs_xattrs_tree_destroy(ii);
+		ii->xattrs_tree = NULL;
+	}
+
+	if (ii->inline_file) {
+		ssdfs_destroy_inline_file_buffer(inode);
+		ii->inline_file = NULL;
+	}
+}
+
+void ssdfs_destroy_and_decrement_btree_of_inode(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_destroy_btree_of_inode(inode);
+
+	if (inode->i_ino == SSDFS_SEG_BMAP_INO ||
+	    inode->i_ino == SSDFS_SEG_TREE_INO ||
+	    inode->i_ino == SSDFS_TESTING_INO) {
+		ssdfs_super_cache_leaks_decrement(ii);
+	} else
+		BUG();
+}
+
+static void ssdfs_i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_destroy_btree_of_inode(inode);
+
+	if (inode->i_ino == SSDFS_SEG_BMAP_INO ||
+	    inode->i_ino == SSDFS_SEG_TREE_INO ||
+	    inode->i_ino == SSDFS_TESTING_INO) {
+		/*
+		 * Do nothing.
+		 * The ssdfs_destroy_and_decrement_btree_of_inode did it already.
+		 */
+	} else {
+		ssdfs_super_cache_leaks_decrement(ii);
+	}
+
+	kmem_cache_free(ssdfs_inode_cachep, ii);
+}
+
+/*
+ * This method is called by destroy_inode() to release
+ * resources allocated for struct inode
+ */
+static void ssdfs_destroy_inode(struct inode *inode)
+{
+	call_rcu(&inode->i_rcu, ssdfs_i_callback);
+}
+
+static void ssdfs_init_inode_once(void *obj)
+{
+	struct ssdfs_inode_info *ii = obj;
+	inode_init_once(&ii->vfs_inode);
+}
+
+static int ssdfs_remount_fs(struct fs_context *fc, struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	unsigned int flags = fc->sb_flags;
+	unsigned long old_sb_flags;
+	unsigned long old_mount_opts;
+	int err;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, flags %#x\n", sb, flags);
+#else
+	SSDFS_DBG("sb %p, flags %#x\n", sb, flags);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	old_sb_flags = sb->s_flags;
+	old_mount_opts = fsi->mount_opts;
+
+	folio_batch_init(&payload.maptbl_cache.batch);
+
+	set_posix_acl_flag(sb);
+
+	if ((flags & SB_RDONLY) == (sb->s_flags & SB_RDONLY))
+		goto out;
+
+	if (flags & SB_RDONLY) {
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to snapshot sb log's payload: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_VALID_FS,
+						 &last_sb_log,
+						 &payload);
+		} else {
+			SSDFS_ERR("fail to prepare sb log payload: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+
+		if (err)
+			SSDFS_ERR("fail to commit superblock info\n");
+
+		sb->s_flags |= SB_RDONLY;
+		SSDFS_DBG("remount in RO mode\n");
+	} else {
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to snapshot sb log's payload: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS,
+						 &last_sb_log,
+						 &payload);
+		} else {
+			SSDFS_ERR("fail to prepare sb log payload: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+
+		if (err) {
+			SSDFS_NOTICE("fail to commit superblock info\n");
+			goto restore_opts;
+		}
+
+		sb->s_flags &= ~SB_RDONLY;
+		SSDFS_DBG("remount in RW mode\n");
+	}
+out:
+	ssdfs_super_folio_batch_release(&payload.maptbl_cache.batch);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#else
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return 0;
+
+restore_opts:
+	sb->s_flags = old_sb_flags;
+	fsi->mount_opts = old_mount_opts;
+	ssdfs_super_folio_batch_release(&payload.maptbl_cache.batch);
+	return err;
+}
+
+static
+int ssdfs_commit_super(struct super_block *sb, u16 fs_state,
+			struct ssdfs_peb_extent *last_sb_log,
+			struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	__le64 cur_segs[SSDFS_CUR_SEGS_COUNT];
+	size_t size = sizeof(__le64) * SSDFS_CUR_SEGS_COUNT;
+	u64 timestamp = ssdfs_current_timestamp();
+	u64 cno = ssdfs_current_cno(sb);
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log || !payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, fs_state %u\n", sb, fs_state);
+#else
+	SSDFS_DBG("sb %p, fs_state %u\n", sb, fs_state);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	BUG_ON(fs_state > SSDFS_LAST_KNOWN_FS_STATE);
+
+	if (le16_to_cpu(fsi->vs->state) == SSDFS_ERROR_FS &&
+	    !ssdfs_test_opt(fsi->mount_opts, IGNORE_FS_STATE)) {
+		SSDFS_DBG("refuse to commit superblock: fs in erroneous state\n");
+		return 0;
+	}
+
+	mutex_lock(&fsi->tunefs_request.lock);
+
+	err = ssdfs_prepare_volume_header_for_commit(fsi, fsi->vh);
+	if (unlikely(err)) {
+		SSDFS_CRIT("volume header is inconsistent: err %d\n", err);
+		goto finish_commit_super;
+	}
+
+	err = ssdfs_prepare_current_segment_ids(fsi, cur_segs, size);
+	if (unlikely(err)) {
+		SSDFS_CRIT("fail to prepare current segments IDs: err %d\n",
+			   err);
+		goto finish_commit_super;
+	}
+
+	err = ssdfs_prepare_volume_state_info_for_commit(fsi, fs_state,
+							 cur_segs, size,
+							 timestamp,
+							 cno,
+							 fsi->vs);
+	if (unlikely(err)) {
+		SSDFS_CRIT("volume state info is inconsistent: err %d\n", err);
+		goto finish_commit_super;
+	}
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+		last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		err = ssdfs_commit_sb_log(sb, timestamp, cno,
+					  last_sb_log, payload);
+		if (err) {
+			SSDFS_ERR("fail to commit superblock log: "
+				  "leb_id %llu, peb_id %llu, "
+				  "page_offset %u, pages_count %u, "
+				  "err %d\n",
+				  last_sb_log->leb_id,
+				  last_sb_log->peb_id,
+				  last_sb_log->page_offset,
+				  last_sb_log->pages_count,
+				  err);
+			goto finish_commit_super;
+		}
+	}
+
+	last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG];
+	last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG];
+
+	ssdfs_memcpy(&fsi->sbi.last_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     last_sb_log,
+		     0, sizeof(struct ssdfs_peb_extent),
+		     sizeof(struct ssdfs_peb_extent));
+
+finish_commit_super:
+	mutex_unlock(&fsi->tunefs_request.lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished: err %d\n", err);
+#else
+	SSDFS_DBG("finished: err %d\n", err);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	return err;
+}
+
+static void ssdfs_memory_folio_locks_checker_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_locked_folios, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static void ssdfs_check_memory_folio_locks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_locked_folios) != 0) {
+		SSDFS_WARN("%lld memory folios are still locked\n",
+			   atomic64_read(&ssdfs_locked_folios));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static void ssdfs_memory_leaks_checker_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_allocated_folios, 0);
+	atomic64_set(&ssdfs_memory_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+#ifdef CONFIG_SSDFS_POSIX_ACL
+	ssdfs_acl_memory_leaks_init();
+#endif /* CONFIG_SSDFS_POSIX_ACL */
+
+	ssdfs_block_bmap_memory_leaks_init();
+	ssdfs_btree_memory_leaks_init();
+	ssdfs_btree_hierarchy_memory_leaks_init();
+	ssdfs_btree_node_memory_leaks_init();
+	ssdfs_btree_search_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_ZLIB
+	ssdfs_zlib_memory_leaks_init();
+#endif /* CONFIG_SSDFS_ZLIB */
+
+#ifdef CONFIG_SSDFS_LZO
+	ssdfs_lzo_memory_leaks_init();
+#endif /* CONFIG_SSDFS_LZO */
+
+	ssdfs_compr_memory_leaks_init();
+	ssdfs_cur_seg_memory_leaks_init();
+	ssdfs_dentries_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	ssdfs_dev_mtd_memory_leaks_init();
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	ssdfs_dev_bdev_memory_leaks_init();
+	ssdfs_dev_zns_memory_leaks_init();
+#else
+	BUILD_BUG();
+#endif
+
+	ssdfs_dir_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA
+	ssdfs_diff_memory_leaks_init();
+#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */
+
+	ssdfs_dynamic_array_memory_leaks_init();
+	ssdfs_ext_queue_memory_leaks_init();
+	ssdfs_ext_tree_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_PEB_DEDUPLICATION
+	ssdfs_fingerprint_array_memory_leaks_init();
+#endif /* CONFIG_SSDFS_PEB_DEDUPLICATION */
+
+	ssdfs_file_memory_leaks_init();
+	ssdfs_fs_error_memory_leaks_init();
+
+	ssdfs_global_fsck_memory_leaks_init();
+
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	ssdfs_fsck_memory_leaks_init();
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+
+	ssdfs_inode_memory_leaks_init();
+	ssdfs_ino_tree_memory_leaks_init();
+	ssdfs_invext_tree_memory_leaks_init();
+	ssdfs_blk2off_memory_leaks_init();
+	ssdfs_farray_memory_leaks_init();
+	ssdfs_folio_vector_memory_leaks_init();
+	ssdfs_flush_memory_leaks_init();
+	ssdfs_gc_memory_leaks_init();
+	ssdfs_map_queue_memory_leaks_init();
+	ssdfs_map_tbl_memory_leaks_init();
+	ssdfs_map_cache_memory_leaks_init();
+	ssdfs_map_thread_memory_leaks_init();
+	ssdfs_migration_memory_leaks_init();
+	ssdfs_peb_memory_leaks_init();
+	ssdfs_read_memory_leaks_init();
+	ssdfs_recovery_memory_leaks_init();
+	ssdfs_req_queue_memory_leaks_init();
+	ssdfs_seg_obj_memory_leaks_init();
+	ssdfs_seg_bmap_memory_leaks_init();
+	ssdfs_seg_blk_memory_leaks_init();
+	ssdfs_seg_tree_memory_leaks_init();
+	ssdfs_seq_arr_memory_leaks_init();
+	ssdfs_dict_memory_leaks_init();
+	ssdfs_shextree_memory_leaks_init();
+	ssdfs_super_memory_leaks_init();
+	ssdfs_xattr_memory_leaks_init();
+	ssdfs_snap_reqs_queue_memory_leaks_init();
+	ssdfs_snap_rules_list_memory_leaks_init();
+	ssdfs_snap_tree_memory_leaks_init();
+}
+
+static void ssdfs_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_POSIX_ACL
+	ssdfs_acl_check_memory_leaks();
+#endif /* CONFIG_SSDFS_POSIX_ACL */
+
+	ssdfs_block_bmap_check_memory_leaks();
+	ssdfs_btree_check_memory_leaks();
+	ssdfs_btree_hierarchy_check_memory_leaks();
+	ssdfs_btree_node_check_memory_leaks();
+	ssdfs_btree_search_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_ZLIB
+	ssdfs_zlib_check_memory_leaks();
+#endif /* CONFIG_SSDFS_ZLIB */
+
+#ifdef CONFIG_SSDFS_LZO
+	ssdfs_lzo_check_memory_leaks();
+#endif /* CONFIG_SSDFS_LZO */
+
+	ssdfs_compr_check_memory_leaks();
+	ssdfs_cur_seg_check_memory_leaks();
+	ssdfs_dentries_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	ssdfs_dev_mtd_check_memory_leaks();
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	ssdfs_dev_bdev_check_memory_leaks();
+	ssdfs_dev_zns_check_memory_leaks();
+#else
+	BUILD_BUG();
+#endif
+
+	ssdfs_dir_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA
+	ssdfs_diff_check_memory_leaks();
+#endif /* CONFIG_SSDFS_DIFF_ON_WRITE_USER_DATA */
+
+	ssdfs_dynamic_array_check_memory_leaks();
+	ssdfs_ext_queue_check_memory_leaks();
+	ssdfs_ext_tree_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_PEB_DEDUPLICATION
+	ssdfs_fingerprint_array_check_memory_leaks();
+#endif /* CONFIG_SSDFS_PEB_DEDUPLICATION */
+
+	ssdfs_file_check_memory_leaks();
+	ssdfs_fs_error_check_memory_leaks();
+
+	ssdfs_global_fsck_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	ssdfs_fsck_check_memory_leaks();
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+
+	ssdfs_inode_check_memory_leaks();
+	ssdfs_ino_tree_check_memory_leaks();
+	ssdfs_invext_tree_check_memory_leaks();
+	ssdfs_blk2off_check_memory_leaks();
+	ssdfs_farray_check_memory_leaks();
+	ssdfs_folio_vector_check_memory_leaks();
+	ssdfs_flush_check_memory_leaks();
+	ssdfs_gc_check_memory_leaks();
+	ssdfs_map_queue_check_memory_leaks();
+	ssdfs_map_tbl_check_memory_leaks();
+	ssdfs_map_cache_check_memory_leaks();
+	ssdfs_map_thread_check_memory_leaks();
+	ssdfs_migration_check_memory_leaks();
+	ssdfs_peb_check_memory_leaks();
+	ssdfs_read_check_memory_leaks();
+	ssdfs_recovery_check_memory_leaks();
+	ssdfs_req_queue_check_memory_leaks();
+	ssdfs_seg_obj_check_memory_leaks();
+	ssdfs_seg_bmap_check_memory_leaks();
+	ssdfs_seg_blk_check_memory_leaks();
+	ssdfs_seg_tree_check_memory_leaks();
+	ssdfs_seq_arr_check_memory_leaks();
+	ssdfs_dict_check_memory_leaks();
+	ssdfs_shextree_check_memory_leaks();
+	ssdfs_super_check_memory_leaks();
+	ssdfs_xattr_check_memory_leaks();
+	ssdfs_snap_reqs_queue_check_memory_leaks();
+	ssdfs_snap_rules_list_check_memory_leaks();
+	ssdfs_snap_tree_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	if (atomic64_read(&ssdfs_allocated_folios) != 0) {
+		SSDFS_ERR("Memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_allocated_folios));
+	}
+
+	if (atomic64_read(&ssdfs_memory_leaks) != 0) {
+		SSDFS_ERR("Memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_memory_leaks));
+	}
+#else
+	if (atomic64_read(&ssdfs_allocated_folios) != 0) {
+		SSDFS_WARN("Memory leaks include %lld folios\n",
+			   atomic64_read(&ssdfs_allocated_folios));
+	}
+
+	if (atomic64_read(&ssdfs_memory_leaks) != 0) {
+		SSDFS_WARN("Memory allocator suffers from %lld leaks\n",
+			   atomic64_read(&ssdfs_memory_leaks));
+	}
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static int ssdfs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	struct ssdfs_fs_info *fs_info;
+	struct ssdfs_mount_context *ctx = fc->fs_private;
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	struct inode *root_i;
+	int silent = fc->sb_flags & SB_SILENT;
+	u64 fs_feature_compat;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p, silent %#x\n", sb, silent);
+#else
+	SSDFS_DBG("sb %p, silent %#x\n", sb, silent);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("segment header size %zu, "
+		  "partial log header size %zu, "
+		  "footer size %zu\n",
+		  sizeof(struct ssdfs_segment_header),
+		  sizeof(struct ssdfs_partial_log_header),
+		  sizeof(struct ssdfs_log_footer));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memory_folio_locks_checker_init();
+	ssdfs_memory_leaks_checker_init();
+
+	sb->s_fs_info = NULL;
+
+	fs_info = kzalloc(sizeof(*fs_info), GFP_KERNEL);
+	if (!fs_info)
+		return -ENOMEM;
+
+	fs_info->sb = sb;
+	sb->s_fs_info = fs_info;
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&fs_info->ssdfs_writeback_folios, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+	fs_info->fs_ctime = U64_MAX;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	spin_lock_init(&fs_info->requests_lock);
+	INIT_LIST_HEAD(&fs_info->user_data_requests_list);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* set initial block size value for valid log search */
+	fs_info->log_pagesize = ilog2(SSDFS_4KB);
+	fs_info->pagesize = SSDFS_4KB;
+
+#ifdef CONFIG_SSDFS_TESTING
+	fs_info->do_fork_invalidation = true;
+#endif /* CONFIG_SSDFS_TESTING */
+
+	fs_info->max_open_zones = 0;
+	fs_info->is_zns_device = false;
+	fs_info->zone_size = U64_MAX;
+	fs_info->zone_capacity = U64_MAX;
+	atomic_set(&fs_info->open_zones, 0);
+
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	atomic_set(&fs_info->fsck_priority, 0);
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+
+	mutex_init(&fs_info->tunefs_request.lock);
+	fs_info->tunefs_request.state = SSDFS_IGNORE_OPTION;
+	memset(&fs_info->tunefs_request.new_config, 0,
+		sizeof(struct ssdfs_tunefs_config_request));
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	fs_info->mtd = sb->s_mtd;
+	fs_info->devops = &ssdfs_mtd_devops;
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	if (bdev_is_zoned(sb->s_bdev)) {
+		fs_info->devops = &ssdfs_zns_devops;
+		fs_info->is_zns_device = true;
+		fs_info->max_open_zones = bdev_max_open_zones(sb->s_bdev);
+
+		fs_info->zone_size = ssdfs_zns_zone_size(sb,
+						SSDFS_RESERVED_VBR_SIZE);
+		if (fs_info->zone_size >= U64_MAX) {
+			SSDFS_ERR("fail to get zone size\n");
+			return -ERANGE;
+		}
+
+		fs_info->zone_capacity = ssdfs_zns_zone_capacity(sb,
+						SSDFS_RESERVED_VBR_SIZE);
+		if (fs_info->zone_capacity >= U64_MAX) {
+			SSDFS_ERR("fail to get zone capacity\n");
+			return -ERANGE;
+		} else if (fs_info->zone_capacity > fs_info->zone_size) {
+			SSDFS_ERR("invalid zone capacity: "
+				  "capacity %llu, size %llu\n",
+				  fs_info->zone_capacity,
+				  fs_info->zone_size);
+			return -ERANGE;
+		}
+	} else {
+		fs_info->devops = &ssdfs_bdev_devops;
+	}
+
+	atomic_set(&fs_info->pending_bios, 0);
+	fs_info->erase_folio = ssdfs_super_alloc_folio(GFP_KERNEL,
+							get_order(PAGE_SIZE));
+	if (IS_ERR_OR_NULL(fs_info->erase_folio)) {
+		err = (fs_info->erase_folio == NULL ?
+				-ENOMEM : PTR_ERR(fs_info->erase_folio));
+		fs_info->erase_folio = NULL;
+		SSDFS_ERR("unable to allocate memory folio\n");
+		goto free_erase_folio;
+	}
+	memset(folio_address(fs_info->erase_folio), 0xFF, PAGE_SIZE);
+#else
+	BUILD_BUG();
+#endif
+
+	atomic64_set(&fs_info->flush_reqs, 0);
+	init_waitqueue_head(&fs_info->pending_wq);
+	init_waitqueue_head(&fs_info->finish_user_data_read_wq);
+	init_waitqueue_head(&fs_info->finish_user_data_flush_wq);
+	init_waitqueue_head(&fs_info->finish_commit_log_flush_wq);
+	atomic_set(&fs_info->maptbl_users, 0);
+	init_waitqueue_head(&fs_info->maptbl_users_wq);
+	atomic_set(&fs_info->segbmap_users, 0);
+	init_waitqueue_head(&fs_info->segbmap_users_wq);
+	ssdfs_btree_nodes_list_init(&fs_info->btree_nodes);
+	atomic_set(&fs_info->global_fs_state, SSDFS_UNKNOWN_GLOBAL_FS_STATE);
+	spin_lock_init(&fs_info->volume_state_lock);
+	init_completion(&fs_info->mount_end);
+
+	ssdfs_seg_objects_queue_init(&fs_info->pre_destroyed_segs_rq);
+
+	init_waitqueue_head(&fs_info->global_fsck.wait_queue);
+	ssdfs_requests_queue_init(&fs_info->global_fsck.rq);
+
+	for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) {
+		init_waitqueue_head(&fs_info->gc_wait_queue[i]);
+		atomic_set(&fs_info->gc_should_act[i], 1);
+	}
+
+	fs_info->mount_opts = ctx->s_mount_opts;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("gather superblock info started...\n");
+#else
+	SSDFS_DBG("gather superblock info started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_gather_superblock_info(fs_info, silent);
+	if (err)
+		goto free_erase_folio;
+
+	spin_lock(&fs_info->volume_state_lock);
+	fs_feature_compat = fs_info->fs_feature_compat;
+	spin_unlock(&fs_info->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create device group started...\n");
+#else
+	SSDFS_DBG("create device group started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_sysfs_create_device_group(sb);
+	if (err)
+		goto release_maptbl_cache;
+
+	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	sb->s_magic = SSDFS_SUPER_MAGIC;
+	sb->s_op = &ssdfs_super_operations;
+	sb->s_export_op = &ssdfs_export_ops;
+
+	sb->s_xattr = ssdfs_xattr_handlers;
+	set_posix_acl_flag(sb);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create snapshots subsystem started...\n");
+#else
+	SSDFS_DBG("create snapshots subsystem started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_snapshot_subsystem_init(fs_info);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto destroy_sysfs_device_group;
+	} else if (err)
+		goto destroy_sysfs_device_group;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create segment tree started...\n");
+#else
+	SSDFS_DBG("create segment tree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	down_write(&fs_info->volume_sem);
+	err = ssdfs_segment_tree_create(fs_info);
+	up_write(&fs_info->volume_sem);
+	if (err)
+		goto destroy_snapshot_subsystem;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create mapping table started...\n");
+#else
+	SSDFS_DBG("create mapping table started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_maptbl_create(fs_info);
+		up_write(&fs_info->volume_sem);
+
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+			err = 0;
+			goto destroy_segments_tree;
+		} else if (err)
+			goto destroy_segments_tree;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no mapping table\n");
+		goto destroy_segments_tree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create segment bitmap started...\n");
+#else
+	SSDFS_DBG("create segment bitmap started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_segbmap_create(fs_info);
+		up_write(&fs_info->volume_sem);
+
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+			err = 0;
+			goto destroy_maptbl;
+		} else if (err)
+			goto destroy_maptbl;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no segment bitmap\n");
+		goto destroy_maptbl;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create shared extents tree started...\n");
+#else
+	SSDFS_DBG("create shared extents tree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_info->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_shextree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err)
+			goto destroy_segbmap;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no shared extents tree\n");
+		goto destroy_segbmap;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create invalidated extents btree started...\n");
+#else
+	SSDFS_DBG("create invalidated extents btree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_invextree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err)
+			goto destroy_shextree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create current segment array started...\n");
+#else
+	SSDFS_DBG("create current segment array started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	down_write(&fs_info->volume_sem);
+	err = ssdfs_current_segment_array_create(fs_info);
+	up_write(&fs_info->volume_sem);
+	if (err)
+		goto destroy_invext_btree;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create shared dictionary started...\n");
+#else
+	SSDFS_DBG("create shared dictionary started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+
+		err = ssdfs_shared_dict_btree_create(fs_info);
+		if (err) {
+			up_write(&fs_info->volume_sem);
+			goto destroy_current_segment_array;
+		}
+
+		err = ssdfs_shared_dict_btree_init(fs_info);
+		if (err) {
+			up_write(&fs_info->volume_sem);
+			goto destroy_shdictree;
+		}
+
+		up_write(&fs_info->volume_sem);
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no shared dictionary\n");
+		goto destroy_current_segment_array;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("create inodes btree started...\n");
+#else
+	SSDFS_DBG("create inodes btree started...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+		down_write(&fs_info->volume_sem);
+		err = ssdfs_inodes_btree_create(fs_info);
+		up_write(&fs_info->volume_sem);
+		if (err == -ENOSPC) {
+			err = 0;
+			fs_info->sb->s_flags |= SB_RDONLY;
+			SSDFS_DBG("unable to create inodes btree\n");
+		} else if (err)
+			goto destroy_shdictree;
+	} else {
+		err = -EIO;
+		SSDFS_WARN("volume has no inodes btree\n");
+		goto destroy_shdictree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("getting root inode...\n");
+#else
+	SSDFS_DBG("getting root inode...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	root_i = ssdfs_iget(sb, SSDFS_ROOT_INO);
+	if (IS_ERR(root_i)) {
+		SSDFS_DBG("getting root inode failed\n");
+		err = PTR_ERR(root_i);
+		goto destroy_inodes_btree;
+	}
+
+	if (!S_ISDIR(root_i->i_mode) || !root_i->i_blocks || !root_i->i_size) {
+		err = -ERANGE;
+		iput(root_i);
+		SSDFS_ERR("corrupted root inode\n");
+		goto destroy_inodes_btree;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("d_make_root()\n");
+#else
+	SSDFS_DBG("d_make_root()\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	sb->s_root = d_make_root(root_i);
+	if (!sb->s_root) {
+		err = -ENOMEM;
+		goto put_root_inode;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("starting global FSCK thread...\n");
+#else
+	SSDFS_DBG("starting global FSCK thread...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_start_global_fsck_thread(fs_info);
+	if (err == -EINTR) {
+		/*
+		 * Ignore this error.
+		 */
+		err = 0;
+		goto put_root_inode;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to start global FSCK thread: "
+			  "err %d\n", err);
+		goto put_root_inode;
+	}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("starting GC threads...\n");
+#else
+	SSDFS_DBG("starting GC threads...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) {
+		err = ssdfs_start_gc_thread(fs_info, i);
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+			err = 0;
+			for (i--; i >= 0; i--)
+				ssdfs_stop_gc_thread(fs_info, i);
+			goto stop_global_fsck_thread;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to start GC threads: "
+				  "type %#x, err %d\n",
+				  i, err);
+			for (i--; i >= 0; i--)
+				ssdfs_stop_gc_thread(fs_info, i);
+			goto stop_global_fsck_thread;
+		}
+	}
+
+	if (!(sb->s_flags & SB_RDONLY)) {
+		folio_batch_init(&payload.maptbl_cache.batch);
+
+		down_write(&fs_info->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+		if (!err) {
+			err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to snapshot sb log's payload: "
+					  "err %d\n", err);
+			}
+		}
+
+		if (!err) {
+			err = ssdfs_commit_super(sb, SSDFS_MOUNTED_FS,
+						 &last_sb_log,
+						 &payload);
+		}
+
+		up_write(&fs_info->volume_sem);
+
+		ssdfs_super_folio_batch_release(&payload.maptbl_cache.batch);
+
+		if (err) {
+			SSDFS_NOTICE("fail to commit superblock info: "
+				     "remount filesystem in RO mode\n");
+			sb->s_flags |= SB_RDONLY;
+		}
+	}
+
+	atomic_set(&fs_info->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS);
+	complete_all(&fs_info->mount_end);
+
+	SSDFS_INFO("%s (page %s, erase block %s, segment %s) has been mounted%s on device %s\n",
+		   SSDFS_VERSION,
+		   GRANULARITY2STRING(fs_info->pagesize),
+		   GRANULARITY2STRING(fs_info->erasesize),
+		   GRANULARITY2STRING(fs_info->segsize),
+		   (sb->s_flags & SB_RDONLY) ? " READ-ONLY" : "",
+		   fs_info->devops->device_name(sb));
+
+	return 0;
+
+stop_global_fsck_thread:
+	ssdfs_stop_global_fsck_thread(fs_info);
+
+put_root_inode:
+	iput(root_i);
+
+destroy_inodes_btree:
+	ssdfs_inodes_btree_destroy(fs_info);
+
+destroy_shdictree:
+	ssdfs_shared_dict_btree_destroy(fs_info);
+
+destroy_current_segment_array:
+	ssdfs_destroy_all_curent_segments(fs_info);
+
+destroy_invext_btree:
+	ssdfs_invextree_destroy(fs_info);
+
+destroy_shextree:
+	ssdfs_shextree_destroy(fs_info);
+
+destroy_segbmap:
+	ssdfs_segbmap_destroy(fs_info);
+
+destroy_maptbl:
+	ssdfs_maptbl_stop_thread(fs_info->maptbl);
+	ssdfs_maptbl_destroy(fs_info);
+
+destroy_segments_tree:
+	ssdfs_segment_tree_destroy(fs_info);
+	ssdfs_current_segment_array_destroy(fs_info);
+
+destroy_snapshot_subsystem:
+	ssdfs_snapshot_subsystem_destroy(fs_info);
+
+destroy_sysfs_device_group:
+	ssdfs_sysfs_delete_device_group(fs_info);
+
+release_maptbl_cache:
+	ssdfs_maptbl_cache_destroy(&fs_info->maptbl_cache);
+
+free_erase_folio:
+	if (fs_info->erase_folio)
+		ssdfs_super_free_folio(fs_info->erase_folio);
+
+	ssdfs_destruct_sb_info(&fs_info->sbi);
+	ssdfs_destruct_sb_info(&fs_info->sbi_backup);
+	ssdfs_destruct_sb_snap_info(&fs_info->sb_snapi);
+
+	ssdfs_free_workspaces();
+
+	rcu_barrier();
+
+	ssdfs_check_memory_folio_locks();
+	ssdfs_check_memory_leaks();
+	return err;
+}
+
+static inline
+bool unfinished_commit_log_requests_exist(struct ssdfs_fs_info *fsi)
+{
+	u64 commit_log_requests = 0;
+
+	spin_lock(&fsi->volume_state_lock);
+	commit_log_requests = fsi->commit_log_requests;
+	spin_unlock(&fsi->volume_state_lock);
+
+	return commit_log_requests > 0;
+}
+
+static inline
+void wait_unfinished_commit_log_requests(struct ssdfs_fs_info *fsi)
+{
+	if (unfinished_commit_log_requests_exist(fsi)) {
+		wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq;
+		u64 old_commit_requests, new_commit_requests;
+		int number_of_tries = 0;
+
+		while (number_of_tries < SSDFS_UNMOUNT_NUMBER_OF_TRIES) {
+			spin_lock(&fsi->volume_state_lock);
+			old_commit_requests = fsi->commit_log_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			DEFINE_WAIT_FUNC(wait, woken_wake_function);
+			add_wait_queue(wq, &wait);
+			if (unfinished_commit_log_requests_exist(fsi)) {
+				wait_woken(&wait, TASK_INTERRUPTIBLE, HZ);
+			}
+			remove_wait_queue(wq, &wait);
+
+			if (!unfinished_commit_log_requests_exist(fsi))
+				break;
+
+			spin_lock(&fsi->volume_state_lock);
+			new_commit_requests = fsi->commit_log_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			if (old_commit_requests != new_commit_requests) {
+				if (number_of_tries > 0)
+					number_of_tries--;
+			} else
+				number_of_tries++;
+		}
+
+		if (unfinished_commit_log_requests_exist(fsi)) {
+			spin_lock(&fsi->volume_state_lock);
+			new_commit_requests = fsi->commit_log_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			SSDFS_WARN("there are unfinished commit log requests: "
+				   "commit_log_requests %llu, "
+				   "number_of_tries %d\n",
+				   new_commit_requests,
+				   number_of_tries);
+		}
+		}
+	}
+}
+
+static void ssdfs_put_super(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_peb_extent last_sb_log = {0};
+	struct ssdfs_sb_log_payload payload;
+	u64 fs_feature_compat;
+	u16 fs_state;
+	bool can_commit_super = true;
+	int i;
+	int res;
+	int err;
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p\n", sb);
+#else
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	atomic_set(&fsi->global_fs_state, SSDFS_UNMOUNT_METADATA_GOING_FLUSHING);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_UNMOUNT_METADATA_GOING_FLUSHING\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("STOP THREADS...\n");
+#else
+	SSDFS_DBG("STOP THREADS...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	for (i = 0; i < SSDFS_GC_THREAD_TYPE_MAX; i++) {
+		err = ssdfs_stop_gc_thread(fsi, i);
+		if (err) {
+			SSDFS_ERR("fail to stop GC thread: "
+				  "type %#x, err %d\n", i, err);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("GC threads have been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_shared_dict_stop_thread(fsi->shdictree);
+	if (err == -EIO) {
+		ssdfs_fs_error(fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"thread I/O issue\n");
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("shared dictionary thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = SSDFS_INVALIDATION_QUEUE_NUMBER - 1; i >= 0; i--) {
+		err = ssdfs_shextree_stop_thread(fsi->shextree, i);
+		if (err == -EIO) {
+			ssdfs_fs_error(fsi->sb,
+					__FILE__, __func__, __LINE__,
+					"thread I/O issue\n");
+		} else if (unlikely(err)) {
+			SSDFS_WARN("thread stopping issue: ID %d, err %d\n",
+				   i, err);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("shared extents threads have been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_stop_snapshots_btree_thread(fsi);
+	if (err == -EIO) {
+		ssdfs_fs_error(fsi->sb,
+				__FILE__, __func__, __LINE__,
+				"thread I/O issue\n");
+	} else if (unlikely(err)) {
+		SSDFS_WARN("thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("snapshots btree thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_stop_thread(fsi->maptbl);
+	if (unlikely(err)) {
+		SSDFS_WARN("maptbl thread stopping issue: err %d\n",
+			   err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("mapping table thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&fsi->volume_state_lock);
+	fs_feature_compat = fsi->fs_feature_compat;
+	fs_state = fsi->fs_state;
+	spin_unlock(&fsi->volume_state_lock);
+
+	folio_batch_init(&payload.maptbl_cache.batch);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Wait unfinished user data requests...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+	wait_unfinished_read_data_requests(fsi);
+	wait_unfinished_user_data_requests(fsi);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Wait unfinished commit log requests...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+	wait_unfinished_commit_log_requests(fsi);
+
+	if (!(sb->s_flags & SB_RDONLY)) {
+		atomic_set(&fsi->global_fs_state,
+				SSDFS_UNMOUNT_METADATA_UNDER_FLUSH);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SSDFS_UNMOUNT_METADATA_UNDER_FLUSH\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		down_write(&fsi->volume_sem);
+
+		err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+		if (unlikely(err)) {
+			can_commit_super = false;
+			SSDFS_ERR("fail to prepare sb log: err %d\n",
+				  err);
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush invalidated extents b-tree...\n");
+#else
+		SSDFS_DBG("Flush invalidated extents b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+			err = ssdfs_invextree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush invalidated extents btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush shared extents b-tree...\n");
+#else
+		SSDFS_DBG("Flush shared extents b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+			err = ssdfs_shextree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush shared extents btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush inodes b-tree...\n");
+#else
+		SSDFS_DBG("Flush inodes b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+			err = ssdfs_inodes_btree_flush(fsi->inodes_tree);
+			if (err) {
+				SSDFS_ERR("fail to flush inodes btree: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush shared dictionary b-tree...\n");
+#else
+		SSDFS_DBG("Flush shared dictionary b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+			err = ssdfs_shared_dict_btree_flush(fsi->shdictree);
+			if (err) {
+				SSDFS_ERR("fail to flush shared dictionary: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Execute create snapshots...\n");
+#else
+		SSDFS_DBG("Execute create snapshots...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		err = ssdfs_execute_create_snapshots(fsi);
+		if (err) {
+			SSDFS_ERR("fail to process snapshot creation: "
+				  "err %d\n", err);
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush snapshots b-tree...\n");
+#else
+		SSDFS_DBG("Flush snapshots b-tree...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fsi->fs_feature_compat &
+				SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) {
+			err = ssdfs_snapshots_btree_flush(fsi);
+			if (err) {
+				SSDFS_ERR("fail to flush snapshots btree: "
+					  "err %d\n", err);
+			}
+		}
+
+		if (atomic_read(&fsi->segbmap_users) > 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("Wait absence of segment bitmap's users...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			res = wait_event_killable_timeout(fsi->segbmap_users_wq,
+					atomic_read(&fsi->segbmap_users) <= 0,
+					SSDFS_DEFAULT_TIMEOUT);
+			if (res < 0) {
+				WARN_ON(1);
+			} else if (res > 1) {
+				/*
+				 * Condition changed before timeout
+				 */
+			} else {
+				/* timeout is elapsed */
+				WARN_ON(1);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush segment bitmap...\n");
+#else
+		SSDFS_DBG("Flush segment bitmap...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		if (fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+			err = ssdfs_segbmap_flush(fsi->segbmap);
+			if (err) {
+				SSDFS_ERR("fail to flush segbmap: "
+					  "err %d\n", err);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("Wait unfinished commit log requests...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		wake_up_all(&fsi->pending_wq);
+		wait_unfinished_commit_log_requests(fsi);
+
+		if (atomic_read(&fsi->maptbl_users) > 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("Wait absence of mapping table users...\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			res = wait_event_killable_timeout(fsi->maptbl_users_wq,
+					atomic_read(&fsi->maptbl_users) <= 0,
+					SSDFS_DEFAULT_TIMEOUT);
+			if (res < 0) {
+				WARN_ON(1);
+			} else if (res > 1) {
+				/*
+				 * Condition changed before timeout
+				 */
+			} else {
+				/* timeout is elapsed */
+				WARN_ON(1);
+			}
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Flush PEB mapping table...\n");
+#else
+		SSDFS_DBG("Flush PEB mapping table...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		atomic_set(&fsi->global_fs_state,
+				SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+			err = ssdfs_maptbl_flush(fsi->maptbl);
+			if (err) {
+				SSDFS_ERR("fail to flush maptbl: "
+					  "err %d\n", err);
+			}
+
+			wait_unfinished_commit_log_requests(fsi);
+			set_maptbl_going_to_be_destroyed(fsi);
+		}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+		SSDFS_ERR("Commit superblock...\n");
+#else
+		SSDFS_DBG("Commit superblock...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+		atomic_set(&fsi->global_fs_state,
+				SSDFS_UNMOUNT_COMMIT_SUPERBLOCK);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SSDFS_UNMOUNT_COMMIT_SUPERBLOCK\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (can_commit_super) {
+			err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to snapshot log's payload: "
+					  "err %d\n", err);
+			} else {
+				err = ssdfs_commit_super(sb, SSDFS_VALID_FS,
+							 &last_sb_log,
+							 &payload);
+			}
+		} else {
+			/* prepare error code */
+			err = -ERANGE;
+		}
+
+		if (err) {
+			SSDFS_ERR("fail to commit superblock info: "
+				  "err %d\n", err);
+		}
+
+		up_write(&fsi->volume_sem);
+	} else {
+		if (fs_state == SSDFS_ERROR_FS) {
+			down_write(&fsi->volume_sem);
+
+			err = ssdfs_prepare_sb_log(sb, &last_sb_log);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to prepare sb log: err %d\n",
+					  err);
+			}
+
+			err = ssdfs_snapshot_sb_log_payload(sb, &payload);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to snapshot log's payload: "
+					  "err %d\n", err);
+			}
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+			SSDFS_ERR("Commit superblock...\n");
+#else
+			SSDFS_DBG("Commit superblock...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+			atomic_set(&fsi->global_fs_state,
+					SSDFS_UNMOUNT_COMMIT_SUPERBLOCK);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SSDFS_UNMOUNT_COMMIT_SUPERBLOCK\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (!err) {
+				err = ssdfs_commit_super(sb, SSDFS_ERROR_FS,
+							 &last_sb_log,
+							 &payload);
+			}
+
+			up_write(&fsi->volume_sem);
+
+			if (err) {
+				SSDFS_ERR("fail to commit superblock info: "
+					  "err %d\n", err);
+			}
+		}
+	}
+
+	atomic_set(&fsi->global_fs_state, SSDFS_UNMOUNT_DESTROY_METADATA);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_UNMOUNT_DESTROY_METADATA\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("STOP GLOBAL FSCK THREAD...\n");
+#else
+	SSDFS_DBG("STOP GLOBAL FSCK THREAD...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	err = ssdfs_stop_global_fsck_thread(fsi);
+	if (err) {
+		SSDFS_ERR("fail to stop global FSCK thread: "
+			  "err %d\n", err);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Global FSCK thread has been stopped\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < folio_batch_count(&payload.maptbl_cache.batch); i++) {
+		struct folio *payload_folio =
+				payload.maptbl_cache.batch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!payload_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(payload_folio);
+		folio_clear_uptodate(payload_folio);
+		ssdfs_folio_unlock(payload_folio);
+	}
+
+	ssdfs_super_folio_batch_release(&payload.maptbl_cache.batch);
+	fsi->devops->sync(sb);
+
+	/*
+	 * Make sure all delayed rcu free inodes are flushed.
+	 */
+	rcu_barrier();
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("All delayed rcu free inodes have been flushed\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("Starting to destroy the metadata structures...\n");
+#else
+	SSDFS_DBG("Starting to destroy the metadata structures...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	SSDFS_INFO("%s has been unmounted from device %s\n",
+		   SSDFS_VERSION, fsi->devops->device_name(sb));
+
+	ssdfs_snapshot_subsystem_destroy(fsi);
+	ssdfs_invextree_destroy(fsi);
+	ssdfs_shextree_destroy(fsi);
+	ssdfs_inodes_btree_destroy(fsi);
+	ssdfs_shared_dict_btree_destroy(fsi);
+	ssdfs_segbmap_destroy(fsi);
+	ssdfs_maptbl_destroy(fsi);
+	ssdfs_maptbl_cache_destroy(&fsi->maptbl_cache);
+	ssdfs_destroy_all_curent_segments(fsi);
+	ssdfs_segment_tree_destroy(fsi);
+	ssdfs_current_segment_array_destroy(fsi);
+	ssdfs_sysfs_delete_device_group(fsi);
+
+	if (fsi->erase_folio)
+		ssdfs_super_free_folio(fsi->erase_folio);
+
+	ssdfs_destruct_sb_info(&fsi->sbi);
+	ssdfs_destruct_sb_info(&fsi->sbi_backup);
+	ssdfs_destruct_sb_snap_info(&fsi->sb_snapi);
+
+	ssdfs_free_workspaces();
+
+	ssdfs_check_memory_folio_locks();
+	ssdfs_check_memory_leaks();
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("All metadata structures have been destroyed...\n");
+#else
+	SSDFS_DBG("All metadata structures have been destroyed...\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	SSDFS_INFO("All metadata structures have been destroyed\n");
+}
+
+static int ssdfs_reconfigure(struct fs_context *fc)
+{
+	struct ssdfs_mount_context *ctx = fc->fs_private;
+	struct super_block *sb = fc->root->d_sb;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+
+	sync_filesystem(sb);
+	fsi->mount_opts = ctx->s_mount_opts;
+
+	return ssdfs_remount_fs(fc, sb);
+}
+
+static int ssdfs_get_tree(struct fs_context *fc)
+{
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	return get_tree_mtd(fc, ssdfs_fill_super);
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	return get_tree_bdev(fc, ssdfs_fill_super);
+#else
+	BUILD_BUG();
+	return -EOPNOTSUPP;
+#endif
+}
+
+static void ssdfs_fc_free(struct fs_context *fc)
+{
+	struct ssdfs_mount_context *ctx = fc->fs_private;
+
+	if (!ctx)
+		return;
+
+	kfree(ctx);
+}
+
+static const struct fs_context_operations ssdfs_context_ops = {
+	.parse_param	= ssdfs_parse_param,
+	.get_tree	= ssdfs_get_tree,
+	.reconfigure	= ssdfs_reconfigure,
+	.free		= ssdfs_fc_free,
+};
+
+static int ssdfs_init_fs_context(struct fs_context *fc)
+{
+	struct ssdfs_mount_context *ctx;
+
+	ctx = kzalloc(sizeof(struct ssdfs_mount_context), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	fc->fs_private = ctx;
+	fc->ops = &ssdfs_context_ops;
+
+	return 0;
+}
+
+static void kill_ssdfs_sb(struct super_block *sb)
+{
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	kill_mtd_super(sb);
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	kill_block_super(sb);
+#else
+	BUILD_BUG();
+#endif
+
+	if (sb->s_fs_info) {
+		kfree(sb->s_fs_info);
+		sb->s_fs_info = NULL;
+	}
+}
+
+static struct file_system_type ssdfs_fs_type = {
+	.name		= "ssdfs",
+	.owner		= THIS_MODULE,
+	.init_fs_context = ssdfs_init_fs_context,
+	.kill_sb	= kill_ssdfs_sb,
+#ifdef CONFIG_SSDFS_BLOCK_DEVICE
+	.fs_flags	= FS_REQUIRES_DEV,
+#endif
+};
+MODULE_ALIAS_FS(SSDFS_VERSION);
+
+static void ssdfs_destroy_caches(void)
+{
+	/*
+	 * Make sure all delayed rcu free inodes are flushed before we
+	 * destroy cache.
+	 */
+	rcu_barrier();
+
+	if (ssdfs_inode_cachep)
+		kmem_cache_destroy(ssdfs_inode_cachep);
+
+	ssdfs_destroy_seg_req_obj_cache();
+	ssdfs_destroy_dirty_folios_obj_cache();
+	ssdfs_destroy_btree_search_obj_cache();
+	ssdfs_destroy_free_ino_desc_cache();
+	ssdfs_destroy_btree_node_obj_cache();
+	ssdfs_destroy_seg_obj_cache();
+	ssdfs_destroy_extent_info_cache();
+	ssdfs_destroy_peb_mapping_info_cache();
+	ssdfs_destroy_blk2off_frag_obj_cache();
+	ssdfs_destroy_name_info_cache();
+	ssdfs_destroy_seg_object_info_cache();
+}
+
+static int ssdfs_init_caches(void)
+{
+	int err;
+
+	ssdfs_zero_seg_obj_cache_ptr();
+	ssdfs_zero_seg_req_obj_cache_ptr();
+	ssdfs_zero_dirty_folios_obj_cache_ptr();
+	ssdfs_zero_extent_info_cache_ptr();
+	ssdfs_zero_btree_node_obj_cache_ptr();
+	ssdfs_zero_btree_search_obj_cache_ptr();
+	ssdfs_zero_free_ino_desc_cache_ptr();
+	ssdfs_zero_peb_mapping_info_cache_ptr();
+	ssdfs_zero_blk2off_frag_obj_cache_ptr();
+	ssdfs_zero_name_info_cache_ptr();
+	ssdfs_zero_seg_object_info_cache_ptr();
+
+	ssdfs_inode_cachep = kmem_cache_create("ssdfs_inode_cache",
+					sizeof(struct ssdfs_inode_info), 0,
+					SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+					ssdfs_init_inode_once);
+	if (!ssdfs_inode_cachep) {
+		SSDFS_ERR("unable to create inode cache\n");
+		return -ENOMEM;
+	}
+
+	err = ssdfs_init_seg_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create segment object cache: err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_seg_req_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create segment request object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_dirty_folios_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create dirty folios object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_extent_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create extent info object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_btree_node_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create btree node object cache: err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_btree_search_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create btree search object cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_free_ino_desc_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create free inode descriptors cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_peb_mapping_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create PEB mapping descriptors cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_blk2off_frag_obj_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create blk2off fragments cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_name_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create name info cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	err = ssdfs_init_seg_object_info_cache();
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to create segment object info cache: "
+			  "err %d\n",
+			  err);
+		goto destroy_caches;
+	}
+
+	return 0;
+
+destroy_caches:
+	ssdfs_destroy_caches();
+	return err;
+}
+
+static inline void ssdfs_print_info(void)
+{
+	SSDFS_INFO("%s loaded\n", SSDFS_VERSION);
+}
+
+static int __init ssdfs_init(void)
+{
+	int err;
+
+	err = ssdfs_init_caches();
+	if (err) {
+		SSDFS_ERR("failed to initialize caches\n");
+		goto failed_init;
+	}
+
+	err = ssdfs_compressors_init();
+	if (err) {
+		SSDFS_ERR("failed to initialize compressors\n");
+		goto free_caches;
+	}
+
+	err = ssdfs_sysfs_init();
+	if (err) {
+		SSDFS_ERR("failed to initialize sysfs subsystem\n");
+		goto stop_compressors;
+	}
+
+	err = register_filesystem(&ssdfs_fs_type);
+	if (err) {
+		SSDFS_ERR("failed to register filesystem\n");
+		goto sysfs_exit;
+	}
+
+	ssdfs_print_info();
+
+	return 0;
+
+sysfs_exit:
+	ssdfs_sysfs_exit();
+
+stop_compressors:
+	ssdfs_compressors_exit();
+
+free_caches:
+	ssdfs_destroy_caches();
+
+failed_init:
+	return err;
+}
+
+static void __exit ssdfs_exit(void)
+{
+	unregister_filesystem(&ssdfs_fs_type);
+	ssdfs_sysfs_exit();
+	ssdfs_compressors_exit();
+	ssdfs_destroy_caches();
+}
+
+module_init(ssdfs_init);
+module_exit(ssdfs_exit);
+
+MODULE_DESCRIPTION("SSDFS -- SSD-oriented File System");
+MODULE_AUTHOR("HGST, San Jose Research Center, Storage Architecture Group");
+MODULE_AUTHOR("Viacheslav Dubeyko <slava@dubeyko.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.34.1



* [PATCH v2 07/79] ssdfs: implement commit superblock logic
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (4 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 06/79] ssdfs: implement super operations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 08/79] ssdfs: segment header + log footer operations Viacheslav Dubeyko
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

This patch implements commit superblock logic.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/super.c | 2796 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2796 insertions(+)

diff --git a/fs/ssdfs/super.c b/fs/ssdfs/super.c
index f430eee9aaf0..853d64d220ae 100644
--- a/fs/ssdfs/super.c
+++ b/fs/ssdfs/super.c
@@ -394,6 +394,2802 @@ static int ssdfs_remount_fs(struct fs_context *fc, struct super_block *sb)
 	return err;
 }
 
+static inline
+bool unfinished_read_data_requests_exist(struct ssdfs_fs_info *fsi)
+{
+	u64 read_requests = 0;
+
+	spin_lock(&fsi->volume_state_lock);
+	read_requests = fsi->read_user_data_requests;
+	spin_unlock(&fsi->volume_state_lock);
+
+	return read_requests > 0;
+}
+
+static inline
+void wait_unfinished_read_data_requests(struct ssdfs_fs_info *fsi)
+{
+	if (unfinished_read_data_requests_exist(fsi)) {
+		wait_queue_head_t *wq = &fsi->finish_user_data_read_wq;
+		u64 old_read_requests, new_read_requests;
+		int number_of_tries = 0;
+		int err = 0;
+
+		while (number_of_tries < SSDFS_UNMOUNT_NUMBER_OF_TRIES) {
+			spin_lock(&fsi->volume_state_lock);
+			old_read_requests = fsi->read_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			DEFINE_WAIT_FUNC(wait, woken_wake_function);
+			add_wait_queue(wq, &wait);
+			if (unfinished_read_data_requests_exist(fsi)) {
+				wait_woken(&wait, TASK_INTERRUPTIBLE, HZ);
+			}
+			remove_wait_queue(wq, &wait);
+
+			if (!unfinished_read_data_requests_exist(fsi))
+				break;
+
+			spin_lock(&fsi->volume_state_lock);
+			new_read_requests = fsi->read_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			if (old_read_requests != new_read_requests) {
+				if (number_of_tries > 0)
+					number_of_tries--;
+			} else
+				number_of_tries++;
+		}
+
+		if (unfinished_read_data_requests_exist(fsi)) {
+			spin_lock(&fsi->volume_state_lock);
+			new_read_requests = fsi->read_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			SSDFS_WARN("there are unfinished requests: "
+				   "unfinished_read_data_requests %llu, "
+				   "number_of_tries %d, err %d\n",
+				   new_read_requests,
+				   number_of_tries, err);
+		}
+	}
+}
+
+static inline
+bool unfinished_user_data_requests_exist(struct ssdfs_fs_info *fsi)
+{
+	u64 flush_requests = 0;
+
+	spin_lock(&fsi->volume_state_lock);
+	flush_requests = fsi->flushing_user_data_requests;
+	spin_unlock(&fsi->volume_state_lock);
+
+	return flush_requests > 0;
+}
+
+#ifdef CONFIG_SSDFS_DEBUG
+static inline
+void ssdfs_show_unfinished_user_data_requests(struct ssdfs_fs_info *fsi)
+{
+	bool is_empty;
+	struct list_head *this, *next;
+
+	spin_lock(&fsi->requests_lock);
+	is_empty = list_empty_careful(&fsi->user_data_requests_list);
+	if (!is_empty) {
+		list_for_each_safe(this, next, &fsi->user_data_requests_list) {
+			struct ssdfs_segment_request *req;
+
+			req = list_entry(this, struct ssdfs_segment_request,
+					 list);
+
+			if (!req) {
+				SSDFS_ERR_DBG("empty request ptr\n");
+				continue;
+			}
+
+			SSDFS_ERR_DBG("request: "
+				      "class %#x, cmd %#x, "
+				      "type %#x, refs_count %u, "
+				      "seg %llu, extent (start %u, len %u)\n",
+				      req->private.class, req->private.cmd,
+				      req->private.type,
+				      atomic_read(&req->private.refs_count),
+				      req->place.start.seg_id,
+				      req->place.start.blk_index,
+				      req->place.len);
+		}
+	}
+	spin_unlock(&fsi->requests_lock);
+
+	if (is_empty) {
+		SSDFS_ERR_DBG("list is empty\n");
+		return;
+	}
+}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+static inline
+void wait_unfinished_user_data_requests(struct ssdfs_fs_info *fsi)
+{
+	if (unfinished_user_data_requests_exist(fsi)) {
+		wait_queue_head_t *wq = &fsi->finish_user_data_flush_wq;
+		u64 old_flush_requests, new_flush_requests;
+		int number_of_tries = 0;
+		int err = 0;
+
+		while (number_of_tries < SSDFS_UNMOUNT_NUMBER_OF_TRIES) {
+			spin_lock(&fsi->volume_state_lock);
+			old_flush_requests =
+				fsi->flushing_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			DEFINE_WAIT_FUNC(wait, woken_wake_function);
+			add_wait_queue(wq, &wait);
+			if (unfinished_user_data_requests_exist(fsi)) {
+				wait_woken(&wait, TASK_INTERRUPTIBLE, HZ);
+			}
+			remove_wait_queue(wq, &wait);
+
+			if (!unfinished_user_data_requests_exist(fsi))
+				break;
+
+			spin_lock(&fsi->volume_state_lock);
+			new_flush_requests =
+				fsi->flushing_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			if (old_flush_requests != new_flush_requests) {
+				if (number_of_tries > 0)
+					number_of_tries--;
+			} else
+				number_of_tries++;
+		}
+
+		if (unfinished_user_data_requests_exist(fsi)) {
+			spin_lock(&fsi->volume_state_lock);
+			new_flush_requests =
+				fsi->flushing_user_data_requests;
+			spin_unlock(&fsi->volume_state_lock);
+
+			SSDFS_WARN("there are unfinished requests: "
+				   "unfinished_user_data_requests %llu, "
+				   "number_of_tries %d, err %d\n",
+				   new_flush_requests,
+				   number_of_tries, err);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			ssdfs_show_unfinished_user_data_requests(fsi);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+	}
+}
+
+static int ssdfs_sync_fs(struct super_block *sb, int wait)
+{
+	struct ssdfs_fs_info *fsi;
+	int err = 0;
+
+	fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("sb %p\n", sb);
+#else
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	SSDFS_ERR("SYNCFS is starting...\n");
+	ssdfs_check_memory_leaks();
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_GOING_FLUSHING);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_GOING_FLUSHING\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+	wait_unfinished_user_data_requests(fsi);
+
+	atomic_set(&fsi->global_fs_state, SSDFS_METADATA_UNDER_FLUSH);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_METADATA_UNDER_FLUSH\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&fsi->volume_sem);
+
+	if (fsi->fs_feature_compat &
+			SSDFS_HAS_INVALID_EXTENTS_TREE_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush invalidated extents btree\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_invextree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush invalidated extents btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_EXTENTS_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush shared extents btree\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_shextree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush shared extents btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_INODES_TREE_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush inodes btree\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_inodes_btree_flush(fsi->inodes_tree);
+		if (err) {
+			SSDFS_ERR("fail to flush inodes btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SHARED_DICT_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush shared dictionary\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_shared_dict_btree_flush(fsi->shdictree);
+		if (err) {
+			SSDFS_ERR("fail to flush shared dictionary: "
+				  "err %d\n", err);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("process the snapshots creation\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_execute_create_snapshots(fsi);
+	if (err) {
+		SSDFS_ERR("fail to process snapshot creation: "
+			  "err %d\n", err);
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SNAPSHOTS_TREE_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush snapshots btree\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_snapshots_btree_flush(fsi);
+		if (err) {
+			SSDFS_ERR("fail to flush snapshots btree: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush segment bitmap\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_segbmap_flush(fsi->segbmap);
+		if (err) {
+			SSDFS_ERR("fail to flush segment bitmap: "
+				  "err %d\n", err);
+		}
+	}
+
+	if (fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("flush mapping table\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_maptbl_flush(fsi->maptbl);
+		if (err) {
+			SSDFS_ERR("fail to flush mapping table: "
+				  "err %d\n", err);
+		}
+	}
+
+	up_write(&fsi->volume_sem);
+
+	atomic_set(&fsi->global_fs_state, SSDFS_REGULAR_FS_OPERATIONS);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SSDFS_REGULAR_FS_OPERATIONS\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	wake_up_all(&fsi->pending_wq);
+
+#ifdef CONFIG_SSDFS_SHOW_CONSUMED_MEMORY
+	SSDFS_ERR("SYNCFS has been finished...\n");
+	ssdfs_check_memory_leaks();
+#endif /* CONFIG_SSDFS_SHOW_CONSUMED_MEMORY */
+
+#ifdef CONFIG_SSDFS_TRACK_API_CALL
+	SSDFS_ERR("finished\n");
+#endif /* CONFIG_SSDFS_TRACK_API_CALL */
+
+	if (unlikely(err))
+		goto fail_sync_fs;
+
+	trace_ssdfs_sync_fs(sb, wait);
+
+	return 0;
+
+fail_sync_fs:
+	trace_ssdfs_sync_fs_exit(sb, wait, err);
+	return err;
+}
+
+static struct inode *ssdfs_nfs_get_inode(struct super_block *sb,
+					 u64 ino, u32 generation)
+{
+	struct inode *inode;
+
+	if (ino < SSDFS_ROOT_INO)
+		return ERR_PTR(-ESTALE);
+
+	inode = ssdfs_iget(sb, ino);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+	if (generation && inode->i_generation != generation) {
+		iput(inode);
+		return ERR_PTR(-ESTALE);
+	}
+	return inode;
+}
+
+static struct dentry *ssdfs_fh_to_dentry(struct super_block *sb,
+					 struct fid *fid,
+					 int fh_len, int fh_type)
+{
+	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
+				    ssdfs_nfs_get_inode);
+}
+
+static struct dentry *ssdfs_fh_to_parent(struct super_block *sb,
+					 struct fid *fid,
+					 int fh_len, int fh_type)
+{
+	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
+				    ssdfs_nfs_get_inode);
+}
+
+static struct dentry *ssdfs_get_parent(struct dentry *child)
+{
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	ino_t ino;
+	int err;
+
+	down_read(&SSDFS_I(d_inode(child))->lock);
+	err = ssdfs_inode_by_name(d_inode(child), &dotdot, &ino);
+	up_read(&SSDFS_I(d_inode(child))->lock);
+
+	if (unlikely(err))
+		return ERR_PTR(err);
+
+	return d_obtain_alias(ssdfs_iget(child->d_sb, ino));
+}
+
+static const struct export_operations ssdfs_export_ops = {
+	.get_parent	= ssdfs_get_parent,
+	.fh_to_dentry	= ssdfs_fh_to_dentry,
+	.fh_to_parent	= ssdfs_fh_to_parent,
+};
+
+static const struct super_operations ssdfs_super_operations = {
+	.alloc_inode	= ssdfs_alloc_inode,
+	.destroy_inode	= ssdfs_destroy_inode,
+	.evict_inode	= ssdfs_evict_inode,
+	.write_inode	= ssdfs_write_inode,
+	.statfs		= ssdfs_statfs,
+	.show_options	= ssdfs_show_options,
+	.put_super	= ssdfs_put_super,
+	.sync_fs	= ssdfs_sync_fs,
+};
+
+static inline
+u32 ssdfs_sb_payload_size(struct folio_batch *batch)
+{
+	struct ssdfs_maptbl_cache_header *hdr;
+	struct folio *folio;
+	void *kaddr;
+	u16 fragment_bytes_count;
+	u32 bytes_count = 0;
+	int i;
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+		ssdfs_folio_lock(folio);
+		kaddr = kmap_local_folio(folio, 0);
+		hdr = (struct ssdfs_maptbl_cache_header *)kaddr;
+		fragment_bytes_count = le16_to_cpu(hdr->bytes_count);
+		kunmap_local(kaddr);
+		ssdfs_folio_unlock(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		WARN_ON(fragment_bytes_count > PAGE_SIZE);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		bytes_count += fragment_bytes_count;
+	}
+
+	return bytes_count;
+}
+
+static u32 ssdfs_define_sb_log_size(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u32 inline_capacity;
+	u32 log_size = 0;
+	u32 payload_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb);
+
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	payload_size = ssdfs_sb_payload_size(&fsi->maptbl_cache.batch);
+	inline_capacity = PAGE_SIZE - hdr_size;
+
+	if (payload_size > inline_capacity) {
+		log_size += PAGE_SIZE;
+		log_size += atomic_read(&fsi->maptbl_cache.bytes_count);
+		log_size += PAGE_SIZE;
+	} else {
+		log_size += PAGE_SIZE;
+		log_size += PAGE_SIZE;
+	}
+
+	log_size = (log_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+	return log_size;
+}
+
+static int ssdfs_snapshot_sb_log_payload(struct super_block *sb,
+					 struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi;
+	struct folio *sfolio, *dfolio;
+	unsigned folios_count;
+	unsigned i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !payload);
+	BUG_ON(folio_batch_count(&payload->maptbl_cache.batch) != 0);
+
+	SSDFS_DBG("sb %p, payload %p\n",
+		  sb, payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	down_read(&fsi->maptbl_cache.lock);
+
+	folios_count = folio_batch_count(&fsi->maptbl_cache.batch);
+
+	for (i = 0; i < folios_count; i++) {
+		dfolio = ssdfs_super_add_batch_folio(&payload->maptbl_cache.batch,
+						     get_order(PAGE_SIZE));
+		if (unlikely(IS_ERR_OR_NULL(dfolio))) {
+			err = !dfolio ? -ENOMEM : PTR_ERR(dfolio);
+			SSDFS_ERR("fail to add folio into batch: "
+				  "index %u, err %d\n",
+				  i, err);
+			goto finish_maptbl_snapshot;
+		}
+
+		sfolio = fsi->maptbl_cache.batch.folios[i];
+		if (unlikely(!sfolio)) {
+			err = -ERANGE;
+			SSDFS_ERR("source folio is absent: index %u\n",
+				  i);
+			goto finish_maptbl_snapshot;
+		}
+
+		ssdfs_folio_lock(sfolio);
+		ssdfs_folio_lock(dfolio);
+		__ssdfs_memcpy_folio(dfolio, 0, PAGE_SIZE,
+				     sfolio, 0, PAGE_SIZE,
+				     PAGE_SIZE);
+		ssdfs_folio_unlock(dfolio);
+		ssdfs_folio_unlock(sfolio);
+	}
+
+	payload->maptbl_cache.bytes_count =
+		atomic_read(&fsi->maptbl_cache.bytes_count);
+
+finish_maptbl_snapshot:
+	up_read(&fsi->maptbl_cache.lock);
+
+	if (unlikely(err))
+		ssdfs_super_folio_batch_release(&payload->maptbl_cache.batch);
+
+	return err;
+}
+
+static inline
+int ssdfs_write_padding_block(struct super_block *sb,
+			      struct folio *folio,
+			      loff_t offset)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_padding_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_padding_header);
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !folio);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->write_block);
+
+	SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	/* ->writepage() calls put_folio() */
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_folio_lock(folio);
+
+	err = __ssdfs_memset_folio(folio, hdr_size, folio_size(folio),
+				   0xdada, folio_size(folio) - hdr_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare padding block: err %d\n", err);
+		goto unlock_folio;
+	}
+
+	hdr = (struct ssdfs_padding_header *)kmap_local_folio(folio, 0);
+	hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	hdr->magic.key = cpu_to_le16(SSDFS_PADDING_HDR_MAGIC);
+	hdr->magic.version.major = SSDFS_MAJOR_REVISION;
+	hdr->magic.version.minor = SSDFS_MINOR_REVISION;
+	hdr->blob = cpu_to_le64(SSDFS_PADDING_BLOB);
+	hdr->check.bytes = cpu_to_le16(folio_size(folio));
+	hdr->check.flags = cpu_to_le16(0);
+	hdr->check.csum = cpu_to_le32(0xdada);
+	kunmap_local(hdr);
+
+	folio_mark_uptodate(folio);
+	folio_set_dirty(folio);
+
+	ssdfs_folio_unlock(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->write_block(sb, offset, folio);
+	if (err) {
+		SSDFS_ERR("fail to write padding block: "
+			  "offset %llu, size %zu, err %d\n",
+			  (u64)offset, folio_size(folio), err);
+	}
+
+	ssdfs_folio_lock(folio);
+	folio_clear_uptodate(folio);
+unlock_folio:
+	ssdfs_folio_unlock(folio);
+
+	return err;
+}
+
+static int ssdfs_write_padding(struct super_block *sb,
+				loff_t offset, loff_t size)
+{
+	struct folio *folio;
+	loff_t written = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->write_block);
+
+	SSDFS_DBG("offset %llu, size %llu\n", offset, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (size == 0 || size % PAGE_SIZE) {
+		SSDFS_ERR("invalid size %llu\n", size);
+		return -EINVAL;
+	}
+
+	folio = ssdfs_super_alloc_folio(GFP_KERNEL | __GFP_ZERO,
+					get_order(PAGE_SIZE));
+	if (IS_ERR_OR_NULL(folio)) {
+		err = (folio == NULL ? -ENOMEM : PTR_ERR(folio));
+		SSDFS_ERR("unable to allocate memory folio\n");
+		return err;
+	}
+
+	while (written < size) {
+		loff_t cur_offset = offset + written;
+
+		err = ssdfs_write_padding_block(sb, folio, cur_offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to write padding block: "
+				  "offset %llu\n", (u64)cur_offset);
+			goto free_folio;
+		}
+
+		written += folio_size(folio);
+	}
+
+free_folio:
+	ssdfs_super_free_folio(folio);
+
+	return err;
+}
+
+static inline
+int ssdfs_write_padding_in_sb_segs_pair(struct super_block *sb,
+					u32 log_page)
+{
+	struct ssdfs_fs_info *fsi;
+	u64 cur_peb;
+	u32 pages_per_peb;
+	loff_t peb_offset;
+	loff_t padding_offset;
+	loff_t padding_size;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->write_block);
+
+	SSDFS_DBG("log_page %u\n", log_page);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	/*
+	 * The superblock segment always uses a 4KB page size,
+	 * so the pages_per_peb value has to be calculated explicitly.
+	 */
+	pages_per_peb = fsi->erasesize / PAGE_SIZE;
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		cur_peb = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+
+		padding_size = pages_per_peb - log_page;
+		padding_size <<= PAGE_SHIFT;
+
+		peb_offset = cur_peb * fsi->pages_per_peb;
+		peb_offset <<= fsi->log_pagesize;
+
+		padding_offset = log_page;
+		padding_offset <<= PAGE_SHIFT;
+		padding_offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_peb %llu, padding_offset %llu, "
+			  "padding_size %llu\n",
+			  cur_peb, (u64)padding_offset,
+			  (u64)padding_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_write_padding(sb, padding_offset, padding_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to write padding: "
+				  "padding_offset %llu, "
+				  "padding_size %llu, "
+				  "err %d\n",
+				  (u64)padding_offset,
+				  (u64)padding_size,
+				  err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int ssdfs_define_next_sb_log_place(struct super_block *sb,
+					  struct ssdfs_peb_extent *last_sb_log)
+{
+	struct ssdfs_fs_info *fsi;
+	u32 offset;
+	u32 log_size;
+	u64 cur_peb, prev_peb;
+	u64 cur_leb;
+	u32 pages_per_peb;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p, last_sb_log %p\n",
+		  sb, last_sb_log);
+	SSDFS_DBG("fsi->sbi.last_log: (leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u), "
+		  "last_sb_log: (leb_id %llu, peb_id %llu, "
+		  "page_offset %u, pages_count %u)\n",
+		  fsi->sbi.last_log.leb_id,
+		  fsi->sbi.last_log.peb_id,
+		  fsi->sbi.last_log.page_offset,
+		  fsi->sbi.last_log.pages_count,
+		  last_sb_log->leb_id,
+		  last_sb_log->peb_id,
+		  last_sb_log->page_offset,
+		  last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/*
+	 * The superblock segment always uses a 4KB page size,
+	 * so the pages_per_peb value has to be calculated explicitly.
+	 */
+	pages_per_peb = fsi->erasesize / PAGE_SIZE;
+
+	offset = fsi->sbi.last_log.page_offset;
+
+	log_size = ssdfs_define_sb_log_size(sb);
+	if (log_size > pages_per_peb) {
+		SSDFS_ERR("log_size %u > pages_per_peb %u\n",
+			  log_size, pages_per_peb);
+		return -ERANGE;
+	}
+
+	log_size = max_t(u32, log_size, fsi->sbi.last_log.pages_count);
+
+	if (offset > pages_per_peb || offset > (UINT_MAX - log_size)) {
+		SSDFS_ERR("inconsistent metadata state: "
+			  "last_sb_log.page_offset %u, "
+			  "pages_per_peb %u, log_size %u\n",
+			  offset, pages_per_peb, log_size);
+		return -EINVAL;
+	}
+
+	for (err = -EINVAL, i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		cur_peb = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		prev_peb = fsi->sb_pebs[SSDFS_PREV_SB_SEG][i];
+		cur_leb = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_peb %llu, prev_peb %llu, "
+			  "last_sb_log.peb_id %llu, log_size %u, "
+			  "err %d\n",
+			  cur_peb, prev_peb,
+			  fsi->sbi.last_log.peb_id,
+			  log_size, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (fsi->sbi.last_log.peb_id == cur_peb) {
+			offset += fsi->sbi.last_log.pages_count;
+
+			if (offset > pages_per_peb) {
+				SSDFS_ERR("inconsistent metadata state: "
+					  "last_sb_log.page_offset %u, "
+					  "pages_per_peb %u, "
+					  "last_sb_log.pages_count %u\n",
+					  offset, pages_per_peb,
+					  fsi->sbi.last_log.pages_count);
+				return -EIO;
+			} else if (offset == pages_per_peb) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("sb PEB %llu is full: "
+					  "offset %u, pages_count %u, "
+					  "pages_per_peb %u\n",
+					  cur_peb, offset,
+					  fsi->sbi.last_log.pages_count,
+					  pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return -EFBIG;
+			} else if ((offset + log_size) > pages_per_peb) {
+				err  = ssdfs_write_padding_in_sb_segs_pair(sb,
+									offset);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to write padding: "
+						  "err %d\n", err);
+					return err;
+				}
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("sb PEB %llu is full: "
+					  "offset %u, pages_count %u, "
+					  "pages_per_peb %u\n",
+					  cur_peb, offset,
+					  fsi->sbi.last_log.pages_count,
+					  pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return -EFBIG;
+			}
+
+			last_sb_log->leb_id = cur_leb;
+			last_sb_log->peb_id = cur_peb;
+			last_sb_log->page_offset = offset;
+			last_sb_log->pages_count = log_size;
+
+			err = 0;
+			break;
+		} else if (fsi->sbi.last_log.peb_id != cur_peb &&
+			   fsi->sbi.last_log.peb_id == prev_peb) {
+
+			last_sb_log->leb_id = cur_leb;
+			last_sb_log->peb_id = cur_peb;
+			last_sb_log->page_offset = 0;
+			last_sb_log->pages_count = log_size;
+
+			err = 0;
+			break;
+		} else {
+			/* continue to check */
+			err = -ERANGE;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("inconsistent metadata state: "
+			  "cur_peb %llu, prev_peb %llu, "
+			  "last_sb_log.peb_id %llu\n",
+			  cur_peb, prev_peb, fsi->sbi.last_log.peb_id);
+		return err;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_sb_log->leb_id %llu, "
+		  "last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, "
+		  "last_sb_log->pages_count %u\n",
+		  last_sb_log->leb_id,
+		  last_sb_log->peb_id,
+		  last_sb_log->page_offset,
+		  last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		last_sb_log->leb_id = fsi->sb_lebs[SSDFS_CUR_SB_SEG][i];
+		last_sb_log->peb_id = fsi->sb_pebs[SSDFS_CUR_SB_SEG][i];
+		err = ssdfs_can_write_sb_log(sb, last_sb_log);
+		if (err) {
+			SSDFS_DBG("possibly logical block has data: "
+				  "PEB %llu: "
+				  "last_sb_log->page_offset %u, "
+				  "last_sb_log->pages_count %u\n",
+				  last_sb_log->peb_id,
+				  last_sb_log->page_offset,
+				  last_sb_log->pages_count);
+		}
+	}
+
+	last_sb_log->leb_id = cur_leb;
+	last_sb_log->peb_id = cur_peb;
+
+	if (fsi->sb_snapi.need_snapshot_sb) {
+		struct ssdfs_peb_extent *last_sb_snap_log;
+
+		last_sb_snap_log = &fsi->sb_snapi.last_log;
+
+		offset = last_sb_snap_log->page_offset;
+
+		if (offset >= pages_per_peb) {
+			SSDFS_ERR("inconsistent metadata state: "
+				  "last_sb_snap_log.page_offset %u, "
+				  "pages_per_peb %u\n",
+				  offset, pages_per_peb);
+			return -EINVAL;
+		}
+
+		offset += last_sb_snap_log->pages_count;
+
+		if (offset >= pages_per_peb) {
+			SSDFS_ERR("inconsistent metadata state: "
+				  "last_sb_snap_log.page_offset %u, "
+				  "last_sb_snap_log.pages_count %u, "
+				  "pages_per_peb %u\n",
+				  last_sb_snap_log->page_offset,
+				  last_sb_snap_log->pages_count,
+				  pages_per_peb);
+			return -ERANGE;
+		}
+
+		last_sb_snap_log->page_offset = offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last_sb_snap_log (page_offset %u, pages_count %u)\n",
+			  last_sb_snap_log->page_offset,
+			  last_sb_snap_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_can_write_sb_log(sb, last_sb_snap_log);
+		if (err) {
+			SSDFS_DBG("possibly logical block has data: "
+				  "PEB %llu: "
+				  "last_sb_snap_log->page_offset %u, "
+				  "last_sb_snap_log->pages_count %u\n",
+				  last_sb_snap_log->peb_id,
+				  last_sb_snap_log->page_offset,
+				  last_sb_snap_log->pages_count);
+		}
+	}
+
+	return 0;
+}
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static u64 ssdfs_correct_start_leb_id(struct ssdfs_fs_info *fsi,
+				      int seg_type, u64 leb_id)
+{
+	struct completion *init_end;
+	struct ssdfs_maptbl_peb_relation pebr;
+	struct ssdfs_maptbl_peb_descriptor *ptr;
+	u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE;
+	u32 pebs_per_seg;
+	u64 seg_id;
+	u64 cur_leb;
+	u64 peb_id1, peb_id2;
+	u64 found_peb_id;
+	u64 peb_id_off;
+	u16 pebs_per_fragment;
+	u16 pebs_per_stripe;
+	u16 stripes_per_fragment;
+	u64 calculated_leb_id = leb_id;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, seg_type %#x, leb_id %llu\n",
+		  fsi, seg_type, leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	found_peb_id = leb_id;
+	peb_type = SEG2PEB_TYPE(seg_type);
+	pebs_per_seg = fsi->pebs_per_seg;
+
+	seg_id = ssdfs_get_seg_id_for_leb_id(fsi, leb_id);
+	if (unlikely(seg_id >= U64_MAX)) {
+		SSDFS_ERR("invalid seg_id: "
+			  "leb_id %llu\n", leb_id);
+		return -ERANGE;
+	}
+
+	err = ssdfs_maptbl_define_fragment_info(fsi, leb_id,
+						&pebs_per_fragment,
+						&pebs_per_stripe,
+						&stripes_per_fragment);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to define fragment info: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	for (i = 0; i < pebs_per_seg; i++) {
+		cur_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, i);
+		if (cur_leb >= U64_MAX) {
+			SSDFS_ERR("fail to convert PEB index into LEB ID: "
+				  "seg %llu, peb_index %u\n",
+				  seg_id, i);
+			return -ERANGE;
+		}
+
+		err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb,
+						   peb_type, &pebr,
+						   &init_end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(init_end);
+			if (unlikely(err)) {
+				SSDFS_ERR("maptbl init failed: "
+					  "err %d\n", err);
+				goto finish_leb_id_correction;
+			}
+
+			err = ssdfs_maptbl_convert_leb2peb(fsi, cur_leb,
+							   peb_type, &pebr,
+							   &init_end);
+		}
+
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("LEB is not mapped: leb_id %llu\n",
+				  cur_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_leb_id_correction;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to convert LEB to PEB: "
+				  "leb_id %llu, peb_type %#x, err %d\n",
+				  cur_leb, peb_type, err);
+			goto finish_leb_id_correction;
+		}
+
+		ptr = &pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX];
+		peb_id1 = ptr->peb_id;
+		ptr = &pebr.pebs[SSDFS_MAPTBL_RELATION_INDEX];
+		peb_id2 = ptr->peb_id;
+
+		if (peb_id1 < U64_MAX)
+			found_peb_id = max_t(u64, peb_id1, found_peb_id);
+
+		if (peb_id2 < U64_MAX)
+			found_peb_id = max_t(u64, peb_id2, found_peb_id);
+
+		peb_id_off = found_peb_id % pebs_per_stripe;
+		if (peb_id_off >= (pebs_per_stripe / 2)) {
+			calculated_leb_id = found_peb_id / pebs_per_stripe;
+			calculated_leb_id++;
+			calculated_leb_id *= pebs_per_stripe;
+		} else {
+			calculated_leb_id = found_peb_id;
+			calculated_leb_id++;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("found_peb_id %llu, pebs_per_stripe %u, "
+			  "calculated_leb_id %llu\n",
+			  found_peb_id, pebs_per_stripe,
+			  calculated_leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+finish_leb_id_correction:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu, calculated_leb_id %llu\n",
+		  leb_id, calculated_leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return calculated_leb_id;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int __ssdfs_reserve_clean_segment(struct ssdfs_fs_info *fsi,
+					 int sb_seg_type,
+					 u64 start_search_id,
+					 u64 *reserved_seg)
+{
+	struct ssdfs_segment_bmap *segbmap = fsi->segbmap;
+	u64 start_seg = start_search_id;
+	u64 end_seg = U64_MAX;
+	struct ssdfs_maptbl_peb_relation pebr;
+	struct completion *end;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	u64 leb_id;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!reserved_seg);
+	BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX);
+
+	SSDFS_DBG("fsi %p, sb_seg_type %#x, start_search_id %llu\n",
+		  fsi, sb_seg_type, start_search_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (sb_seg_type) {
+	case SSDFS_MAIN_SB_SEG:
+	case SSDFS_COPY_SB_SEG:
+		err = ssdfs_segment_detect_search_range(fsi,
+							&start_seg,
+							&end_seg);
+		if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find fragment for search: "
+				  "start_seg %llu, end_seg %llu\n",
+				  start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return err;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to define a search range: "
+				  "start_seg %llu, err %d\n",
+				  start_seg, err);
+			return err;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_seg %llu, end_seg %llu\n",
+		  start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_segbmap_reserve_clean_segment(segbmap,
+						  start_seg, end_seg,
+						  reserved_seg, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("segbmap init failed: "
+				  "err %d\n", err);
+			goto finish_search;
+		}
+
+		err = ssdfs_segbmap_reserve_clean_segment(segbmap,
+							  start_seg, end_seg,
+							  reserved_seg,
+							  &end);
+	}
+
+	if (err == -ENODATA) {
+		err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to reserve segment: "
+			  "type %#x, start_seg %llu, end_seg %llu\n",
+			  sb_seg_type, start_seg, end_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to reserve segment: "
+			  "type %#x, start_seg %llu, "
+			   "end_seg %llu, err %d\n",
+			  sb_seg_type, start_seg, end_seg, err);
+		goto finish_search;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved_seg %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	leb_id = ssdfs_get_leb_id_for_peb_index(fsi, *reserved_seg, 0);
+	if (leb_id >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to convert PEB index into LEB ID: "
+			  "seg %llu, peb_index %u\n",
+			  *reserved_seg, 0);
+		goto finish_search;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("leb_id %llu\n", leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_maptbl_map_leb2peb(fsi, leb_id, peb_type,
+					&pebr, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			goto finish_search;
+		}
+
+		err = ssdfs_maptbl_map_leb2peb(fsi, leb_id,
+						peb_type,
+						&pebr, &end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to map LEB to PEB: "
+			  "reserved_seg %llu, leb_id %llu, "
+			  "err %d\n",
+			  *reserved_seg, leb_id, err);
+		goto finish_search;
+	}
+
+finish_search:
+	if (err == -ENOENT)
+		*reserved_seg = end_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("reserved_seg %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+#ifndef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int ssdfs_reserve_clean_segment(struct super_block *sb,
+					int sb_seg_type, u64 start_leb,
+					u64 *reserved_seg)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 start_search_id;
+	u64 cur_id;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!reserved_seg);
+	BUG_ON(sb_seg_type >= SSDFS_SB_SEG_COPY_MAX);
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x, start_leb %llu\n",
+		  sb, sb_seg_type, start_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*reserved_seg = U64_MAX;
+
+	start_leb = ssdfs_correct_start_leb_id(fsi,
+						SSDFS_SB_SEG_TYPE,
+						start_leb);
+
+	start_search_id = SSDFS_LEB2SEG(fsi, start_leb);
+	if (start_search_id >= fsi->nsegs)
+		start_search_id = 0;
+
+	cur_id = start_search_id;
+
+	while (cur_id < fsi->nsegs) {
+		err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type,
+						    cur_id, reserved_seg);
+		if (err == -ENOENT) {
+			err = 0;
+			cur_id = *reserved_seg;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_id %llu\n", cur_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a new segment: "
+				  "cur_id %llu, err %d\n",
+				  cur_id, err);
+			return err;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found seg_id %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+	cur_id = 0;
+
+	while (cur_id < start_search_id) {
+		err = __ssdfs_reserve_clean_segment(fsi, sb_seg_type,
+						    cur_id, reserved_seg);
+		if (err == -ENOENT) {
+			err = 0;
+			cur_id = *reserved_seg;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("cur_id %llu\n", cur_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+			continue;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a new segment: "
+				  "cur_id %llu, err %d\n",
+				  cur_id, err);
+			return err;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found seg_id %llu\n", *reserved_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("no free space for a new segment\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return -ENOSPC;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+typedef u64 sb_pebs_array[SSDFS_SB_CHAIN_MAX][SSDFS_SB_SEG_COPY_MAX];
+
+static int ssdfs_erase_dirty_prev_sb_segs(struct ssdfs_fs_info *fsi,
+					  u64 prev_leb)
+{
+	struct completion *init_end;
+	u8 peb_type = SSDFS_MAPTBL_UNKNOWN_PEB_TYPE;
+	u32 pebs_per_seg;
+	u64 seg_id;
+	u64 cur_leb;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, prev_leb %llu\n",
+		  fsi, prev_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb_type = SEG2PEB_TYPE(SSDFS_SB_SEG_TYPE);
+	pebs_per_seg = fsi->pebs_per_seg;
+
+	seg_id = SSDFS_LEB2SEG(fsi, prev_leb);
+	if (seg_id >= U64_MAX) {
+		SSDFS_ERR("invalid seg_id for leb_id %llu\n",
+			  prev_leb);
+		return -ERANGE;
+	}
+
+	cur_leb = ssdfs_get_leb_id_for_peb_index(fsi, seg_id, 0);
+	if (cur_leb >= U64_MAX) {
+		SSDFS_ERR("invalid leb_id for seg_id %llu\n",
+			  seg_id);
+		return -ERANGE;
+	}
+
+	err = ssdfs_maptbl_erase_reserved_peb_now(fsi,
+						  cur_leb,
+						  peb_type,
+						  &init_end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(init_end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		err = ssdfs_maptbl_erase_reserved_peb_now(fsi,
+							  cur_leb,
+							  peb_type,
+							  &init_end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to erase reserved dirty PEB: "
+			  "leb_id %llu, err %d\n",
+			  cur_leb, err);
+		return err;
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET
+static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb,
+						int sb_seg_type,
+						sb_pebs_array *sb_lebs,
+						sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 prev_leb, cur_leb, next_leb, reserved_leb;
+	u64 prev_peb, cur_peb, next_peb, reserved_peb;
+	loff_t offset;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_peb %llu, next_peb %llu, "
+		  "cur_leb %llu, next_leb %llu\n",
+		  cur_peb, next_peb, cur_leb, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	(*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb;
+	(*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb;
+
+	if (prev_leb >= U64_MAX) {
+		(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+		(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+
+		(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_leb;
+		(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_peb;
+
+		(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = U64_MAX;
+		(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = U64_MAX;
+	} else {
+		err = ssdfs_erase_dirty_prev_sb_segs(fsi, prev_leb);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to erase dirty PEBs: "
+				  "prev_leb %llu, err %d\n",
+				  prev_leb, err);
+			goto finish_move_sb_seg;
+		}
+
+		(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = prev_leb;
+		(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = prev_peb;
+
+		(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = U64_MAX;
+		(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = U64_MAX;
+
+		(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+		(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_leb %llu, cur_peb %llu, "
+		  "next_leb %llu, next_peb %llu, "
+		  "reserved_leb %llu, reserved_peb %llu, "
+		  "prev_leb %llu, prev_peb %llu\n",
+		  (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type],
+		  (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type],
+		  (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type]);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->is_zns_device) {
+		cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+		offset = cur_peb * fsi->erasesize;
+
+		err = fsi->devops->open_zone(fsi->sb, offset);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to open zone: "
+				  "offset %llu, err %d\n",
+				  (u64)offset, err);
+			return err;
+		}
+	}
+
+finish_move_sb_seg:
+	return err;
+}
+#else
+static int ssdfs_move_on_first_peb_next_sb_seg(struct super_block *sb,
+						int sb_seg_type,
+						sb_pebs_array *sb_lebs,
+						sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_segment_bmap *segbmap = fsi->segbmap;
+	struct ssdfs_maptbl_cache *maptbl_cache = &fsi->maptbl_cache;
+	u64 prev_leb, cur_leb, next_leb, reserved_leb;
+	u64 prev_peb, cur_peb, next_peb, reserved_peb;
+	u64 new_leb = U64_MAX, new_peb = U64_MAX;
+	u64 reserved_seg;
+	u64 prev_seg, cur_seg;
+	struct ssdfs_maptbl_peb_relation pebr;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	struct completion *end = NULL;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	prev_leb = (*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_leb = (*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_leb = (*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+	prev_peb = (*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	next_peb = (*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type];
+	reserved_peb = (*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_peb %llu, next_peb %llu, "
+		  "cur_leb %llu, next_leb %llu\n",
+		  cur_peb, next_peb, cur_leb, next_leb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_reserve_clean_segment(sb, sb_seg_type, cur_leb,
+					  &reserved_seg);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to reserve clean segment: err %d\n", err);
+		goto finish_move_sb_seg;
+	}
+
+	new_leb = ssdfs_get_leb_id_for_peb_index(fsi, reserved_seg, 0);
+	if (new_leb >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("fail to convert PEB index into LEB ID: "
+			  "seg %llu\n", reserved_seg);
+		goto finish_move_sb_seg;
+	}
+
+	err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb,
+					   peb_type,
+					   &pebr, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("maptbl init failed: "
+				  "err %d\n", err);
+			goto finish_move_sb_seg;
+		}
+
+		err = ssdfs_maptbl_convert_leb2peb(fsi, new_leb,
+						   peb_type,
+						   &pebr, &end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert LEB %llu to PEB: err %d\n",
+			  new_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+	new_peb = pebr.pebs[SSDFS_MAPTBL_MAIN_INDEX].peb_id;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(new_peb == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	(*sb_lebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_leb;
+	(*sb_pebs)[SSDFS_PREV_SB_SEG][sb_seg_type] = cur_peb;
+
+	(*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_leb;
+	(*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type] = next_peb;
+
+	(*sb_lebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_leb;
+	(*sb_pebs)[SSDFS_NEXT_SB_SEG][sb_seg_type] = reserved_peb;
+
+	(*sb_lebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_leb;
+	(*sb_pebs)[SSDFS_RESERVED_SB_SEG][sb_seg_type] = new_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_leb %llu, cur_peb %llu, "
+		  "next_leb %llu, next_peb %llu, "
+		  "reserved_leb %llu, reserved_peb %llu, "
+		  "new_leb %llu, new_peb %llu\n",
+		  cur_leb, cur_peb,
+		  next_leb, next_peb,
+		  reserved_leb, reserved_peb,
+		  new_leb, new_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (prev_leb == U64_MAX)
+		goto finish_move_sb_seg;
+
+	prev_seg = SSDFS_LEB2SEG(fsi, prev_leb);
+	cur_seg = SSDFS_LEB2SEG(fsi, cur_leb);
+
+	if (prev_seg != cur_seg) {
+		err = ssdfs_segbmap_change_state(segbmap, prev_seg,
+						 SSDFS_SEG_DIRTY, &end);
+		if (err == -EAGAIN) {
+			err = SSDFS_WAIT_COMPLETION(end);
+			if (unlikely(err)) {
+				SSDFS_ERR("segbmap init failed: "
+					  "err %d\n", err);
+				goto finish_move_sb_seg;
+			}
+
+			err = ssdfs_segbmap_change_state(segbmap, prev_seg,
+							 SSDFS_SEG_DIRTY, &end);
+		}
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to change segment state: "
+				  "seg %llu, state %#x, err %d\n",
+				  prev_seg, SSDFS_SEG_DIRTY, err);
+			goto finish_move_sb_seg;
+		}
+	}
+
+	err = ssdfs_maptbl_wait_and_change_peb_state(fsi, prev_leb, peb_type,
+						SSDFS_MAPTBL_DIRTY_PEB_STATE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, new_state %#x, err %d\n",
+			  prev_leb, SSDFS_MAPTBL_DIRTY_PEB_STATE, err);
+		goto finish_move_sb_seg;
+	}
+
+	err = ssdfs_maptbl_cache_forget_leb2peb(maptbl_cache, prev_leb,
+						SSDFS_PEB_STATE_CONSISTENT);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to forget prev_leb %llu, err %d\n",
+			  prev_leb, err);
+		goto finish_move_sb_seg;
+	}
+
+finish_move_sb_seg:
+	return err;
+}
+#endif /* CONFIG_SSDFS_FIXED_SUPERBLOCK_SEGMENTS_SET */
+
+static int ssdfs_move_on_next_sb_seg(struct super_block *sb,
+				     int sb_seg_type,
+				     sb_pebs_array *sb_lebs,
+				     sb_pebs_array *sb_pebs)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	u64 cur_leb, next_leb;
+	u64 cur_peb;
+	u8 peb_type = SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !sb_lebs || !sb_pebs);
+
+	if (sb_seg_type >= SSDFS_SB_SEG_COPY_MAX) {
+		SSDFS_ERR("invalid sb_seg_type %#x\n",
+			  sb_seg_type);
+		return -EINVAL;
+	}
+
+	SSDFS_DBG("sb %p, sb_seg_type %#x\n", sb, sb_seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	cur_leb = (*sb_lebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+	cur_peb = (*sb_pebs)[SSDFS_CUR_SB_SEG][sb_seg_type];
+
+	next_leb = cur_leb + 1;
+
+	err = ssdfs_maptbl_wait_and_change_peb_state(fsi, cur_leb, peb_type,
+						SSDFS_MAPTBL_USED_PEB_STATE);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, new_state %#x, err %d\n",
+			  cur_leb, SSDFS_MAPTBL_USED_PEB_STATE, err);
+		return err;
+	}
+
+	err = ssdfs_move_on_first_peb_next_sb_seg(sb, sb_seg_type,
+						sb_lebs, sb_pebs);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to move on next sb segment: "
+			  "sb_seg_type %#x, cur_leb %llu, "
+			  "cur_peb %llu, err %d\n",
+			  sb_seg_type, cur_leb,
+			  cur_peb, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static int ssdfs_snapshot_new_sb_pebs_array(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_peb_extent *last_log;
+	struct ssdfs_segment_request *req;
+	struct ssdfs_requests_queue *rq;
+	u64 offset;
+	u32 log_blocks;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb);
+
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+
+	fsi->sb_snapi.need_snapshot_sb = true;
+
+	last_log = &fsi->sb_snapi.last_log;
+	offset = last_log->page_offset + last_log->pages_count;
+	log_blocks = last_log->pages_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("last_log (page_offset %u, pages_count %u)\n",
+		  last_log->page_offset, last_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (offset > fsi->pages_per_peb) {
+		SSDFS_ERR("last log has inconsistent state: "
+			  "offset %llu > pages_per_peb %u\n",
+			  offset, fsi->pages_per_peb);
+		return -ERANGE;
+	} else if ((offset + log_blocks) > fsi->pages_per_peb) {
+		req = ssdfs_request_alloc();
+		if (IS_ERR_OR_NULL(req)) {
+			err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+			SSDFS_ERR("fail to allocate segment request: err %d\n",
+				  err);
+			return err;
+		}
+
+		ssdfs_request_init(req, fsi->pagesize);
+		ssdfs_get_request(req);
+
+		ssdfs_request_prepare_internal_data(SSDFS_GLOBAL_FSCK_REQ,
+					SSDFS_FSCK_ERASE_RE_WRITE_SB_SNAP_SEG,
+					SSDFS_REQ_ASYNC_NO_FREE, req);
+
+		rq = &fsi->global_fsck.rq;
+		ssdfs_requests_queue_add_tail_inc(fsi, rq, req);
+		fsi->sb_snapi.req = req;
+		wake_up_all(&fsi->global_fsck.wait_queue);
+
+		last_log->page_offset = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("last_log (page_offset %u, pages_count %u)\n",
+			  last_log->page_offset, last_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	return 0;
+}
+
+static int ssdfs_move_on_next_sb_segs_pair(struct super_block *sb)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	sb_pebs_array sb_lebs;
+	sb_pebs_array sb_pebs;
+	size_t array_size;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb %p\n", sb);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(fsi->fs_feature_compat & SSDFS_HAS_SEGBMAP_COMPAT_FLAG) ||
+	    !(fsi->fs_feature_compat & SSDFS_HAS_MAPTBL_COMPAT_FLAG)) {
+		SSDFS_ERR("volume hasn't segbmap or maptbl\n");
+		return -EIO;
+	}
+
+	array_size = sizeof(u64);
+	array_size *= SSDFS_SB_CHAIN_MAX;
+	array_size *= SSDFS_SB_SEG_COPY_MAX;
+
+	down_read(&fsi->sb_segs_sem);
+	ssdfs_memcpy(sb_lebs, 0, array_size,
+		     fsi->sb_lebs, 0, array_size,
+		     array_size);
+	ssdfs_memcpy(sb_pebs, 0, array_size,
+		     fsi->sb_pebs, 0, array_size,
+		     array_size);
+	up_read(&fsi->sb_segs_sem);
+
+	for (i = 0; i < SSDFS_SB_SEG_COPY_MAX; i++) {
+		err = ssdfs_move_on_next_sb_seg(sb, i, &sb_lebs, &sb_pebs);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to move on next sb PEB: err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	down_write(&fsi->sb_segs_sem);
+	ssdfs_memcpy(fsi->sb_lebs, 0, array_size,
+		     sb_lebs, 0, array_size,
+		     array_size);
+	ssdfs_memcpy(fsi->sb_pebs, 0, array_size,
+		     sb_pebs, 0, array_size,
+		     array_size);
+	up_write(&fsi->sb_segs_sem);
+
+	err = ssdfs_snapshot_new_sb_pebs_array(sb);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare snapshot of new superblock chain: "
+			  "err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_prepare_sb_log(struct super_block *sb,
+			 struct ssdfs_peb_extent *last_sb_log)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+
+	SSDFS_DBG("sb %p, last_sb_log %p\n",
+		  sb, last_sb_log);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_define_next_sb_log_place(sb, last_sb_log);
+	switch (err) {
+	case -EFBIG: /* current sb segment is exhausted */
+	case -EIO: /* current sb segment is corrupted */
+		err = ssdfs_move_on_next_sb_segs_pair(sb);
+		if (err) {
+			SSDFS_ERR("fail to move on next sb segs pair: err %d\n",
+				  err);
+			return err;
+		}
+		err = ssdfs_define_next_sb_log_place(sb, last_sb_log);
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to define next sb log place: err %d\n",
+				  err);
+			return err;
+		}
+		break;
+
+	default:
+		if (err) {
+			SSDFS_ERR("unable to define next sb log place: err %d\n",
+				  err);
+			return err;
+		}
+		break;
+	}
+
+	return 0;
+}
+
+static void
+ssdfs_prepare_maptbl_cache_descriptor(struct ssdfs_metadata_descriptor *desc,
+				      u32 offset,
+				      struct ssdfs_payload_content *payload,
+				      u32 payload_size)
+{
+	unsigned i;
+	u32 csum = ~0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!desc || !payload);
+
+	SSDFS_DBG("desc %p, offset %u, payload %p\n",
+		  desc, offset, payload);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->offset = cpu_to_le32(offset);
+	desc->size = cpu_to_le32(payload_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(payload_size >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->check.bytes = cpu_to_le16((u16)payload_size);
+	desc->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(folio_batch_count(&payload->batch) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < folio_batch_count(&payload->batch); i++) {
+		struct folio *folio = payload->batch.folios[i];
+		struct ssdfs_maptbl_cache_header *hdr;
+		void *kaddr;
+		u16 bytes_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+		kaddr = kmap_local_folio(folio, 0);
+
+		hdr = (struct ssdfs_maptbl_cache_header *)kaddr;
+		bytes_count = le16_to_cpu(hdr->bytes_count);
+
+		csum = crc32(csum, kaddr, bytes_count);
+
+		kunmap_local(kaddr);
+		ssdfs_folio_unlock(folio);
+	}
+
+	desc->check.csum = cpu_to_le32(csum);
+}
+
+static
+int ssdfs_prepare_snapshot_rules_for_commit(struct ssdfs_fs_info *fsi,
+					struct ssdfs_metadata_descriptor *desc,
+					u32 offset)
+{
+	struct ssdfs_snapshot_rules_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_snapshot_rules_header);
+	size_t info_size = sizeof(struct ssdfs_snapshot_rule_info);
+	struct ssdfs_snapshot_rule_item *item = NULL;
+	u32 payload_off;
+	u32 item_off;
+	u32 pagesize = fsi->pagesize;
+	u16 items_count = 0;
+	u16 items_capacity = 0;
+	u32 area_size = 0;
+	struct list_head *this, *next;
+	u32 csum = ~0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !desc);
+
+	SSDFS_DBG("fsi %p, offset %u\n",
+		  fsi, offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_ssdfs_snapshot_rules_list_empty(&fsi->snapshots.rules_list)) {
+		SSDFS_DBG("snapshot rules list is empty\n");
+		return -ENODATA;
+	}
+
+	payload_off = offsetof(struct ssdfs_log_footer, payload);
+	hdr = SSDFS_SNRU_HDR((u8 *)fsi->sbi.vs_buf + payload_off);
+	memset(hdr, 0, hdr_size);
+	area_size = pagesize - payload_off;
+	item_off = payload_off + hdr_size;
+
+	items_capacity = (u16)((area_size - hdr_size) / info_size);
+	area_size = min_t(u32, area_size, (u32)items_capacity * info_size);
+
+	spin_lock(&fsi->snapshots.rules_list.lock);
+	list_for_each_safe(this, next, &fsi->snapshots.rules_list.list) {
+		item = list_entry(this, struct ssdfs_snapshot_rule_item, list);
+
+		err = ssdfs_memcpy(fsi->sbi.vs_buf, item_off, pagesize,
+				   &item->rule, 0, info_size,
+				   info_size);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy: err %d\n", err);
+			goto finish_copy_items;
+		}
+
+		item_off += info_size;
+		items_count++;
+	}
+finish_copy_items:
+	spin_unlock(&fsi->snapshots.rules_list.lock);
+
+	if (unlikely(err))
+		return err;
+
+	hdr->magic = cpu_to_le32(SSDFS_SNAPSHOT_RULES_MAGIC);
+	hdr->item_size = cpu_to_le16(info_size);
+	hdr->flags = cpu_to_le16(0);
+
+	if (items_count == 0 || items_count > items_capacity) {
+		SSDFS_ERR("invalid items number: "
+			  "items_count %u, items_capacity %u, "
+			  "area_size %u, item_size %zu\n",
+			  items_count, items_capacity,
+			  area_size, info_size);
+		return -ERANGE;
+	}
+
+	hdr->items_count = cpu_to_le16(items_count);
+	hdr->items_capacity = cpu_to_le16(items_capacity);
+	hdr->area_size = cpu_to_le32(area_size);
+
+	desc->offset = cpu_to_le32(offset);
+	desc->size = cpu_to_le32(area_size);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(area_size >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	desc->check.bytes = cpu_to_le16(area_size);
+	desc->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	csum = crc32(csum, hdr, area_size);
+	desc->check.csum = cpu_to_le32(csum);
+
+	return 0;
+}
+
+static int __ssdfs_commit_sb_log(struct super_block *sb,
+				 u64 timestamp, u64 cno,
+				 struct ssdfs_peb_extent *last_sb_log,
+				 struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX];
+	struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX];
+	size_t desc_size = sizeof(struct ssdfs_metadata_descriptor);
+	size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX;
+	size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX;
+	struct ssdfs_metadata_descriptor *cur_hdr_desc;
+	struct folio *folio;
+	struct ssdfs_segment_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	struct ssdfs_log_footer *footer;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+#ifdef CONFIG_SSDFS_DEBUG
+	void *kaddr = NULL;
+#endif /* CONFIG_SSDFS_DEBUG */
+	loff_t peb_offset;
+	loff_t sb_offset, sb_snap_offset;
+	u32 flags = 0;
+	u32 written = 0;
+	u64 seg_id;
+	u32 log_pages_count;
+	unsigned i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->write_block);
+	BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) >
+		(ULLONG_MAX >> PAGE_SHIFT));
+	BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >=
+		SSDFS_FS_I(sb)->nsegs);
+	BUG_ON(last_sb_log->peb_id >
+		div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb));
+	BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) >
+		(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	footer = SSDFS_LF(fsi->sbi.vs_buf);
+
+	memset(hdr_desc, 0, hdr_array_bytes);
+	memset(footer_desc, 0, footer_array_bytes);
+
+	sb_offset = (loff_t)last_sb_log->page_offset << PAGE_SHIFT;
+	sb_offset += PAGE_SIZE;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX];
+	ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)sb_offset,
+					     &payload->maptbl_cache,
+					     payload->maptbl_cache.bytes_count);
+
+	sb_offset += payload->maptbl_cache.bytes_count;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX];
+	cur_hdr_desc->offset = cpu_to_le32(sb_offset);
+	cur_hdr_desc->size = cpu_to_le32(footer_size);
+
+	ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes,
+		     hdr_desc, 0, hdr_array_bytes,
+		     hdr_array_bytes);
+
+	hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	seg_id = ssdfs_get_seg_id_for_leb_id(fsi, last_sb_log->leb_id);
+
+	log_pages_count = PAGE_SIZE; /* header size */
+	log_pages_count += payload->maptbl_cache.bytes_count;
+	log_pages_count += PAGE_SIZE; /* footer size */
+	log_pages_count >>= PAGE_SHIFT;
+
+	err = ssdfs_prepare_segment_header_for_commit(fsi,
+						     seg_id,
+						     last_sb_log->leb_id,
+						     last_sb_log->peb_id,
+						     U64_MAX,
+						     log_pages_count,
+						     SSDFS_SB_SEG_TYPE,
+						     SSDFS_LOG_HAS_FOOTER |
+						     SSDFS_LOG_HAS_MAPTBL_CACHE,
+						     timestamp, cno,
+						     hdr);
+	if (err) {
+		SSDFS_ERR("fail to prepare segment header: err %d\n", err);
+		return err;
+	}
+
+	sb_offset += offsetof(struct ssdfs_log_footer, payload);
+	cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX];
+
+	err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc,
+						      (u32)sb_offset);
+	if (err == -ENODATA) {
+		err = 0;
+		SSDFS_DBG("snapshot rules list is empty\n");
+	} else if (err) {
+		SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err);
+		return err;
+	} else
+		flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES;
+
+	ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes,
+		     footer_desc, 0, footer_array_bytes,
+		     footer_array_bytes);
+
+	err = ssdfs_prepare_log_footer_for_commit(fsi, PAGE_SIZE,
+						  log_pages_count,
+						  flags, timestamp,
+						  cno, footer);
+	if (err) {
+		SSDFS_ERR("fail to prepare log footer: err %d\n", err);
+		return err;
+	}
+
+	folio = ssdfs_super_alloc_folio(GFP_KERNEL | __GFP_ZERO,
+					get_order(PAGE_SIZE));
+	if (IS_ERR_OR_NULL(folio)) {
+		err = (folio == NULL ? -ENOMEM : PTR_ERR(folio));
+		SSDFS_ERR("unable to allocate memory folio\n");
+		return err;
+	}
+
+	/* ->writepage() calls put_folio() */
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write segment header */
+	ssdfs_folio_lock(folio);
+	__ssdfs_memcpy_to_folio(folio, 0, PAGE_SIZE,
+				fsi->sbi.vh_buf, 0, hdr_size,
+				hdr_size);
+	folio_mark_uptodate(folio);
+	folio_set_dirty(folio);
+	ssdfs_folio_unlock(folio);
+
+	peb_offset = last_sb_log->peb_id * fsi->pages_per_peb;
+	peb_offset <<= fsi->log_pagesize;
+	sb_offset = (loff_t)last_sb_log->page_offset << PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(peb_offset > (ULLONG_MAX - (sb_offset + PAGE_SIZE)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb_offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", sb_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->write_block(sb, sb_offset, folio);
+	if (err) {
+		SSDFS_ERR("fail to write segment header: "
+			  "offset %llu, size %zu\n",
+			  (u64)sb_offset, hdr_size);
+		goto cleanup_after_failure;
+	}
+
+	if (fsi->sb_snapi.need_snapshot_sb) {
+		struct ssdfs_peb_extent *last_sb_snap_log;
+
+		last_sb_snap_log = &fsi->sb_snapi.last_log;
+
+		peb_offset = last_sb_snap_log->peb_id * fsi->pages_per_peb;
+		peb_offset <<= fsi->log_pagesize;
+		sb_snap_offset =
+			(loff_t)last_sb_snap_log->page_offset << PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(peb_offset > (ULLONG_MAX - (sb_snap_offset + PAGE_SIZE)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		sb_snap_offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", sb_snap_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		/* ->writepage() calls put_folio() */
+		ssdfs_folio_get(folio);
+
+		ssdfs_folio_lock(folio);
+		hdr = SSDFS_SEG_HDR(kmap_local_folio(folio, 0));
+		hdr->seg_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_ID);
+		hdr->leb_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_LEB_ID);
+		hdr->peb_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_PEB_ID);
+		hdr->seg_type = cpu_to_le16(SSDFS_INITIAL_SNAPSHOT_SEG_TYPE);
+		hdr->seg_flags = cpu_to_le32(SSDFS_LOG_HAS_FOOTER);
+		hdr->volume_hdr.check.bytes =
+			cpu_to_le16(sizeof(struct ssdfs_segment_header));
+		hdr->volume_hdr.check.flags = cpu_to_le16(SSDFS_CRC32);
+		err = ssdfs_calculate_csum(&hdr->volume_hdr.check,
+					   hdr,
+					   sizeof(struct ssdfs_segment_header));
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		} else {
+			folio_mark_uptodate(folio);
+			folio_set_dirty(folio);
+		}
+		kunmap_local(hdr);
+		ssdfs_folio_unlock(folio);
+
+		if (err)
+			goto cleanup_after_failure;
+
+		err = fsi->devops->write_block(sb, sb_snap_offset, folio);
+		if (err) {
+			SSDFS_ERR("fail to write segment header: "
+				  "offset %llu, size %zu\n",
+				  (u64)sb_snap_offset,
+				  hdr_size);
+			goto cleanup_after_failure;
+		}
+	}
+
+	ssdfs_folio_lock(folio);
+	folio_clear_uptodate(folio);
+	ssdfs_folio_unlock(folio);
+
+	sb_offset += PAGE_SIZE;
+	written = 0;
+
+	for (i = 0; i < folio_batch_count(&payload->maptbl_cache.batch); i++) {
+		struct folio *payload_folio =
+				payload->maptbl_cache.batch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!payload_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		/* ->writepage() calls put_folio() */
+		ssdfs_folio_get(payload_folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio %p, count %d\n",
+			  payload_folio,
+			  folio_ref_count(payload_folio));
+
+		kaddr = kmap_local_folio(payload_folio, 0);
+		SSDFS_DBG("PAYLOAD FOLIO %d\n", i);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr, PAGE_SIZE);
+		kunmap_local(kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(payload_folio);
+		folio_mark_uptodate(payload_folio);
+		folio_set_dirty(payload_folio);
+		ssdfs_folio_unlock(payload_folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", sb_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = fsi->devops->write_block(sb, sb_offset, payload_folio);
+		if (err) {
+			SSDFS_ERR("fail to write maptbl cache page: "
+				  "offset %llu, folio_index %u, size %zu\n",
+				  (u64)sb_offset, i, PAGE_SIZE);
+			goto cleanup_after_failure;
+		}
+
+		ssdfs_folio_lock(payload_folio);
+		folio_clear_uptodate(payload_folio);
+		ssdfs_folio_unlock(payload_folio);
+
+		sb_offset += PAGE_SIZE;
+	}
+
+	/* TODO: write metadata payload */
+
+	/* ->writepage() calls put_folio() */
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write log footer */
+	ssdfs_folio_lock(folio);
+	__ssdfs_memset_folio(folio, 0, PAGE_SIZE,
+			     0, PAGE_SIZE);
+	__ssdfs_memcpy_to_folio(folio, 0, PAGE_SIZE,
+				fsi->sbi.vs_buf, 0, fsi->sbi.vs_buf_size,
+				PAGE_SIZE);
+	folio_mark_uptodate(folio);
+	folio_set_dirty(folio);
+	ssdfs_folio_unlock(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", sb_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->write_block(sb, sb_offset, folio);
+	if (err) {
+		SSDFS_ERR("fail to write log footer: "
+			  "offset %llu, size %zu\n",
+			  (u64)sb_offset, fsi->sbi.vs_buf_size);
+		goto cleanup_after_failure;
+	}
+
+	if (fsi->sb_snapi.need_snapshot_sb) {
+		sb_snap_offset += PAGE_SIZE;
+
+		/* ->writepage() calls put_folio() */
+		ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", sb_snap_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+		folio_mark_uptodate(folio);
+		folio_set_dirty(folio);
+		ssdfs_folio_unlock(folio);
+
+		err = fsi->devops->write_block(sb, sb_snap_offset, folio);
+		if (err) {
+			SSDFS_ERR("fail to write log footer: "
+				  "offset %llu, size %zu\n",
+				  (u64)sb_snap_offset, fsi->sbi.vs_buf_size);
+			goto cleanup_after_failure;
+		}
+	}
+
+	ssdfs_folio_lock(folio);
+	folio_clear_uptodate(folio);
+	ssdfs_folio_unlock(folio);
+
+	fsi->sb_snapi.need_snapshot_sb = false;
+
+	ssdfs_super_free_folio(folio);
+	return 0;
+
+cleanup_after_failure:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_super_free_folio(folio);
+
+	return err;
+}
+
+static int
+__ssdfs_commit_sb_log_inline(struct super_block *sb,
+			     u64 timestamp, u64 cno,
+			     struct ssdfs_peb_extent *last_sb_log,
+			     struct ssdfs_sb_log_payload *payload,
+			     u32 payload_size)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_metadata_descriptor hdr_desc[SSDFS_SEG_HDR_DESC_MAX];
+	struct ssdfs_metadata_descriptor footer_desc[SSDFS_LOG_FOOTER_DESC_MAX];
+	size_t desc_size = sizeof(struct ssdfs_metadata_descriptor);
+	size_t hdr_array_bytes = desc_size * SSDFS_SEG_HDR_DESC_MAX;
+	size_t footer_array_bytes = desc_size * SSDFS_LOG_FOOTER_DESC_MAX;
+	struct ssdfs_metadata_descriptor *cur_hdr_desc;
+	struct folio *folio;
+	struct folio *payload_folio;
+	struct ssdfs_segment_header *hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	struct ssdfs_log_footer *footer;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	void *kaddr = NULL;
+	loff_t peb_offset;
+	loff_t sb_offset, sb_snap_offset;
+	u32 inline_capacity;
+	void *payload_buf;
+	u32 flags = 0;
+	u64 seg_id;
+	u32 log_pages_count;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log);
+	BUG_ON(!SSDFS_FS_I(sb)->devops);
+	BUG_ON(!SSDFS_FS_I(sb)->devops->write_block);
+	BUG_ON((last_sb_log->page_offset + last_sb_log->pages_count) >
+		(ULLONG_MAX >> PAGE_SHIFT));
+	BUG_ON((last_sb_log->leb_id * SSDFS_FS_I(sb)->pebs_per_seg) >=
+		SSDFS_FS_I(sb)->nsegs);
+	BUG_ON(last_sb_log->peb_id >
+		    div_u64(ULLONG_MAX, SSDFS_FS_I(sb)->pages_per_peb));
+	BUG_ON((last_sb_log->peb_id * SSDFS_FS_I(sb)->pages_per_peb) >
+				(ULLONG_MAX >> SSDFS_FS_I(sb)->log_pagesize));
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(sb);
+	hdr = SSDFS_SEG_HDR(fsi->sbi.vh_buf);
+	footer = SSDFS_LF(fsi->sbi.vs_buf);
+
+	memset(hdr_desc, 0, hdr_array_bytes);
+	memset(footer_desc, 0, footer_array_bytes);
+
+	sb_offset = (loff_t)last_sb_log->page_offset << PAGE_SHIFT;
+	sb_offset += hdr_size;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_MAPTBL_CACHE_INDEX];
+	ssdfs_prepare_maptbl_cache_descriptor(cur_hdr_desc, (u32)sb_offset,
+					      &payload->maptbl_cache,
+					      payload_size);
+
+	sb_offset += payload_size;
+
+	sb_offset += PAGE_SIZE - 1;
+	sb_offset = (sb_offset >> PAGE_SHIFT) << PAGE_SHIFT;
+
+	cur_hdr_desc = &hdr_desc[SSDFS_LOG_FOOTER_INDEX];
+	cur_hdr_desc->offset = cpu_to_le32(sb_offset);
+	cur_hdr_desc->size = cpu_to_le32(footer_size);
+
+	ssdfs_memcpy(hdr->desc_array, 0, hdr_array_bytes,
+		     hdr_desc, 0, hdr_array_bytes,
+		     hdr_array_bytes);
+
+	hdr->peb_migration_id[SSDFS_PREV_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+	hdr->peb_migration_id[SSDFS_CUR_MIGRATING_PEB] =
+					SSDFS_PEB_UNKNOWN_MIGRATION_ID;
+
+	seg_id = ssdfs_get_seg_id_for_leb_id(fsi, last_sb_log->leb_id);
+
+	log_pages_count = hdr_size + payload_size;
+	log_pages_count += PAGE_SIZE - 1;
+	log_pages_count = (log_pages_count >> PAGE_SHIFT) << PAGE_SHIFT;
+	log_pages_count += PAGE_SIZE; /* footer size */
+	log_pages_count >>= PAGE_SHIFT;
+
+	err = ssdfs_prepare_segment_header_for_commit(fsi,
+						     seg_id,
+						     last_sb_log->leb_id,
+						     last_sb_log->peb_id,
+						     U64_MAX,
+						     log_pages_count,
+						     SSDFS_SB_SEG_TYPE,
+						     SSDFS_LOG_HAS_FOOTER |
+						     SSDFS_LOG_HAS_MAPTBL_CACHE,
+						     timestamp, cno,
+						     hdr);
+	if (err) {
+		SSDFS_ERR("fail to prepare segment header: err %d\n", err);
+		return err;
+	}
+
+	sb_offset += offsetof(struct ssdfs_log_footer, payload);
+	cur_hdr_desc = &footer_desc[SSDFS_SNAPSHOT_RULES_AREA_INDEX];
+
+	err = ssdfs_prepare_snapshot_rules_for_commit(fsi, cur_hdr_desc,
+						      (u32)sb_offset);
+	if (err == -ENODATA) {
+		err = 0;
+		SSDFS_DBG("snapshot rules list is empty\n");
+	} else if (err) {
+		SSDFS_ERR("fail to prepare snapshot rules: err %d\n", err);
+		return err;
+	} else
+		flags |= SSDFS_LOG_FOOTER_HAS_SNAPSHOT_RULES;
+
+	ssdfs_memcpy(footer->desc_array, 0, footer_array_bytes,
+		     footer_desc, 0, footer_array_bytes,
+		     footer_array_bytes);
+
+	err = ssdfs_prepare_log_footer_for_commit(fsi, PAGE_SIZE,
+						  log_pages_count,
+						  flags, timestamp,
+						  cno, footer);
+	if (err) {
+		SSDFS_ERR("fail to prepare log footer: err %d\n", err);
+		return err;
+	}
+
+	if (folio_batch_count(&payload->maptbl_cache.batch) != 1) {
+		SSDFS_WARN("payload contains several memory folios\n");
+		return -ERANGE;
+	}
+
+	inline_capacity = PAGE_SIZE - hdr_size;
+
+	if (payload_size > inline_capacity) {
+		SSDFS_ERR("payload_size %u > inline_capacity %u\n",
+			  payload_size, inline_capacity);
+		return -ERANGE;
+	}
+
+	payload_buf = ssdfs_super_kmalloc(inline_capacity, GFP_KERNEL);
+	if (!payload_buf) {
+		SSDFS_ERR("fail to allocate payload buffer\n");
+		return -ENOMEM;
+	}
+
+	folio = ssdfs_super_alloc_folio(GFP_KERNEL | __GFP_ZERO,
+					get_order(PAGE_SIZE));
+	if (IS_ERR_OR_NULL(folio)) {
+		err = (folio == NULL ? -ENOMEM : PTR_ERR(folio));
+		SSDFS_ERR("unable to allocate memory folio\n");
+		ssdfs_super_kfree(payload_buf);
+		return err;
+	}
+
+	/* ->writepage() calls put_folio() */
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	payload_folio = payload->maptbl_cache.batch.folios[0];
+	if (!payload_folio) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid payload folio\n");
+		goto free_payload_buffer;
+	}
+
+	ssdfs_folio_lock(payload_folio);
+	err = __ssdfs_memcpy_from_folio(payload_buf, 0, inline_capacity,
+					payload_folio, 0, PAGE_SIZE,
+					payload_size);
+	ssdfs_folio_unlock(payload_folio);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: err %d\n", err);
+		goto free_payload_buffer;
+	}
+
+	/* write segment header + payload */
+	ssdfs_folio_lock(folio);
+	kaddr = kmap_local_folio(folio, 0);
+	ssdfs_memcpy(kaddr, 0, PAGE_SIZE,
+		     fsi->sbi.vh_buf, 0, hdr_size,
+		     hdr_size);
+	err = ssdfs_memcpy(kaddr, hdr_size, PAGE_SIZE,
+			   payload_buf, 0, inline_capacity,
+			   payload_size);
+	flush_dcache_folio(folio);
+	kunmap_local(kaddr);
+	if (!err) {
+		folio_mark_uptodate(folio);
+		folio_set_dirty(folio);
+	}
+	ssdfs_folio_unlock(folio);
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to copy: err %d\n", err);
+		goto free_payload_buffer;
+	}
+
+free_payload_buffer:
+	ssdfs_super_kfree(payload_buf);
+
+	if (unlikely(err))
+		goto cleanup_after_failure;
+
+	peb_offset = last_sb_log->peb_id * fsi->pages_per_peb;
+	peb_offset <<= fsi->log_pagesize;
+	sb_offset = (loff_t)last_sb_log->page_offset << PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(peb_offset > (ULLONG_MAX - (sb_offset + PAGE_SIZE)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	sb_offset += peb_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", sb_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->write_block(sb, sb_offset, folio);
+	if (err) {
+		SSDFS_ERR("fail to write segment header: "
+			  "offset %llu, size %zu\n",
+			  (u64)sb_offset, hdr_size + payload_size);
+		goto cleanup_after_failure;
+	}
+
+	if (fsi->sb_snapi.need_snapshot_sb) {
+		struct ssdfs_peb_extent *last_sb_snap_log;
+
+		last_sb_snap_log = &fsi->sb_snapi.last_log;
+
+		peb_offset = last_sb_snap_log->peb_id * fsi->pages_per_peb;
+		peb_offset <<= fsi->log_pagesize;
+		sb_snap_offset =
+			(loff_t)last_sb_snap_log->page_offset << PAGE_SHIFT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(peb_offset > (ULLONG_MAX - (sb_snap_offset + PAGE_SIZE)));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		sb_snap_offset += peb_offset;
+
+		/* ->writepage() calls put_folio() */
+		ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", sb_snap_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+		hdr = SSDFS_SEG_HDR(kmap_local_folio(folio, 0));
+		hdr->seg_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_ID);
+		hdr->leb_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_LEB_ID);
+		hdr->peb_id = cpu_to_le64(SSDFS_INITIAL_SNAPSHOT_SEG_PEB_ID);
+		hdr->seg_type = cpu_to_le16(SSDFS_INITIAL_SNAPSHOT_SEG_TYPE);
+		hdr->seg_flags = cpu_to_le32(SSDFS_LOG_HAS_FOOTER);
+		hdr->volume_hdr.check.bytes =
+			cpu_to_le16(sizeof(struct ssdfs_segment_header));
+		hdr->volume_hdr.check.flags = cpu_to_le16(SSDFS_CRC32);
+		err = ssdfs_calculate_csum(&hdr->volume_hdr.check,
+					   hdr,
+					   sizeof(struct ssdfs_segment_header));
+		if (unlikely(err)) {
+			SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		} else {
+			folio_mark_uptodate(folio);
+			folio_set_dirty(folio);
+		}
+		kunmap_local(hdr);
+		ssdfs_folio_unlock(folio);
+
+		if (err)
+			goto cleanup_after_failure;
+
+		err = fsi->devops->write_block(sb, sb_snap_offset, folio);
+		if (err) {
+			SSDFS_ERR("fail to write segment header: "
+				  "offset %llu, size %zu\n",
+				  (u64)sb_snap_offset,
+				  hdr_size + payload_size);
+			goto cleanup_after_failure;
+		}
+	}
+
+	ssdfs_folio_lock(folio);
+	folio_clear_uptodate(folio);
+	ssdfs_folio_unlock(folio);
+
+	/* ->writepage() calls put_folio() */
+	ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* write log footer */
+	ssdfs_folio_lock(folio);
+	__ssdfs_memset_folio(folio, 0, PAGE_SIZE,
+			     0, PAGE_SIZE);
+	__ssdfs_memcpy_to_folio(folio, 0, PAGE_SIZE,
+				fsi->sbi.vs_buf, 0, fsi->sbi.vs_buf_size,
+				PAGE_SIZE);
+	folio_mark_uptodate(folio);
+	folio_set_dirty(folio);
+	ssdfs_folio_unlock(folio);
+
+	sb_offset += PAGE_SIZE;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("offset %llu\n", sb_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = fsi->devops->write_block(sb, sb_offset, folio);
+	if (err) {
+		SSDFS_ERR("fail to write log footer: "
+			  "offset %llu, size %zu\n",
+			  (u64)sb_offset, fsi->sbi.vs_buf_size);
+		goto cleanup_after_failure;
+	}
+
+	if (fsi->sb_snapi.need_snapshot_sb) {
+		sb_snap_offset += PAGE_SIZE;
+
+		/* ->writepage() calls put_folio() */
+		ssdfs_folio_get(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("offset %llu\n", sb_snap_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_lock(folio);
+		folio_mark_uptodate(folio);
+		folio_set_dirty(folio);
+		ssdfs_folio_unlock(folio);
+
+		err = fsi->devops->write_block(sb, sb_snap_offset, folio);
+		if (err) {
+			SSDFS_ERR("fail to write log footer: "
+				  "offset %llu, size %zu\n",
+				  (u64)sb_snap_offset, fsi->sbi.vs_buf_size);
+			goto cleanup_after_failure;
+		}
+	}
+
+	ssdfs_folio_lock(folio);
+	folio_clear_uptodate(folio);
+	ssdfs_folio_unlock(folio);
+
+	fsi->sb_snapi.need_snapshot_sb = false;
+
+	ssdfs_super_free_folio(folio);
+	return 0;
+
+cleanup_after_failure:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  folio, folio_ref_count(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_super_free_folio(folio);
+
+	return err;
+}
+
+static int ssdfs_commit_sb_log(struct super_block *sb,
+				u64 timestamp, u64 cno,
+				struct ssdfs_peb_extent *last_sb_log,
+				struct ssdfs_sb_log_payload *payload)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u32 inline_capacity;
+	u32 payload_size;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!sb || !last_sb_log || !payload);
+
+	SSDFS_DBG("sb %p, last_sb_log->leb_id %llu, last_sb_log->peb_id %llu, "
+		  "last_sb_log->page_offset %u, last_sb_log->pages_count %u\n",
+		  sb, last_sb_log->leb_id, last_sb_log->peb_id,
+		  last_sb_log->page_offset, last_sb_log->pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inline_capacity = PAGE_SIZE - hdr_size;
+	payload_size = ssdfs_sb_payload_size(&payload->maptbl_cache.batch);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("inline_capacity %u, payload_size %u\n",
+		  inline_capacity, payload_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (fsi->sb_snapi.req != NULL) {
+		struct ssdfs_segment_request *req = fsi->sb_snapi.req;
+
+		switch (atomic_read(&req->result.state)) {
+		case SSDFS_REQ_CREATED:
+		case SSDFS_REQ_STARTED:
+			fsi->sb_snapi.need_snapshot_sb = false;
+			break;
+
+		case SSDFS_REQ_FINISHED:
+			ssdfs_request_free(fsi->sb_snapi.req, NULL);
+			fsi->sb_snapi.req = NULL;
+			break;
+
+		case SSDFS_REQ_FAILED:
+			SSDFS_ERR("erase and re-write request failed\n");
+			ssdfs_request_free(fsi->sb_snapi.req, NULL);
+			fsi->sb_snapi.req = NULL;
+			fsi->sb_snapi.need_snapshot_sb = false;
+			break;
+
+		default:
+			SSDFS_ERR("invalid result's state %#x\n",
+				  atomic_read(&req->result.state));
+			ssdfs_request_free(fsi->sb_snapi.req, NULL);
+			fsi->sb_snapi.req = NULL;
+			fsi->sb_snapi.need_snapshot_sb = false;
+			break;
+		}
+	}
+
+	if (payload_size > inline_capacity) {
+		err = __ssdfs_commit_sb_log(sb, timestamp, cno,
+					    last_sb_log, payload);
+	} else {
+		err = __ssdfs_commit_sb_log_inline(sb, timestamp, cno,
+						   last_sb_log,
+						   payload, payload_size);
+	}
+
+	if (unlikely(err))
+		SSDFS_ERR("fail to commit sb log: err %d\n", err);
+
+	return err;
+}
+
 static
 int ssdfs_commit_super(struct super_block *sb, u16 fs_state,
 			struct ssdfs_peb_extent *last_sb_log,
-- 
2.34.1



* [PATCH v2 08/79] ssdfs: segment header + log footer operations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (5 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 07/79] ssdfs: implement commit superblock logic Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 15/79] ssdfs: introduce PEB's block bitmap Viacheslav Dubeyko
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS is a Log-structured File System (LFS). This means that a volume
is a sequence of segments, each of which can contain one or several
erase blocks. Any write operation into an erase block creates a log,
so the content of every erase block is a sequence of logs. A log can
be full or partial. A full log starts with a header and is finished
by a footer. The size of a full log is fixed and is defined during
the mkfs phase; however, the tunefs tool can change this value. If a
commit operation does not have enough data to prepare a full log,
then a partial log is created. A partial log starts with a partial
log header and has no footer. The partial log header can be thought
of as a mixture of segment header and log footer.

The segment header can be considered static superblock info. It
contains metadata that never changes after volume creation (logical
block size, for example) or changes only rarely (number of segments
in the volume, for example). The log footer can be considered the
dynamic part of the superblock because it contains frequently
updated metadata (for example, the root node of the inodes b-tree).

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/log_footer.c    |  991 ++++++++++++++++++++++++++
 fs/ssdfs/volume_header.c | 1431 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 2422 insertions(+)
 create mode 100644 fs/ssdfs/log_footer.c
 create mode 100644 fs/ssdfs/volume_header.c

diff --git a/fs/ssdfs/log_footer.c b/fs/ssdfs/log_footer.c
new file mode 100644
index 000000000000..75ec1c470bad
--- /dev/null
+++ b/fs/ssdfs/log_footer.c
@@ -0,0 +1,991 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/log_footer.c - operations with log footer.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "segment_bitmap.h"
+#include "folio_array.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "peb_container.h"
+#include "segment.h"
+#include "current_segment.h"
+
+#include <trace/events/ssdfs.h>
+
+/*
+ * is_ssdfs_volume_state_info_consistent() - check volume state consistency
+ * @fsi: pointer on shared file system object
+ * @buf: log header
+ * @footer: log footer
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - volume state metadata is consistent.
+ * [false] - volume state metadata is corrupted.
+ */
+bool is_ssdfs_volume_state_info_consistent(struct ssdfs_fs_info *fsi,
+					   void *buf,
+					   struct ssdfs_log_footer *footer,
+					   u64 dev_size)
+{
+	struct ssdfs_signature *magic;
+	u64 nsegs;
+	u64 free_pages;
+	u8 log_segsize = U8_MAX;
+	u32 seg_size = U32_MAX;
+	u32 page_size = U32_MAX;
+	u64 cno = U64_MAX;
+	u16 log_pages = U16_MAX;
+	u32 log_bytes = U32_MAX;
+	u64 pages_count;
+	u32 pages_per_seg;
+	u32 remainder;
+	u16 fs_state;
+	u16 fs_errors;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!buf || !footer);
+
+	SSDFS_DBG("buf %p, footer %p, dev_size %llu\n",
+		  buf, footer, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		SSDFS_DBG("valid magic is not detected\n");
+		return false;
+	}
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		struct ssdfs_segment_header *hdr;
+		struct ssdfs_volume_header *vh;
+
+		hdr = SSDFS_SEG_HDR(buf);
+		vh = SSDFS_VH(buf);
+
+		log_segsize = vh->log_segsize;
+		seg_size = 1 << vh->log_segsize;
+		page_size = 1 << vh->log_pagesize;
+		cno = le64_to_cpu(hdr->cno);
+		log_pages = le16_to_cpu(hdr->log_pages);
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		struct ssdfs_partial_log_header *pl_hdr;
+
+		pl_hdr = SSDFS_PLH(buf);
+
+		log_segsize = pl_hdr->log_segsize;
+		seg_size = 1 << pl_hdr->log_segsize;
+		page_size = 1 << pl_hdr->log_pagesize;
+		cno = le64_to_cpu(pl_hdr->cno);
+		log_pages = le16_to_cpu(pl_hdr->log_pages);
+	} else {
+		SSDFS_DBG("log header is corrupted\n");
+		return false;
+	}
+
+	nsegs = le64_to_cpu(footer->volume_state.nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(footer->volume_state.free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n",
+			  dev_size, page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (free_pages > pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n",
+			  free_pages, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	pages_per_seg = seg_size / page_size;
+	if (nsegs < div_u64(free_pages, pages_per_seg)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, free_pages %llu, "
+			  "pages_per_seg %u\n",
+			  nsegs, free_pages, pages_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (cno > le64_to_cpu(footer->cno)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("create_cno %llu is greater than write_cno %llu\n",
+			  cno, le64_to_cpu(footer->cno));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	log_bytes = (u32)log_pages * fsi->pagesize;
+	if (le32_to_cpu(footer->log_bytes) > log_bytes) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("footer log_bytes %u > hdr log_bytes %u\n",
+			  le32_to_cpu(footer->log_bytes),
+			  log_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_state = le16_to_cpu(footer->volume_state.state);
+	if (fs_state > SSDFS_LAST_KNOWN_FS_STATE) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS state %#x\n",
+			  fs_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	fs_errors = le16_to_cpu(footer->volume_state.errors);
+	if (fs_errors > SSDFS_LAST_KNOWN_FS_ERROR) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unknown FS error %#x\n",
+			  fs_errors);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_check_log_footer() - check log footer consistency
+ * @fsi: pointer on shared file system object
+ * @buf: log header
+ * @footer: log footer
+ * @silent: show error or not?
+ *
+ * This function checks consistency of log footer.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_check_log_footer(struct ssdfs_fs_info *fsi,
+			   void *buf,
+			   struct ssdfs_log_footer *footer,
+			   bool silent)
+{
+	struct ssdfs_volume_state *vs;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	u64 dev_size;
+	bool major_magic_valid, minor_magic_valid;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !buf || !footer);
+
+	SSDFS_DBG("fsi %p, buf %p, footer %p, silent %#x\n",
+		  fsi, buf, footer, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vs = SSDFS_VS(footer);
+
+	major_magic_valid = is_ssdfs_magic_valid(&vs->magic);
+	minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("valid magic is not detected\n");
+#else
+			SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+#else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid log footer magic signature\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid log footer magic signature\n");
+#else
+			SSDFS_DBG("invalid log footer magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) {
+		if (!silent)
+			SSDFS_ERR("invalid checksum of log footer\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid checksum of log footer\n");
+#else
+			SSDFS_DBG("invalid checksum of log footer\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_volume_state_info_consistent(fsi, buf,
+						   footer, dev_size)) {
+		if (!silent)
+			SSDFS_ERR("log footer is corrupted\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log footer is corrupted\n");
+#else
+			SSDFS_DBG("log footer is corrupted\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le32_to_cpu(footer->log_flags) & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+#else
+			SSDFS_DBG("corrupted log_flags %#x\n",
+				  le32_to_cpu(footer->log_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_unchecked_log_footer() - read log footer without check
+ * @fsi: pointer on shared file system object
+ * @peb_id: PEB identification number
+ * @block_size: block size in bytes
+ * @bytes_off: offset inside PEB in bytes
+ * @buf: buffer for log footer
+ * @silent: show error or not?
+ * @log_pages: number of pages in the log [out]
+ *
+ * This function reads the log footer without the full
+ * consistency check (only magic and checksum are validated).
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_read_unchecked_log_footer(struct ssdfs_fs_info *fsi,
+				    u64 peb_id, u32 block_size, u32 bytes_off,
+				    void *buf, bool silent,
+				    u32 *log_pages)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_log_footer *footer;
+	struct ssdfs_volume_state *vs;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t hdr_size = sizeof(struct ssdfs_partial_log_header);
+	bool major_magic_valid, minor_magic_valid;
+	u32 pages_per_peb;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !fsi->devops->read);
+	BUG_ON(!buf || !log_pages);
+	BUG_ON(bytes_off >= fsi->erasesize);
+
+	SSDFS_DBG("peb_id %llu, block_size %u, bytes_off %u, buf %p\n",
+		  peb_id, block_size, bytes_off, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*log_pages = U32_MAX;
+	pages_per_peb = fsi->erasesize / block_size;
+
+	err = ssdfs_unaligned_read_buffer(fsi, peb_id, block_size, bytes_off,
+					  buf, footer_size);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read log footer: "
+				  "peb_id %llu, block_size %u, "
+				  "bytes_off %u, err %d\n",
+				  peb_id, block_size, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to read log footer: "
+				  "peb_id %llu, block_size %u, "
+				  "bytes_off %u, err %d\n",
+				  peb_id, block_size, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else
+			SSDFS_DBG("valid magic is not detected\n");
+
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_log_footer_magic_valid(magic)) {
+		footer = SSDFS_LF(buf);
+		vs = SSDFS_VS(footer);
+
+		major_magic_valid = is_ssdfs_magic_valid(&vs->magic);
+		minor_magic_valid = is_ssdfs_log_footer_magic_valid(footer);
+
+		if (!major_magic_valid && !minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("valid magic is not detected\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("valid magic is not detected\n");
+#else
+				SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -ENODATA;
+		} else if (!major_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+#else
+				SSDFS_DBG("invalid SSDFS magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		} else if (!minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid log footer magic\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid log footer magic\n");
+#else
+				SSDFS_DBG("invalid log footer magic\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+
+		if (!is_ssdfs_log_footer_csum_valid(footer, footer_size)) {
+			if (!silent)
+				SSDFS_ERR("invalid checksum of log footer\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid checksum of log footer\n");
+#else
+				SSDFS_DBG("invalid checksum of log footer\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+
+		*log_pages = le32_to_cpu(footer->log_bytes);
+		*log_pages /= block_size;
+
+		if (*log_pages == 0 || *log_pages >= pages_per_peb) {
+			if (!silent)
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+#else
+				SSDFS_DBG("invalid log pages %u\n", *log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		major_magic_valid = is_ssdfs_magic_valid(&pl_hdr->magic);
+		minor_magic_valid =
+			is_ssdfs_partial_log_header_magic_valid(&pl_hdr->magic);
+
+		if (!major_magic_valid && !minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("valid magic is not detected\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("valid magic is not detected\n");
+#else
+				SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -ENODATA;
+		} else if (!major_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid SSDFS magic signature\n");
+#else
+				SSDFS_DBG("invalid SSDFS magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		} else if (!minor_magic_valid) {
+			if (!silent)
+				SSDFS_ERR("invalid partial log header magic\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid partial log header magic\n");
+#else
+				SSDFS_DBG("invalid partial log header magic\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+
+		if (!is_ssdfs_partial_log_header_csum_valid(pl_hdr, hdr_size)) {
+			if (!silent)
+				SSDFS_ERR("invalid checksum of footer\n");
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid checksum of footer\n");
+#else
+				SSDFS_DBG("invalid checksum of footer\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+
+		*log_pages = le32_to_cpu(pl_hdr->log_bytes);
+		*log_pages /= block_size;
+
+		if (*log_pages == 0 || *log_pages >= pages_per_peb) {
+			if (!silent)
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+			else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("invalid log pages %u\n", *log_pages);
+#else
+				SSDFS_DBG("invalid log pages %u\n", *log_pages);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return -EIO;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, block_size %u, bytes_off %u\n",
+				  peb_id, block_size, bytes_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, block_size %u, bytes_off %u\n",
+				  peb_id, block_size, bytes_off);
+#else
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, block_size %u, bytes_off %u\n",
+				  peb_id, block_size, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_checked_log_footer() - read and check log footer
+ * @fsi: pointer on shared file system object
+ * @log_hdr: log header
+ * @peb_id: PEB identification number
+ * @block_size: block size in bytes
+ * @bytes_off: offset inside PEB in bytes
+ * @buf: buffer for log footer
+ * @silent: show error or not?
+ *
+ * This function reads and checks consistency of log footer.
+ *
+ * RETURN:
+ * [success] - log footer is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - log footer is corrupted.
+ */
+int ssdfs_read_checked_log_footer(struct ssdfs_fs_info *fsi, void *log_hdr,
+				  u64 peb_id, u32 block_size,
+				  u32 bytes_off, void *buf,
+				  bool silent)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_log_footer *footer;
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t footer_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !fsi->devops->read);
+	BUG_ON(!log_hdr || !buf);
+	BUG_ON(bytes_off >= (fsi->pages_per_peb * fsi->pagesize));
+
+	SSDFS_DBG("peb_id %llu, block_size %u, bytes_off %u, buf %p\n",
+		  peb_id, block_size, bytes_off, buf);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_unaligned_read_buffer(fsi, peb_id, block_size, bytes_off,
+					  buf, footer_size);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read log footer: "
+				  "peb_id %llu, block_size %u, "
+				  "bytes_off %u, err %d\n",
+				  peb_id, block_size, bytes_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("fail to read log footer: "
+				  "peb_id %llu, block_size %u, "
+				  "bytes_off %u, err %d\n",
+				  peb_id, block_size, bytes_off, err);
+#else
+			SSDFS_DBG("fail to read log footer: "
+				  "peb_id %llu, block_size %u, "
+				  "bytes_off %u, err %d\n",
+				  peb_id, block_size, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("valid magic is not detected\n");
+#else
+			SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_log_footer_magic_valid(magic)) {
+		footer = SSDFS_LF(buf);
+
+		err = ssdfs_check_log_footer(fsi, log_hdr, footer, silent);
+		if (err) {
+			if (!silent) {
+				SSDFS_ERR("log footer is corrupted: "
+					  "peb_id %llu, bytes_off %u, err %d\n",
+					  peb_id, bytes_off, err);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("log footer is corrupted: "
+					  "peb_id %llu, bytes_off %u, err %d\n",
+					  peb_id, bytes_off, err);
+#else
+				SSDFS_DBG("log footer is corrupted: "
+					  "peb_id %llu, bytes_off %u, err %d\n",
+					  peb_id, bytes_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+			return err;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent);
+		if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, bytes_off %u\n",
+					  peb_id, bytes_off);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, bytes_off %u\n",
+					  peb_id, bytes_off);
+#else
+				SSDFS_DBG("partial log header is corrupted: "
+					  "peb_id %llu, bytes_off %u\n",
+					  peb_id, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+#else
+			SSDFS_DBG("log footer is corrupted: "
+				  "peb_id %llu, bytes_off %u\n",
+				  peb_id, bytes_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_store_nsegs() - store volume's segments number in volume state
+ * @fsi: pointer on shared file system object
+ * @vs: volume state [out]
+ *
+ * This function stores volume's segments number in volume state.
+ *
+ * RETURN:
+ * [success]
+ *
+ * Currently the function always succeeds; it blocks on the
+ * resize mutex while the volume is under resize.
+ */
+static inline
+int ssdfs_store_nsegs(struct ssdfs_fs_info *fsi,
+			struct ssdfs_volume_state *vs)
+{
+	mutex_lock(&fsi->resize_mutex);
+	vs->nsegs = cpu_to_le64(fsi->nsegs);
+	mutex_unlock(&fsi->resize_mutex);
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_current_segment_ids() - prepare current segment IDs
+ * @fsi: pointer on shared file system object
+ * @array: pointer on array of IDs [out]
+ * @size: size of the array in bytes
+ *
+ * This function prepares the current segment IDs.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code.
+ */
+int ssdfs_prepare_current_segment_ids(struct ssdfs_fs_info *fsi,
+					__le64 *array,
+					size_t size)
+{
+	size_t count = size / sizeof(__le64);
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !array);
+
+	SSDFS_DBG("fsi %p, array %p, size %zu\n",
+		  fsi, array, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) {
+		SSDFS_ERR("invalid array size %zu\n",
+			  size);
+		return -EINVAL;
+	}
+
+	memset(array, 0xFF, size);
+
+	if (fsi->cur_segs) {
+		down_read(&fsi->cur_segs->lock);
+		for (i = 0; i < count; i++) {
+			struct ssdfs_segment_info *real_seg;
+			u64 seg;
+
+			if (!fsi->cur_segs->objects[i])
+				continue;
+
+			ssdfs_current_segment_lock(fsi->cur_segs->objects[i]);
+
+			real_seg = fsi->cur_segs->objects[i]->real_seg;
+			if (real_seg) {
+				seg = real_seg->seg_id;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("index %d, seg_id %llu\n",
+					  i, seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+				array[i] = cpu_to_le64(seg);
+			} else {
+				seg = fsi->cur_segs->objects[i]->seg_id;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("index %d, seg_id %llu\n",
+					  i, seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+				array[i] = cpu_to_le64(seg);
+			}
+
+			ssdfs_current_segment_unlock(fsi->cur_segs->objects[i]);
+		}
+		up_read(&fsi->cur_segs->lock);
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_volume_state_info_for_commit() - prepare volume state
+ * @fsi: pointer on shared file system object
+ * @fs_state: file system state
+ * @cur_segs: pointer on array of current segment IDs
+ * @size: size of the array in bytes
+ * @last_log_time: log creation timestamp
+ * @last_log_cno: last log checkpoint
+ * @vs: volume state [out]
+ *
+ * This function prepares volume state info for commit.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code.
+ */
+int ssdfs_prepare_volume_state_info_for_commit(struct ssdfs_fs_info *fsi,
+						u16 fs_state,
+						__le64 *cur_segs,
+						size_t size,
+						u64 last_log_time,
+						u64 last_log_cno,
+						struct ssdfs_volume_state *vs)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !vs);
+
+	SSDFS_DBG("fsi %p, fs_state %#x\n", fsi, fs_state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (size != (sizeof(__le64) * SSDFS_CUR_SEGS_COUNT)) {
+		SSDFS_ERR("invalid array size %zu\n",
+			  size);
+		return -EINVAL;
+	}
+
+	err = ssdfs_store_nsegs(fsi, vs);
+	if (err) {
+		SSDFS_DBG("unable to store segments number: err %d\n", err);
+		return err;
+	}
+
+	vs->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	vs->magic.version.major = SSDFS_MAJOR_REVISION;
+	vs->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	spin_lock(&fsi->volume_state_lock);
+
+	fsi->fs_mod_time = last_log_time;
+	fsi->fs_state = fs_state;
+
+	vs->free_pages = cpu_to_le64(fsi->free_pages);
+	vs->timestamp = cpu_to_le64(last_log_time);
+	vs->cno = cpu_to_le64(last_log_cno);
+	vs->flags = cpu_to_le32(fsi->fs_flags);
+	vs->state = cpu_to_le16(fs_state);
+	vs->errors = cpu_to_le16(fsi->fs_errors);
+	vs->feature_compat = cpu_to_le64(fsi->fs_feature_compat);
+	vs->feature_compat_ro = cpu_to_le64(fsi->fs_feature_compat_ro);
+	vs->feature_incompat = cpu_to_le64(fsi->fs_feature_incompat);
+
+	ssdfs_memcpy(vs->uuid, 0, SSDFS_UUID_SIZE,
+		     fsi->vs->uuid, 0, SSDFS_UUID_SIZE,
+		     SSDFS_UUID_SIZE);
+	ssdfs_memcpy(vs->label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     fsi->vs->label, 0, SSDFS_VOLUME_LABEL_MAX,
+		     SSDFS_VOLUME_LABEL_MAX);
+	ssdfs_memcpy(vs->cur_segs, 0, size,
+		     cur_segs, 0, size,
+		     size);
+
+	vs->migration_threshold = cpu_to_le16(fsi->migration_threshold);
+	vs->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones));
+
+	spin_unlock(&fsi->volume_state_lock);
+
+	if (atomic_read(&fsi->open_zones) < 0) {
+		SSDFS_ERR("invalid open_zones %d\n",
+			  atomic_read(&fsi->open_zones));
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(&vs->blkbmap,
+		     0, sizeof(struct ssdfs_blk_bmap_options),
+		     &fsi->vs->blkbmap,
+		     0, sizeof(struct ssdfs_blk_bmap_options),
+		     sizeof(struct ssdfs_blk_bmap_options));
+	ssdfs_memcpy(&vs->blk2off_tbl,
+		     0, sizeof(struct ssdfs_blk2off_tbl_options),
+		     &fsi->vs->blk2off_tbl,
+		     0, sizeof(struct ssdfs_blk2off_tbl_options),
+		     sizeof(struct ssdfs_blk2off_tbl_options));
+
+	ssdfs_memcpy(&vs->user_data,
+		     0, sizeof(struct ssdfs_user_data_options),
+		     &fsi->vs->user_data,
+		     0, sizeof(struct ssdfs_user_data_options),
+		     sizeof(struct ssdfs_user_data_options));
+	ssdfs_memcpy(&vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     &fsi->vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	ssdfs_memcpy(&vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     &fsi->vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     sizeof(struct ssdfs_inodes_btree));
+	ssdfs_memcpy(&vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     &fsi->vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     sizeof(struct ssdfs_shared_extents_btree));
+	ssdfs_memcpy(&vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     &fsi->vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     sizeof(struct ssdfs_shared_dictionary_btree));
+	ssdfs_memcpy(&vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     &fsi->vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     sizeof(struct ssdfs_snapshots_btree));
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_log_footer_for_commit() - prepare log footer for commit
+ * @fsi: pointer on shared file system object
+ * @block_size: block size in bytes
+ * @log_pages: count of pages in the log
+ * @log_flags: log's flags
+ * @last_log_time: log creation timestamp
+ * @last_log_cno: last log checkpoint
+ * @footer: log footer [out]
+ *
+ * This function prepares log footer for commit.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input values.
+ */
+int ssdfs_prepare_log_footer_for_commit(struct ssdfs_fs_info *fsi,
+					u32 block_size,
+					u32 log_pages,
+					u32 log_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_log_footer *footer)
+{
+	u16 data_size = sizeof(struct ssdfs_log_footer);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, block_size %u, log_pages %u, "
+		  "log_flags %#x, footer %p\n",
+		  fsi, block_size, log_pages, log_flags, footer);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	footer->volume_state.magic.key = cpu_to_le16(SSDFS_LOG_FOOTER_MAGIC);
+
+	footer->timestamp = cpu_to_le64(last_log_time);
+	footer->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages >= (U32_MAX / block_size)) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	footer->log_bytes = cpu_to_le32(log_pages * block_size);
+
+	if (log_flags & ~SSDFS_LOG_FOOTER_FLAG_MASK) {
+		SSDFS_ERR("unknown log flags %#x\n", log_flags);
+		return -EINVAL;
+	}
+
+	footer->log_flags = cpu_to_le32(log_flags);
+
+	footer->volume_state.check.bytes = cpu_to_le16(data_size);
+	footer->volume_state.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&footer->volume_state.check,
+				   footer, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
diff --git a/fs/ssdfs/volume_header.c b/fs/ssdfs/volume_header.c
new file mode 100644
index 000000000000..df44fb80e125
--- /dev/null
+++ b/fs/ssdfs/volume_header.c
@@ -0,0 +1,1431 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/volume_header.c - operations with volume header.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+
+#include <trace/events/ssdfs.h>
+
+static inline
+void ssdfs_show_volume_header(struct ssdfs_volume_header *hdr)
+{
+	SSDFS_ERR("MAGIC: common %#x, key %#x, "
+		  "version (major %u, minor %u)\n",
+		  le32_to_cpu(hdr->magic.common),
+		  le16_to_cpu(hdr->magic.key),
+		  hdr->magic.version.major,
+		  hdr->magic.version.minor);
+	SSDFS_ERR("CHECK: bytes %u, flags %#x, csum %#x\n",
+		  le16_to_cpu(hdr->check.bytes),
+		  le16_to_cpu(hdr->check.flags),
+		  le32_to_cpu(hdr->check.csum));
+	SSDFS_ERR("KEY VALUES: log_pagesize %u, log_erasesize %u, "
+		  "log_segsize %u, log_pebs_per_seg %u, "
+		  "megabytes_per_peb %u, pebs_per_seg %u, "
+		  "create_time %llu, create_cno %llu, flags %#x\n",
+		  hdr->log_pagesize,
+		  hdr->log_erasesize,
+		  hdr->log_segsize,
+		  hdr->log_pebs_per_seg,
+		  le16_to_cpu(hdr->megabytes_per_peb),
+		  le16_to_cpu(hdr->pebs_per_seg),
+		  le64_to_cpu(hdr->create_time),
+		  le64_to_cpu(hdr->create_cno),
+		  le32_to_cpu(hdr->flags));
+}
+
+/*
+ * is_ssdfs_volume_header_consistent() - check volume header consistency
+ * @fsi: pointer on shared file system object
+ * @vh: volume header
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - volume header is consistent.
+ * [false] - volume header is corrupted.
+ */
+bool is_ssdfs_volume_header_consistent(struct ssdfs_fs_info *fsi,
+					struct ssdfs_volume_header *vh,
+					u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 leb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	u64 peb_array[SSDFS_SB_CHAIN_MAX * SSDFS_SB_SEG_COPY_MAX] = {0};
+	int array_index = 0;
+	int i, j, k;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << vh->log_pagesize;
+	erase_size = 1 << vh->log_erasesize;
+	seg_size = 1 << vh->log_segsize;
+	pebs_per_seg = 1 << vh->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_1MB:
+	case SSDFS_2MB:
+	case SSDFS_4MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_4GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> vh->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	for (i = 0; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = 0; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			u64 leb_id = le64_to_cpu(vh->sb_pebs[i][j].leb_id);
+			u64 peb_id = le64_to_cpu(vh->sb_pebs[i][j].peb_id);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("i %d, j %d, LEB %llu, PEB %llu\n",
+				  i, j, leb_id, peb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			for (k = 0; k < array_index; k++) {
+				if (leb_id == leb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted LEB number: "
+						  "leb_id %llu, "
+						  "leb_array[%d] %llu\n",
+						  leb_id, k,
+						  leb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+
+				if (peb_id == peb_array[k]) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("corrupted PEB number: "
+						  "peb_id %llu, "
+						  "peb_array[%d] %llu\n",
+						  peb_id, k,
+						  peb_array[k]);
+#endif /* CONFIG_SSDFS_DEBUG */
+					return false;
+				}
+			}
+
+			if (i == SSDFS_PREV_SB_SEG &&
+			    leb_id == U64_MAX && peb_id == U64_MAX) {
+				/* prev id is U64_MAX after volume creation */
+				continue;
+			}
+
+			if (i == SSDFS_RESERVED_SB_SEG &&
+			    leb_id == U64_MAX && peb_id == U64_MAX) {
+				/*
+				 * The reserved seg could be U64_MAX
+				 * if there is no clean segment.
+				 */
+				continue;
+			}
+
+			if (leb_id >= (dev_size >> vh->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("corrupted LEB number %llu\n",
+					  leb_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+				return false;
+			}
+
+			leb_array[array_index] = leb_id;
+			peb_array[array_index] = peb_id;
+
+			array_index++;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_check_segment_header() - check segment header consistency
+ * @fsi: pointer to shared file system object
+ * @hdr: segment header
+ * @silent: suppress error messages if true
+ *
+ * This function checks the consistency of the segment header.
+ *
+ * RETURN:
+ * [success] - segment header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - segment header is corrupted.
+ * %-ENOENT      - header has older FS creation time.
+ */
+int ssdfs_check_segment_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_header *hdr,
+				bool silent)
+{
+	struct ssdfs_volume_header *vh;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	bool major_magic_valid, minor_magic_valid;
+	u64 dev_size;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !hdr);
+
+	SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh = SSDFS_VH(hdr);
+
+	major_magic_valid = is_ssdfs_magic_valid(&vh->magic);
+	minor_magic_valid = is_ssdfs_segment_header_magic_valid(hdr);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("valid magic is not detected\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("valid magic is not detected\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent) {
+			SSDFS_ERR("invalid segment header magic signature\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid segment header magic signature\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("invalid segment header magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (!is_ssdfs_volume_header_csum_valid(hdr, hdr_size)) {
+		if (!silent) {
+			SSDFS_ERR("invalid checksum of volume header\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid checksum of volume header\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("invalid checksum of volume header\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) {
+		if (!silent) {
+			SSDFS_ERR("volume header is corrupted\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("volume header is corrupted\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("volume header is corrupted\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (SSDFS_VH_CNO(vh) > SSDFS_SEG_CNO(hdr)) {
+		if (!silent) {
+			SSDFS_ERR("invalid checkpoint/timestamp\n");
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid checkpoint/timestamp\n");
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("invalid checkpoint/timestamp\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) {
+		if (!silent) {
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		if (!silent) {
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le32_to_cpu(hdr->seg_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+			ssdfs_show_volume_header(vh);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+			ssdfs_show_volume_header(vh);
+#else
+			SSDFS_DBG("corrupted seg_flags %#x\n",
+				  le32_to_cpu(hdr->seg_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	res = ssdfs_compare_fs_ctime(fsi, hdr);
+	if (res < 0) {
+		/* header has younger FS creation time */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("header has younger FS creation time\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	} else if (res > 0) {
+		/* header has older FS creation time */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("header has older FS creation time\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		fsi->fs_ctime = le64_to_cpu(vh->create_time);
+		spin_lock(&fsi->volume_state_lock);
+		ssdfs_memcpy(fsi->fs_uuid, 0, sizeof(fsi->fs_uuid),
+			     vh->uuid, 0, sizeof(vh->uuid),
+			     sizeof(vh->uuid));
+		spin_unlock(&fsi->volume_state_lock);
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
+/*
+ * is_ssdfs_partial_log_header_consistent() - check partial header consistency
+ * @fsi: pointer to shared file system object
+ * @ph: partial log header
+ * @dev_size: partition size in bytes
+ *
+ * RETURN:
+ * [true]  - partial log header is consistent.
+ * [false] - partial log header is corrupted.
+ */
+static
+bool is_ssdfs_partial_log_header_consistent(struct ssdfs_fs_info *fsi,
+					    struct ssdfs_partial_log_header *ph,
+					    u64 dev_size)
+{
+	u32 page_size;
+	u64 erase_size;
+	u32 seg_size;
+	u32 pebs_per_seg;
+	u64 nsegs;
+	u64 free_pages;
+	u64 pages_count;
+	u32 remainder;
+	u32 pages_per_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!ph);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	page_size = 1 << ph->log_pagesize;
+	erase_size = 1ULL << ph->log_erasesize;
+	seg_size = 1 << ph->log_segsize;
+	pebs_per_seg = 1 << ph->log_pebs_per_seg;
+
+	if (page_size >= erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("page_size %u >= erase_size %llu\n",
+			  page_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (page_size) {
+	case SSDFS_4KB:
+	case SSDFS_8KB:
+	case SSDFS_16KB:
+	case SSDFS_32KB:
+		/* do nothing */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unexpected page_size %u\n", page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	switch (erase_size) {
+	case SSDFS_128KB:
+	case SSDFS_256KB:
+	case SSDFS_512KB:
+	case SSDFS_1MB:
+	case SSDFS_2MB:
+	case SSDFS_4MB:
+	case SSDFS_8MB:
+	case SSDFS_16MB:
+	case SSDFS_32MB:
+	case SSDFS_64MB:
+	case SSDFS_128MB:
+	case SSDFS_256MB:
+	case SSDFS_512MB:
+	case SSDFS_1GB:
+	case SSDFS_2GB:
+	case SSDFS_4GB:
+	case SSDFS_8GB:
+	case SSDFS_16GB:
+	case SSDFS_32GB:
+	case SSDFS_64GB:
+		/* do nothing */
+		break;
+
+	default:
+		if (fsi->is_zns_device) {
+			u64 zone_size = le16_to_cpu(fsi->vh->megabytes_per_peb);
+
+			zone_size *= SSDFS_1MB;
+
+			if (fsi->zone_size != zone_size) {
+				SSDFS_ERR("invalid zone size: "
+					  "size1 %llu != size2 %llu\n",
+					  fsi->zone_size, zone_size);
+				return false;
+			}
+
+			erase_size = zone_size;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unexpected erase_size %llu\n", erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return false;
+		}
+	}
+
+	if (seg_size < erase_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u < erase_size %llu\n",
+			  seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (pebs_per_seg != (seg_size >> ph->log_erasesize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("pebs_per_seg %u != (seg_size %u / erase_size %llu)\n",
+			  pebs_per_seg, seg_size, erase_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (seg_size >= dev_size) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_size %u >= dev_size %llu\n",
+			  seg_size, dev_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	nsegs = le64_to_cpu(ph->nsegs);
+
+	if (nsegs == 0 || nsegs > (dev_size >> ph->log_segsize)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, dev_size %llu, seg_size %u\n",
+			  nsegs, dev_size, seg_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	free_pages = le64_to_cpu(ph->free_pages);
+
+	pages_count = div_u64_rem(dev_size, page_size, &remainder);
+	if (remainder) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dev_size %llu is unaligned on page_size %u\n",
+			  dev_size, page_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+	if (free_pages > pages_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("free_pages %llu is greater than pages_count %llu\n",
+			  free_pages, pages_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	pages_per_seg = seg_size / page_size;
+	if (nsegs < div_u64(free_pages, pages_per_seg)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("invalid nsegs %llu, free_pages %llu, "
+			  "pages_per_seg %u\n",
+			  nsegs, free_pages, pages_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_check_partial_log_header() - check partial log header consistency
+ * @fsi: pointer to shared file system object
+ * @hdr: partial log header
+ * @silent: suppress error messages if true
+ *
+ * This function checks the consistency of the partial log header.
+ *
+ * RETURN:
+ * [success] - partial log header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - partial log header is corrupted.
+ * %-ENOENT      - header has older FS creation time.
+ */
+int ssdfs_check_partial_log_header(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_partial_log_header *hdr,
+				   bool silent)
+{
+	size_t hdr_size = sizeof(struct ssdfs_partial_log_header);
+	bool major_magic_valid, minor_magic_valid;
+	u64 dev_size;
+	u32 log_bytes;
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !hdr);
+
+	SSDFS_DBG("fsi %p, hdr %p, silent %#x\n", fsi, hdr, silent);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	major_magic_valid = is_ssdfs_magic_valid(&hdr->magic);
+	minor_magic_valid =
+		is_ssdfs_partial_log_header_magic_valid(&hdr->magic);
+
+	if (!major_magic_valid && !minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("valid magic is not detected\n");
+#else
+			SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ENODATA;
+	} else if (!major_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid SSDFS magic signature\n");
+#else
+			SSDFS_DBG("invalid SSDFS magic signature\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	} else if (!minor_magic_valid) {
+		if (!silent)
+			SSDFS_ERR("invalid partial log header magic\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid partial log header magic\n");
+#else
+			SSDFS_DBG("invalid partial log header magic\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (!is_ssdfs_partial_log_header_csum_valid(hdr, hdr_size)) {
+		if (!silent)
+			SSDFS_ERR("invalid checksum of partial log header\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("invalid checksum of partial log header\n");
+#else
+			SSDFS_DBG("invalid checksum of partial log header\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	dev_size = fsi->devops->device_size(fsi->sb);
+	if (!is_ssdfs_partial_log_header_consistent(fsi, hdr, dev_size)) {
+		if (!silent)
+			SSDFS_ERR("partial log header is corrupted\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("partial log header is corrupted\n");
+#else
+			SSDFS_DBG("partial log header is corrupted\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->log_pages) > fsi->pages_per_peb) {
+		if (!silent) {
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+#else
+			SSDFS_DBG("log_pages %u > pages_per_peb %u\n",
+				  le16_to_cpu(hdr->log_pages),
+				  fsi->pages_per_peb);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	log_bytes = (u32)le16_to_cpu(hdr->log_pages) * fsi->pagesize;
+	if (le32_to_cpu(hdr->log_bytes) > log_bytes) {
+		if (!silent) {
+			SSDFS_ERR("calculated log_bytes %u < log_bytes %u\n",
+				  log_bytes,
+				  le32_to_cpu(hdr->log_bytes));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("calculated log_bytes %u < log_bytes %u\n",
+				  log_bytes,
+				  le32_to_cpu(hdr->log_bytes));
+#else
+			SSDFS_DBG("calculated log_bytes %u < log_bytes %u\n",
+				  log_bytes,
+				  le32_to_cpu(hdr->log_bytes));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le16_to_cpu(hdr->seg_type) > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		if (!silent) {
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+#else
+			SSDFS_DBG("unknown seg_type %#x\n",
+				  le16_to_cpu(hdr->seg_type));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	if (le32_to_cpu(hdr->pl_flags) & ~SSDFS_SEG_HDR_FLAG_MASK) {
+		if (!silent) {
+			SSDFS_ERR("corrupted pl_flags %#x\n",
+				  le32_to_cpu(hdr->pl_flags));
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("corrupted pl_flags %#x\n",
+				  le32_to_cpu(hdr->pl_flags));
+#else
+			SSDFS_DBG("corrupted pl_flags %#x\n",
+				  le32_to_cpu(hdr->pl_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -EIO;
+	}
+
+	res = ssdfs_compare_fs_ctime(fsi, hdr);
+	if (res < 0) {
+		/* header has younger FS creation time */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("header has younger FS creation time\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	} else if (res > 0) {
+		/* header has older FS creation time */
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("header has older FS creation time\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		fsi->fs_ctime = le64_to_cpu(hdr->volume_create_time);
+		spin_lock(&fsi->volume_state_lock);
+		ssdfs_memcpy(fsi->fs_uuid, 0, sizeof(fsi->fs_uuid),
+			     hdr->uuid, 0, sizeof(hdr->uuid),
+			     sizeof(hdr->uuid));
+		spin_unlock(&fsi->volume_state_lock);
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_checked_segment_header() - read and check segment header
+ * @fsi: pointer to shared file system object
+ * @peb_id: PEB identification number
+ * @block_size: block size in bytes
+ * @pages_off: offset from PEB's begin in pages
+ * @buf: buffer
+ * @silent: suppress error messages if true
+ *
+ * This function reads and checks the consistency of the segment header.
+ *
+ * RETURN:
+ * [success] - segment header is consistent.
+ * [failure] - error code:
+ *
+ * %-ENODATA     - valid magic is not detected.
+ * %-EIO         - segment header is corrupted.
+ * %-ENOENT      - header has older FS creation time.
+ */
+int ssdfs_read_checked_segment_header(struct ssdfs_fs_info *fsi,
+					u64 peb_id, u32 block_size,
+					u32 pages_off,
+					void *buf, bool silent)
+{
+	struct ssdfs_signature *magic;
+	struct ssdfs_segment_header *hdr;
+	struct ssdfs_partial_log_header *pl_hdr;
+	size_t hdr_size = sizeof(struct ssdfs_segment_header);
+	u64 offset = 0;
+	size_t read_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb_id %llu, block_size %u, "
+		  "pages_off %u, buf %p, silent %#x\n",
+		  peb_id, block_size, pages_off, buf, silent);
+
+	BUG_ON(!fsi);
+	BUG_ON(!fsi->devops->read);
+	BUG_ON(!buf);
+	BUG_ON(pages_off >= (fsi->erasesize / block_size));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (peb_id == 0 && pages_off == 0)
+		offset = SSDFS_RESERVED_VBR_SIZE;
+	else
+		offset = (u64)pages_off * block_size;
+
+	err = ssdfs_aligned_read_buffer(fsi, peb_id,
+					block_size, offset,
+					buf, hdr_size,
+					&read_bytes);
+	if (unlikely(err)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, block_size %u, "
+				  "pages_off %u, err %d\n",
+				  peb_id, block_size, pages_off, err);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, block_size %u, "
+				  "pages_off %u, err %d\n",
+				  peb_id, block_size, pages_off, err);
+#else
+			SSDFS_DBG("fail to read segment header: "
+				  "peb_id %llu, block_size %u, "
+				  "pages_off %u, err %d\n",
+				  peb_id, block_size, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return err;
+	}
+
+	if (unlikely(read_bytes != hdr_size)) {
+		if (!silent) {
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, pages_off %u: "
+				  "read_bytes %zu != hdr_size %zu\n",
+				  peb_id, pages_off, read_bytes, hdr_size);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("fail to read segment header: "
+				  "peb_id %llu, pages_off %u: "
+				  "read_bytes %zu != hdr_size %zu\n",
+				  peb_id, pages_off, read_bytes, hdr_size);
+#else
+			SSDFS_DBG("fail to read segment header: "
+				  "peb_id %llu, pages_off %u: "
+				  "read_bytes %zu != hdr_size %zu\n",
+				  peb_id, pages_off, read_bytes, hdr_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ERANGE;
+	}
+
+	magic = (struct ssdfs_signature *)buf;
+
+	if (!is_ssdfs_magic_valid(magic)) {
+		if (!silent)
+			SSDFS_ERR("valid magic is not detected\n");
+		else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("valid magic is not detected\n");
+#else
+			SSDFS_DBG("valid magic is not detected\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+		return -ENODATA;
+	}
+
+	if (__is_ssdfs_segment_header_magic_valid(magic)) {
+		hdr = SSDFS_SEG_HDR(buf);
+
+		err = ssdfs_check_segment_header(fsi, hdr, silent);
+		if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("header has older FS creation time: "
+				  "peb_id %llu, pages_off %u, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("segment header is corrupted: "
+					  "peb_id %llu, pages_off %u, err %d\n",
+					  peb_id, pages_off, err);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("segment header is corrupted: "
+					  "peb_id %llu, pages_off %u, err %d\n",
+					  peb_id, pages_off, err);
+#else
+				SSDFS_DBG("segment header is corrupted: "
+					  "peb_id %llu, pages_off %u, err %d\n",
+					  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else if (is_ssdfs_partial_log_header_magic_valid(magic)) {
+		pl_hdr = SSDFS_PLH(buf);
+
+		err = ssdfs_check_partial_log_header(fsi, pl_hdr, silent);
+		if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("header has older FS creation time: "
+				  "peb_id %llu, pages_off %u, err %d\n",
+				  peb_id, pages_off, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (unlikely(err)) {
+			if (!silent) {
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, pages_off %u\n",
+					  peb_id, pages_off);
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_ERR("partial log header is corrupted: "
+					  "peb_id %llu, pages_off %u\n",
+					  peb_id, pages_off);
+#else
+				SSDFS_DBG("partial log header is corrupted: "
+					  "peb_id %llu, pages_off %u\n",
+					  peb_id, pages_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			return err;
+		}
+	} else {
+		if (!silent) {
+			SSDFS_ERR("log header is corrupted: "
+				  "peb_id %llu, pages_off %u\n",
+				  peb_id, pages_off);
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_ERR("log header is corrupted: "
+				  "peb_id %llu, pages_off %u\n",
+				  peb_id, pages_off);
+#else
+			SSDFS_DBG("log header is corrupted: "
+				  "peb_id %llu, pages_off %u\n",
+				  peb_id, pages_off);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_create_volume_header() - initialize volume header from scratch
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+void ssdfs_create_volume_header(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh)
+{
+	u64 erase_size;
+	u32 megabytes_per_peb;
+	u32 flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !vh);
+
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+	SSDFS_DBG("fsi->log_pagesize %u, fsi->log_erasesize %u, "
+		  "fsi->log_segsize %u, fsi->log_pebs_per_seg %u\n",
+		  fsi->log_pagesize,
+		  fsi->log_erasesize,
+		  fsi->log_segsize,
+		  fsi->log_pebs_per_seg);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	vh->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	vh->magic.key = cpu_to_le16(SSDFS_SEGMENT_HDR_MAGIC);
+	vh->magic.version.major = SSDFS_MAJOR_REVISION;
+	vh->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	vh->log_pagesize = fsi->log_pagesize;
+	vh->log_erasesize = fsi->log_erasesize;
+	vh->log_segsize = fsi->log_segsize;
+	vh->log_pebs_per_seg = fsi->log_pebs_per_seg;
+
+	megabytes_per_peb = fsi->erasesize / SSDFS_1MB;
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(megabytes_per_peb >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	vh->megabytes_per_peb = cpu_to_le16((u16)megabytes_per_peb);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(fsi->pebs_per_seg >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	vh->pebs_per_seg = cpu_to_le16((u16)fsi->pebs_per_seg);
+
+	vh->create_time = cpu_to_le64(fsi->fs_ctime);
+	vh->create_cno = cpu_to_le64(fsi->fs_cno);
+	ssdfs_memcpy(vh->uuid, 0, SSDFS_UUID_SIZE,
+		     fsi->fs_uuid, 0, SSDFS_UUID_SIZE,
+		     SSDFS_UUID_SIZE);
+
+	vh->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index);
+	vh->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg);
+
+	vh->flags = cpu_to_le32(0);
+
+	if (fsi->is_zns_device) {
+		flags = le32_to_cpu(vh->flags);
+		flags |= SSDFS_VH_ZNS_BASED_VOLUME;
+
+		erase_size = 1ULL << fsi->log_erasesize;
+		if (erase_size != fsi->zone_size)
+			flags |= SSDFS_VH_UNALIGNED_ZONE;
+
+		vh->flags = cpu_to_le32(flags);
+	}
+
+	vh->sb_seg_log_pages = cpu_to_le16(fsi->sb_seg_log_pages);
+	vh->segbmap_log_pages = cpu_to_le16(fsi->segbmap_log_pages);
+	vh->maptbl_log_pages = cpu_to_le16(fsi->maptbl_log_pages);
+	vh->lnodes_seg_log_pages = cpu_to_le16(fsi->lnodes_seg_log_pages);
+	vh->hnodes_seg_log_pages = cpu_to_le16(fsi->hnodes_seg_log_pages);
+	vh->inodes_seg_log_pages = cpu_to_le16(fsi->inodes_seg_log_pages);
+	vh->user_data_log_pages = cpu_to_le16(fsi->user_data_log_pages);
+
+	ssdfs_memcpy(&vh->segbmap,
+		     0, sizeof(struct ssdfs_segbmap_sb_header),
+		     &fsi->vh->segbmap,
+		     0, sizeof(struct ssdfs_segbmap_sb_header),
+		     sizeof(struct ssdfs_segbmap_sb_header));
+	ssdfs_memcpy(&vh->maptbl,
+		     0, sizeof(struct ssdfs_maptbl_sb_header),
+		     &fsi->vh->maptbl,
+		     0, sizeof(struct ssdfs_maptbl_sb_header),
+		     sizeof(struct ssdfs_maptbl_sb_header));
+	ssdfs_memcpy(&vh->dentries_btree,
+		     0, sizeof(struct ssdfs_dentries_btree_descriptor),
+		     &fsi->vh->dentries_btree,
+		     0, sizeof(struct ssdfs_dentries_btree_descriptor),
+		     sizeof(struct ssdfs_dentries_btree_descriptor));
+	ssdfs_memcpy(&vh->extents_btree,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     &fsi->vh->extents_btree,
+		     0, sizeof(struct ssdfs_extents_btree_descriptor),
+		     sizeof(struct ssdfs_extents_btree_descriptor));
+	ssdfs_memcpy(&vh->xattr_btree,
+		     0, sizeof(struct ssdfs_xattr_btree_descriptor),
+		     &fsi->vh->xattr_btree,
+		     0, sizeof(struct ssdfs_xattr_btree_descriptor),
+		     sizeof(struct ssdfs_xattr_btree_descriptor));
+	ssdfs_memcpy(&vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     &fsi->vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     sizeof(struct ssdfs_invalidated_extents_btree));
+}
+
+/*
+ * ssdfs_store_sb_segs_array() - store sb segments array
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+static inline
+void ssdfs_store_sb_segs_array(struct ssdfs_fs_info *fsi,
+				struct ssdfs_volume_header *vh)
+{
+	int i, j;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&fsi->sb_segs_sem);
+
+	for (i = SSDFS_CUR_SB_SEG; i < SSDFS_SB_CHAIN_MAX; i++) {
+		for (j = SSDFS_MAIN_SB_SEG; j < SSDFS_SB_SEG_COPY_MAX; j++) {
+			vh->sb_pebs[i][j].leb_id =
+				cpu_to_le64(fsi->sb_lebs[i][j]);
+			vh->sb_pebs[i][j].peb_id =
+				cpu_to_le64(fsi->sb_pebs[i][j]);
+		}
+	}
+
+	up_read(&fsi->sb_segs_sem);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("sb_lebs[CUR][MAIN] %llu, sb_pebs[CUR][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[CUR][COPY] %llu, sb_pebs[CUR][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_CUR_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][MAIN] %llu, sb_pebs[NEXT][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[NEXT][COPY] %llu, sb_pebs[NEXT][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_NEXT_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][MAIN] %llu, sb_pebs[RESERVED][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[RESERVED][COPY] %llu, sb_pebs[RESERVED][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_RESERVED_SB_SEG][SSDFS_COPY_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][MAIN] %llu, sb_pebs[PREV][MAIN] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_MAIN_SB_SEG]);
+	SSDFS_DBG("sb_lebs[PREV][COPY] %llu, sb_pebs[PREV][COPY] %llu\n",
+		  fsi->sb_lebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG],
+		  fsi->sb_pebs[SSDFS_PREV_SB_SEG][SSDFS_COPY_SB_SEG]);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_prepare_volume_header_for_commit() - prepare volume header for commit
+ * @fsi: pointer to shared file system object
+ * @vh: volume header
+ */
+int ssdfs_prepare_volume_header_for_commit(struct ssdfs_fs_info *fsi,
+					   struct ssdfs_volume_header *vh)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct super_block *sb = fsi->sb;
+	u64 dev_size;
+
+	SSDFS_DBG("fsi %p, vh %p\n", fsi, vh);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_store_sb_segs_array(fsi, vh);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	dev_size = fsi->devops->device_size(sb);
+	if (!is_ssdfs_volume_header_consistent(fsi, vh, dev_size)) {
+		SSDFS_ERR("volume header is inconsistent\n");
+		return -EIO;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_segment_header_for_commit() - prepare segment header
+ * @fsi: pointer to shared file system object
+ * @seg_id: segment ID that contains this PEB
+ * @leb_id: LEB ID mapped to this PEB
+ * @peb_id: PEB ID
+ * @relation_peb_id: source PEB ID during migration
+ * @log_pages: full log pages count
+ * @seg_type: segment type
+ * @seg_flags: segment flags
+ * @last_log_time: log creation time
+ * @last_log_cno: log checkpoint
+ * @hdr: segment header [out]
+ */
+int ssdfs_prepare_segment_header_for_commit(struct ssdfs_fs_info *fsi,
+					    u64 seg_id,
+					    u64 leb_id,
+					    u64 peb_id,
+					    u64 relation_peb_id,
+					    u32 log_pages,
+					    u16 seg_type,
+					    u32 seg_flags,
+					    u64 last_log_time,
+					    u64 last_log_cno,
+					    struct ssdfs_segment_header *hdr)
+{
+	u16 data_size = sizeof(struct ssdfs_segment_header);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, hdr %p, "
+		  "seg_id %llu, leb_id %llu, "
+		  "peb_id %llu, relation_peb_id %llu, "
+		  "log_pages %u, seg_type %#x, seg_flags %#x\n",
+		  fsi, hdr, seg_id, leb_id, peb_id, relation_peb_id,
+		  log_pages, seg_type, seg_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->timestamp = cpu_to_le64(last_log_time);
+	hdr->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	hdr->log_pages = cpu_to_le16((u16)log_pages);
+
+	if (seg_type == SSDFS_UNKNOWN_SEG_TYPE ||
+	    seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		SSDFS_ERR("invalid value of seg_type %#x\n", seg_type);
+		return -EINVAL;
+	}
+
+	hdr->seg_type = cpu_to_le16(seg_type);
+	hdr->seg_flags = cpu_to_le32(seg_flags);
+
+	hdr->seg_id = cpu_to_le64(seg_id);
+	hdr->leb_id = cpu_to_le64(leb_id);
+	hdr->peb_id = cpu_to_le64(peb_id);
+	hdr->relation_peb_id = cpu_to_le64(relation_peb_id);
+
+	hdr->volume_hdr.check.bytes = cpu_to_le16(data_size);
+	hdr->volume_hdr.check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&hdr->volume_hdr.check,
+				   hdr, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_prepare_partial_log_header_for_commit() - prepare partial log header
+ * @fsi: pointer to shared file system object
+ * @sequence_id: sequence ID of the partial log inside the full log
+ * @seg_id: segment ID that contains this PEB
+ * @leb_id: LEB ID mapped to this PEB
+ * @peb_id: PEB ID
+ * @relation_peb_id: source PEB ID during migration
+ * @log_pages: log pages count
+ * @seg_type: segment type
+ * @pl_flags: partial log's flags
+ * @last_log_time: log creation time
+ * @last_log_cno: log checkpoint
+ * @hdr: partial log's header [out]
+ */
+int ssdfs_prepare_partial_log_header_for_commit(struct ssdfs_fs_info *fsi,
+					int sequence_id,
+					u64 seg_id,
+					u64 leb_id,
+					u64 peb_id,
+					u64 relation_peb_id,
+					u32 log_pages,
+					u16 seg_type,
+					u32 pl_flags,
+					u64 last_log_time,
+					u64 last_log_cno,
+					struct ssdfs_partial_log_header *hdr)
+{
+	u16 data_size = sizeof(struct ssdfs_partial_log_header);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("fsi %p, hdr %p, sequence_id %d, "
+		  "seg_id %llu, leb_id %llu, "
+		  "peb_id %llu, relation_peb_id %llu, "
+		  "log_pages %u, seg_type %#x, pl_flags %#x\n",
+		  fsi, hdr, sequence_id,
+		  seg_id, leb_id, peb_id, relation_peb_id,
+		  log_pages, seg_type, pl_flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->magic.common = cpu_to_le32(SSDFS_SUPER_MAGIC);
+	hdr->magic.key = cpu_to_le16(SSDFS_PARTIAL_LOG_HDR_MAGIC);
+	hdr->magic.version.major = SSDFS_MAJOR_REVISION;
+	hdr->magic.version.minor = SSDFS_MINOR_REVISION;
+
+	hdr->timestamp = cpu_to_le64(last_log_time);
+	hdr->cno = cpu_to_le64(last_log_cno);
+
+	if (log_pages > fsi->pages_per_seg || log_pages > U16_MAX) {
+		SSDFS_ERR("invalid value of log_pages %u\n", log_pages);
+		return -EINVAL;
+	}
+
+	hdr->log_pages = cpu_to_le16((u16)log_pages);
+	hdr->log_bytes = cpu_to_le32(log_pages << fsi->log_pagesize);
+
+	if (seg_type == SSDFS_UNKNOWN_SEG_TYPE ||
+	    seg_type > SSDFS_LAST_KNOWN_SEG_TYPE) {
+		SSDFS_ERR("invalid value of seg_type %#x\n", seg_type);
+		return -EINVAL;
+	}
+
+	hdr->seg_type = cpu_to_le16(seg_type);
+	hdr->pl_flags = cpu_to_le32(pl_flags);
+
+	spin_lock(&fsi->volume_state_lock);
+	hdr->free_pages = cpu_to_le64(fsi->free_pages);
+	hdr->flags = cpu_to_le32(fsi->fs_flags);
+	spin_unlock(&fsi->volume_state_lock);
+
+	mutex_lock(&fsi->resize_mutex);
+	hdr->nsegs = cpu_to_le64(fsi->nsegs);
+	mutex_unlock(&fsi->resize_mutex);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_id %llu, "
+		  "volume_free_pages %llu, nsegs %llu\n",
+		  seg_id, peb_id,
+		  le64_to_cpu(hdr->free_pages),
+		  le64_to_cpu(hdr->nsegs));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_memcpy(&hdr->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     &fsi->vs->root_folder,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	ssdfs_memcpy(&hdr->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     &fsi->vs->inodes_btree,
+		     0, sizeof(struct ssdfs_inodes_btree),
+		     sizeof(struct ssdfs_inodes_btree));
+	ssdfs_memcpy(&hdr->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     &fsi->vs->shared_extents_btree,
+		     0, sizeof(struct ssdfs_shared_extents_btree),
+		     sizeof(struct ssdfs_shared_extents_btree));
+	ssdfs_memcpy(&hdr->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     &fsi->vs->shared_dict_btree,
+		     0, sizeof(struct ssdfs_shared_dictionary_btree),
+		     sizeof(struct ssdfs_shared_dictionary_btree));
+	ssdfs_memcpy(&hdr->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     &fsi->vs->snapshots_btree,
+		     0, sizeof(struct ssdfs_snapshots_btree),
+		     sizeof(struct ssdfs_snapshots_btree));
+	ssdfs_memcpy(&hdr->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     &fsi->vh->invextree,
+		     0, sizeof(struct ssdfs_invalidated_extents_btree),
+		     sizeof(struct ssdfs_invalidated_extents_btree));
+
+	hdr->sequence_id = cpu_to_le32(sequence_id);
+
+	hdr->log_pagesize = fsi->log_pagesize;
+	hdr->log_erasesize = fsi->log_erasesize;
+	hdr->log_segsize = fsi->log_segsize;
+	hdr->log_pebs_per_seg = fsi->log_pebs_per_seg;
+	hdr->lebs_per_peb_index = cpu_to_le32(fsi->lebs_per_peb_index);
+	hdr->create_threads_per_seg = cpu_to_le16(fsi->create_threads_per_seg);
+
+	hdr->open_zones = cpu_to_le32(atomic_read(&fsi->open_zones));
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("open_zones %d\n",
+		  atomic_read(&fsi->open_zones));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr->seg_id = cpu_to_le64(seg_id);
+	hdr->leb_id = cpu_to_le64(leb_id);
+	hdr->peb_id = cpu_to_le64(peb_id);
+	hdr->relation_peb_id = cpu_to_le64(relation_peb_id);
+
+	hdr->volume_create_time = cpu_to_le64(fsi->fs_ctime);
+	ssdfs_memcpy(hdr->uuid, 0, SSDFS_UUID_SIZE,
+		     fsi->fs_uuid, 0, SSDFS_UUID_SIZE,
+		     SSDFS_UUID_SIZE);
+
+	hdr->check.bytes = cpu_to_le16(data_size);
+	hdr->check.flags = cpu_to_le16(SSDFS_CRC32);
+
+	err = ssdfs_calculate_csum(&hdr->check,
+				   hdr, data_size);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to calculate checksum: err %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
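The commit paths above follow an embed-the-checksum pattern: fill the check descriptor's bytes/flags, then compute a checksum over the header and store it inside that same descriptor. A minimal userspace model of the round trip (hypothetical struct layout and software CRC32; the real ssdfs_calculate_csum() and struct ssdfs_partial_log_header differ) might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of the check-descriptor pattern:
 * the csum field is zeroed, a checksum is computed over the whole
 * header, and the result is stored back in place. */
struct model_check {
	uint16_t bytes;
	uint16_t flags;
	uint32_t csum;
};

struct model_hdr {
	uint32_t magic;
	uint32_t pad;
	uint64_t timestamp;
	struct model_check check;
};

/* Bitwise reflected CRC32, used here only to keep the sketch
 * self-contained. */
static uint32_t crc32_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
	}
	return ~crc;
}

static void commit_hdr(struct model_hdr *hdr)
{
	hdr->check.bytes = sizeof(*hdr);
	hdr->check.csum = 0;	/* exclude the csum field from itself */
	hdr->check.csum = crc32_sw(hdr, sizeof(*hdr));
}

static int verify_hdr(struct model_hdr *hdr)
{
	uint32_t stored = hdr->check.csum;
	uint32_t calc;

	hdr->check.csum = 0;
	calc = crc32_sw(hdr, sizeof(*hdr));
	hdr->check.csum = stored;
	return stored == calc;
}
```

Any later mutation of a checksummed field (as with hdr->timestamp here) is caught by re-verification.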
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 15/79] ssdfs: introduce PEB's block bitmap
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (6 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 08/79] ssdfs: segment header + log footer operations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 17/79] ssdfs: implement support of migration scheme in PEB bitmap Viacheslav Dubeyko
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS splits a partition/volume into a sequence of fixed-sized
segments. Every segment can include one or several Logical
Erase Blocks (LEBs). A LEB can be mapped into a "Physical"
Erase Block (PEB). Generally speaking, a PEB is a fixed-sized
container that includes some number of logical blocks (or NAND
flash pages). Every PEB has a block bitmap whose goal is to
track the state (free, pre-allocated, allocated, invalid) of
logical blocks and to account for the physical space used for
storing the log's metadata (segment header, partial log header,
footer).

The block bitmap implements the following API:
(1) create - create empty block bitmap
(2) destroy - destroy block bitmap object
(3) init - initialize block bitmap by metadata from PEB's log
(4) snapshot - take block bitmap snapshot for flush operation
(5) forget_snapshot - free block bitmap's snapshot resources
(6) lock/unlock - lock/unlock block bitmap
(7) test_block/test_range - check state of block or range of blocks
(8) get_free_pages - get number of free pages
(9) get_used_pages - get number of used pages
(10) get_invalid_pages - get number of invalid pages
(11) pre_allocate - pre-allocate logical block or range of blocks
(12) allocate - allocate logical block or range of blocks
(13) invalidate - invalidate logical block or range of blocks
(14) collect_garbage - get contiguous range of blocks in the same state
(15) clean - convert the whole block bitmap into clean state
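As an illustration of the 2-bit-per-block state tracking behind this API, here is a minimal userspace model (all names here are hypothetical; the in-kernel bitmap in the header below adds folio-backed storage, locking and search caches on top of this idea):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the four 2-bit block states. */
enum {
	MODEL_BLK_FREE		= 0x0,
	MODEL_BLK_PRE_ALLOCATED	= 0x1,
	MODEL_BLK_INVALID	= 0x2,
	MODEL_BLK_VALID		= 0x3,
};

#define MODEL_CAPACITY	16u

/* 2 bits per block -> 4 blocks per byte. */
static uint8_t model_bmap[(MODEL_CAPACITY * 2 + 7) / 8];

static int model_get_state(uint32_t blk)
{
	return (model_bmap[blk / 4] >> ((blk % 4) * 2)) & 0x3;
}

static void model_set_state(uint32_t blk, int state)
{
	unsigned int shift = (blk % 4) * 2;

	model_bmap[blk / 4] &= (uint8_t)~(0x3u << shift);
	model_bmap[blk / 4] |= (uint8_t)((state & 0x3) << shift);
}

/* Count blocks in a given state, as get_free_pages()/get_used_pages()/
 * get_invalid_pages() conceptually do. */
static unsigned int model_count(int state)
{
	unsigned int i, n = 0;

	for (i = 0; i < MODEL_CAPACITY; i++)
		n += (model_get_state(i) == state);
	return n;
}
```

A block then moves through free -> pre-allocated -> valid -> invalid transitions via set-state calls, which is what pre_allocate/allocate/invalidate perform on ranges.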

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/block_bitmap.h        | 393 +++++++++++++++++++++++++++++++++
 fs/ssdfs/block_bitmap_tables.c | 311 ++++++++++++++++++++++++++
 fs/ssdfs/common_bitmap.h       | 230 +++++++++++++++++++
 3 files changed, 934 insertions(+)
 create mode 100644 fs/ssdfs/block_bitmap.h
 create mode 100644 fs/ssdfs/block_bitmap_tables.c
 create mode 100644 fs/ssdfs/common_bitmap.h

diff --git a/fs/ssdfs/block_bitmap.h b/fs/ssdfs/block_bitmap.h
new file mode 100644
index 000000000000..a5aec5b1ec20
--- /dev/null
+++ b/fs/ssdfs/block_bitmap.h
@@ -0,0 +1,393 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/block_bitmap.h - PEB's block bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BLOCK_BITMAP_H
+#define _SSDFS_BLOCK_BITMAP_H
+
+#include "common_bitmap.h"
+
+#define SSDFS_BLK_STATE_BITS	2
+#define SSDFS_BLK_STATE_MASK	0x3
+
+enum {
+	SSDFS_BLK_FREE		= 0x0,
+	SSDFS_BLK_PRE_ALLOCATED	= 0x1,
+	SSDFS_BLK_VALID		= 0x3,
+	SSDFS_BLK_INVALID	= 0x2,
+	SSDFS_BLK_STATE_MAX	= SSDFS_BLK_VALID + 1,
+};
+
+#define SSDFS_BLK_BMAP_CAPACITY_MAX	U16_MAX
+
+#define SSDFS_FREE_STATES_BYTE		0x00
+#define SSDFS_PRE_ALLOC_STATES_BYTE	0x55
+#define SSDFS_VALID_STATES_BYTE		0xFF
+#define SSDFS_INVALID_STATES_BYTE	0xAA
+
+#define SSDFS_BLK_BMAP_BYTE(blk_state)({ \
+	u8 value; \
+	switch (blk_state) { \
+	case SSDFS_BLK_FREE: \
+		value = SSDFS_FREE_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_PRE_ALLOCATED: \
+		value = SSDFS_PRE_ALLOC_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_VALID: \
+		value = SSDFS_VALID_STATES_BYTE; \
+		break; \
+	case SSDFS_BLK_INVALID: \
+		value = SSDFS_INVALID_STATES_BYTE; \
+		break; \
+	default: \
+		BUG(); \
+	} \
+	value; \
+})
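The pattern bytes above (0x00/0x55/0xFF/0xAA) are simply the 2-bit state value repeated four times per byte. A quick sketch confirming that:

```c
#include <assert.h>
#include <stdint.h>

/* Build a fill byte by repeating a 2-bit state at every 2-bit slot,
 * reproducing the SSDFS_BLK_BMAP_BYTE() constants. */
static uint8_t fill_byte(int blk_state)
{
	uint8_t v = 0;

	for (int i = 0; i < 8; i += 2)
		v |= (uint8_t)((blk_state & 0x3) << i);
	return v;
}
```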
+
+#define BLK_BMAP_BYTES(items_count) \
+	(((items_count + SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS) - 1)  / \
+	 SSDFS_ITEMS_PER_LONG(SSDFS_BLK_STATE_BITS)) * sizeof(unsigned long))
+
+static inline
+int SSDFS_BLK2FOLIO(u32 blk, u8 item_bits, u16 *offset)
+{
+	u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(item_bits);
+	u32 blks_per_long = SSDFS_ITEMS_PER_LONG(item_bits);
+	u32 blks_per_folio = PAGE_SIZE * blks_per_byte;
+	u32 off;
+
+	if (offset) {
+		off = (blk % blks_per_folio) / blks_per_long;
+		off *= sizeof(unsigned long);
+		BUG_ON(off >= U16_MAX);
+		*offset = off;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("blk %u, item_bits %u, blks_per_byte %u, "
+		  "blks_per_long %u, blks_per_folio %u, "
+		  "folio_index %u\n",
+		  blk, item_bits, blks_per_byte,
+		  blks_per_long, blks_per_folio,
+		  blk / blks_per_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return blk / blks_per_folio;
+}
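The arithmetic in SSDFS_BLK2FOLIO() can be checked with a small standalone model (assuming a 4 KiB PAGE_SIZE, 64-bit longs and the 2-bit block states; the real function takes item_bits as a parameter):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096u
#define MODEL_STATE_BITS	2u

/* Mirror of the SSDFS_BLK2FOLIO() math: which folio holds the block's
 * 2-bit state, and at which byte offset (aligned to unsigned long)
 * inside that folio it lives. */
static uint32_t model_blk2folio(uint32_t blk, uint16_t *offset)
{
	uint32_t blks_per_byte = 8 / MODEL_STATE_BITS;		/* 4 */
	uint32_t blks_per_long = 64 / MODEL_STATE_BITS;		/* 32 */
	uint32_t blks_per_folio = MODEL_PAGE_SIZE * blks_per_byte;

	if (offset)
		*offset = (uint16_t)(((blk % blks_per_folio) / blks_per_long) *
				     sizeof(uint64_t));
	return blk / blks_per_folio;
}
```

With these parameters one folio covers 16384 block states, so e.g. block 20000 lands in folio 1.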
+
+/*
+ * struct ssdfs_last_bmap_search - last search in bitmap
+ * @folio_index: index of folio in folio vector
+ * @offset: offset of cache from page's beginning
+ * @cache: cached bmap's part
+ */
+struct ssdfs_last_bmap_search {
+	int folio_index;
+	u16 offset;
+	unsigned long cache;
+};
+
+static inline
+u32 SSDFS_FIRST_CACHED_BLOCK(struct ssdfs_last_bmap_search *search)
+{
+	u32 blks_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_BLK_STATE_BITS);
+	u32 blks_per_folio = PAGE_SIZE * blks_per_byte;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio_index %d, offset %u, "
+		  "blks_per_byte %u, blks_per_folio %u\n",
+		  search->folio_index,
+		  search->offset,
+		  blks_per_byte, blks_per_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (search->folio_index * blks_per_folio) +
+		(search->offset * blks_per_byte);
+}
+
+enum {
+	SSDFS_FREE_BLK_SEARCH,
+	SSDFS_VALID_BLK_SEARCH,
+	SSDFS_OTHER_BLK_SEARCH,
+	SSDFS_SEARCH_TYPE_MAX,
+};
+
+static inline
+int SSDFS_GET_CACHE_TYPE(int blk_state)
+{
+	switch (blk_state) {
+	case SSDFS_BLK_FREE:
+		return SSDFS_FREE_BLK_SEARCH;
+
+	case SSDFS_BLK_VALID:
+		return SSDFS_VALID_BLK_SEARCH;
+
+	case SSDFS_BLK_PRE_ALLOCATED:
+	case SSDFS_BLK_INVALID:
+		return SSDFS_OTHER_BLK_SEARCH;
+	}
+
+	return SSDFS_SEARCH_TYPE_MAX;
+}
+
+#define SSDFS_BLK_BMAP_INITIALIZED	(1 << 0)
+#define SSDFS_BLK_BMAP_DIRTY		(1 << 1)
+
+/*
+ * struct ssdfs_block_bmap_storage - block bitmap's storage
+ * @state: storage state
+ * @array: vector of folios
+ * @buf: pointer on memory buffer
+ */
+struct ssdfs_block_bmap_storage {
+	int state;
+	struct ssdfs_folio_vector array;
+	void *buf;
+};
+
+/* Block bitmap's storage's states */
+enum {
+	SSDFS_BLOCK_BMAP_STORAGE_ABSENT,
+	SSDFS_BLOCK_BMAP_STORAGE_FOLIO_VEC,
+	SSDFS_BLOCK_BMAP_STORAGE_BUFFER,
+	SSDFS_BLOCK_BMAP_STORAGE_STATE_MAX
+};
+
+/*
+ * struct ssdfs_block_bmap - in-core segment's block bitmap
+ * @lock: block bitmap lock
+ * @flags: block bitmap state flags
+ * @storage: block bitmap's storage
+ * @bytes_count: block bitmap size in bytes
+ * @items_capacity: total available items in bitmap
+ * @allocation_pool: number of items available for allocation
+ * @metadata_items: count of metadata items
+ * @used_blks: count of valid blocks
+ * @invalid_blks: count of invalid blocks
+ * @last_search: last search/access cache array
+ */
+struct ssdfs_block_bmap {
+	struct mutex lock;
+	atomic_t flags;
+	struct ssdfs_block_bmap_storage storage;
+	size_t bytes_count;
+	size_t items_capacity;
+	size_t allocation_pool;
+	u32 metadata_items;
+	u32 used_blks;
+	u32 invalid_blks;
+	struct ssdfs_last_bmap_search last_search[SSDFS_SEARCH_TYPE_MAX];
+};
+
+/*
+ * compare_block_bmap_ranges() - compare two ranges
+ * @range1: left range
+ * @range2: right range
+ *
+ * RETURN:
+ *  0: range1 == range2
+ * -1: range1 < range2
+ *  1: range1 > range2
+ */
+static inline
+int compare_block_bmap_ranges(struct ssdfs_block_bmap_range *range1,
+				struct ssdfs_block_bmap_range *range2)
+{
+	u32 range1_end, range2_end;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range1 || !range2);
+
+	SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n",
+		  range1->start, range1->len, range2->start, range2->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (range1->start == range2->start) {
+		if (range1->len == range2->len)
+			return 0;
+		else if (range1->len < range2->len)
+			return -1;
+		else
+			return 1;
+	} else if (range1->start < range2->start) {
+		range1_end = range1->start + range1->len;
+		range2_end = range2->start + range2->len;
+
+		if (range2_end <= range1_end)
+			return 1;
+		else
+			return -1;
+	}
+
+	/* range1->start > range2->start */
+	return -1;
+}
+
+/*
+ * ranges_have_intersection() - do the ranges intersect?
+ * @range1: left range
+ * @range2: right range
+ *
+ * RETURN:
+ * [true]  - ranges have intersection
+ * [false] - ranges don't intersect
+ */
+static inline
+bool ranges_have_intersection(struct ssdfs_block_bmap_range *range1,
+				struct ssdfs_block_bmap_range *range2)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range1 || !range2);
+
+	SSDFS_DBG("range1 (start %u, len %u), range2 (start %u, len %u)\n",
+		  range1->start, range1->len, range2->start, range2->len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if ((range2->start + range2->len) <= range1->start)
+		return false;
+	else if ((range1->start + range1->len) <= range2->start)
+		return false;
+
+	return true;
+}
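The predicate above is the standard half-open interval overlap test: ranges [start, start + len) intersect iff neither ends at or before the other begins. A self-contained sketch of the same logic (hypothetical struct; the kernel version operates on struct ssdfs_block_bmap_range):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_range {
	uint32_t start;
	uint32_t len;
};

/* Same two early-out checks as ranges_have_intersection(). */
static bool model_ranges_intersect(struct model_range a, struct model_range b)
{
	if (b.start + b.len <= a.start)
		return false;
	if (a.start + a.len <= b.start)
		return false;
	return true;
}
```

Note that adjacent ranges (one ending exactly where the other starts) do not count as intersecting.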
+
+enum {
+	SSDFS_BLK_BMAP_CREATE,
+	SSDFS_BLK_BMAP_INIT,
+};
+
+/* Function prototypes */
+int ssdfs_block_bmap_create(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_block_bmap *bmap,
+			    u32 items_capacity,
+			    u32 allocation_pool,
+			    int flag, int init_state);
+int ssdfs_block_bmap_destroy(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_init(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_folio_vector *source,
+			  u8 flags,
+			  u32 last_free_blk,
+			  u32 metadata_blks,
+			  u32 invalid_blks,
+			  u32 bmap_bytes);
+int ssdfs_block_bmap_inflate(struct ssdfs_block_bmap *blk_bmap,
+			     u32 free_items);
+int ssdfs_block_bmap_correct_capacity(struct ssdfs_block_bmap *blk_bmap,
+					struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_snapshot(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_folio_vector *snapshot,
+				u32 *last_free_page,
+				u32 *metadata_blks,
+				u32 *invalid_blks,
+				size_t *items_capacity,
+				size_t *bytes_count);
+void ssdfs_block_bmap_forget_snapshot(struct ssdfs_folio_vector *snapshot);
+
+int ssdfs_block_bmap_lock(struct ssdfs_block_bmap *blk_bmap);
+bool ssdfs_block_bmap_is_locked(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_block_bmap_unlock(struct ssdfs_block_bmap *blk_bmap);
+
+bool ssdfs_block_bmap_dirtied(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_block_bmap_set_dirty_state(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_block_bmap_clear_dirty_state(struct ssdfs_block_bmap *blk_bmap);
+bool ssdfs_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap);
+void ssdfs_set_block_bmap_initialized(struct ssdfs_block_bmap *blk_bmap);
+
+bool ssdfs_block_bmap_test_block(struct ssdfs_block_bmap *blk_bmap,
+				 u32 blk, int blk_state);
+bool ssdfs_block_bmap_test_range(struct ssdfs_block_bmap *blk_bmap,
+				 struct ssdfs_block_bmap_range *range,
+				 int blk_state);
+int ssdfs_get_block_state(struct ssdfs_block_bmap *blk_bmap, u32 blk);
+int ssdfs_get_range_state(struct ssdfs_block_bmap *blk_bmap,
+			  struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_reserve_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					    u32 count);
+int ssdfs_block_bmap_free_metadata_pages(struct ssdfs_block_bmap *blk_bmap,
+					 u32 count, u32 *freed_items);
+int ssdfs_block_bmap_get_free_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_used_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_invalid_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_pages_capacity(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_allocation_pool(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_get_metadata_pages(struct ssdfs_block_bmap *blk_bmap);
+int ssdfs_block_bmap_pre_allocate(struct ssdfs_block_bmap *blk_bmap,
+				  u32 start, u32 *len,
+				  struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_allocate(struct ssdfs_block_bmap *blk_bmap,
+				u32 start, u32 *len,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_invalidate(struct ssdfs_block_bmap *blk_bmap,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_collect_garbage(struct ssdfs_block_bmap *blk_bmap,
+				     u32 start, u32 max_len,
+				     int blk_state,
+				     struct ssdfs_block_bmap_range *range);
+int ssdfs_block_bmap_clean(struct ssdfs_block_bmap *blk_bmap,
+			   size_t items_capacity);
+int ssdfs_block_bmap_invalid2clean(struct ssdfs_block_bmap *blk_bmap);
+
+#if IS_ENABLED(CONFIG_KUNIT)
+bool BLK_BMAP_BYTE_CONTAINS_STATE(u8 *value, int blk_state);
+#endif
+
+#define SSDFS_BLK_BMAP_FNS(state, name)					\
+static inline								\
+bool is_block_##name(struct ssdfs_block_bmap *blk_bmap, u32 blk)	\
+{									\
+	return ssdfs_block_bmap_test_block(blk_bmap, blk,		\
+					    SSDFS_BLK_##state);		\
+}									\
+static inline								\
+bool is_range_##name(struct ssdfs_block_bmap *blk_bmap,			\
+			struct ssdfs_block_bmap_range *range)		\
+{									\
+	return ssdfs_block_bmap_test_range(blk_bmap, range,		\
+					    SSDFS_BLK_##state);		\
+}									\
+
+/*
+ * is_block_free()
+ * is_range_free()
+ */
+SSDFS_BLK_BMAP_FNS(FREE, free)
+
+/*
+ * is_block_pre_allocated()
+ * is_range_pre_allocated()
+ */
+SSDFS_BLK_BMAP_FNS(PRE_ALLOCATED, pre_allocated)
+
+/*
+ * is_block_valid()
+ * is_range_valid()
+ */
+SSDFS_BLK_BMAP_FNS(VALID, valid)
+
+/*
+ * is_block_invalid()
+ * is_range_invalid()
+ */
+SSDFS_BLK_BMAP_FNS(INVALID, invalid)
+
+#endif /* _SSDFS_BLOCK_BITMAP_H */
diff --git a/fs/ssdfs/block_bitmap_tables.c b/fs/ssdfs/block_bitmap_tables.c
new file mode 100644
index 000000000000..7e6b7da5daaa
--- /dev/null
+++ b/fs/ssdfs/block_bitmap_tables.c
@@ -0,0 +1,311 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/block_bitmap_tables.c - declaration of block bitmap's search tables.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+
+/*
+ * Table for determination presence of free block
+ * state in provided byte. Checking byte is used
+ * as index in array.
+ */
+const bool detect_free_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	true, true, true, true,
+/* 01 - 0x04 */	true, true, true, true,
+/* 02 - 0x08 */	true, true, true, true,
+/* 03 - 0x0C */	true, true, true, true,
+/* 04 - 0x10 */	true, true, true, true,
+/* 05 - 0x14 */	true, true, true, true,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	true, true, true, true,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	true, true, true, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	true, true, true, true,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	true, true, true, true,
+/* 16 - 0x40 */	true, true, true, true,
+/* 17 - 0x44 */	true, true, true, true,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	true, true, true, true,
+/* 21 - 0x54 */	true, false, false, false,
+/* 22 - 0x58 */	true, false, false, false,
+/* 23 - 0x5C */	true, false, false, false,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, false, false, false,
+/* 26 - 0x68 */	true, false, false, false,
+/* 27 - 0x6C */	true, false, false, false,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, false, false, false,
+/* 30 - 0x78 */	true, false, false, false,
+/* 31 - 0x7C */	true, false, false, false,
+/* 32 - 0x80 */	true, true, true, true,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	true, true, true, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, false, false, false,
+/* 38 - 0x98 */	true, false, false, false,
+/* 39 - 0x9C */	true, false, false, false,
+/* 40 - 0xA0 */	true, true, true, true,
+/* 41 - 0xA4 */	true, false, false, false,
+/* 42 - 0xA8 */	true, false, false, false,
+/* 43 - 0xAC */	true, false, false, false,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, false, false, false,
+/* 46 - 0xB8 */	true, false, false, false,
+/* 47 - 0xBC */	true, false, false, false,
+/* 48 - 0xC0 */	true, true, true, true,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	true, true, true, true,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, false, false, false,
+/* 54 - 0xD8 */	true, false, false, false,
+/* 55 - 0xDC */	true, false, false, false,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, false, false, false,
+/* 58 - 0xE8 */	true, false, false, false,
+/* 59 - 0xEC */	true, false, false, false,
+/* 60 - 0xF0 */	true, true, true, true,
+/* 61 - 0xF4 */	true, false, false, false,
+/* 62 - 0xF8 */	true, false, false, false,
+/* 63 - 0xFC */	true, false, false, false
+};
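The table above memoizes a simple predicate: does the byte contain at least one 2-bit pair equal to SSDFS_BLK_FREE (0b00)? The same holds for the other detect_* tables with their respective states. A direct (non-memoized) computation of that predicate, as a userspace sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direct computation of what the detect_* lookup tables precompute:
 * scan the four 2-bit items in a byte for the requested state. */
static bool byte_contains_state(uint8_t value, int state)
{
	for (int i = 0; i < 8; i += 2) {
		if (((value >> i) & 0x3) == state)
			return true;
	}
	return false;
}
```

Indexing a 256-entry bool table with the byte value gives the same answer in one memory access instead of four shift-and-mask steps.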
+
+/*
+ * Table for determination presence of pre-allocated
+ * block state in provided byte. Checking byte is used
+ * as index in array.
+ */
+const bool detect_pre_allocated_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, true, false, false,
+/* 01 - 0x04 */	true, true, true, true,
+/* 02 - 0x08 */	false, true, false, false,
+/* 03 - 0x0C */	false, true, false, false,
+/* 04 - 0x10 */	true, true, true, true,
+/* 05 - 0x14 */	true, true, true, true,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	false, true, false, false,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	false, true, false, false,
+/* 11 - 0x2C */	false, true, false, false,
+/* 12 - 0x30 */	false, true, false, false,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	false, true, false, false,
+/* 15 - 0x3C */	false, true, false, false,
+/* 16 - 0x40 */	true, true, true, true,
+/* 17 - 0x44 */	true, true, true, true,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	true, true, true, true,
+/* 21 - 0x54 */	true, true, true, true,
+/* 22 - 0x58 */	true, true, true, true,
+/* 23 - 0x5C */	true, true, true, true,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, true, true, true,
+/* 26 - 0x68 */	true, true, true, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, true, true, true,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	true, true, true, true,
+/* 32 - 0x80 */	false, true, false, false,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	false, true, false, false,
+/* 35 - 0x8C */	false, true, false, false,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, true, true, true,
+/* 38 - 0x98 */	true, true, true, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	false, true, false, false,
+/* 41 - 0xA4 */	true, true, true, true,
+/* 42 - 0xA8 */	false, true, false, false,
+/* 43 - 0xAC */	false, true, false, false,
+/* 44 - 0xB0 */	false, true, false, false,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	false, true, false, false,
+/* 47 - 0xBC */	false, true, false, false,
+/* 48 - 0xC0 */	false, true, false, false,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	false, true, false, false,
+/* 51 - 0xCC */	false, true, false, false,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, true, true, true,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	true, true, true, true,
+/* 56 - 0xE0 */	false, true, false, false,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	false, true, false, false,
+/* 59 - 0xEC */	false, true, false, false,
+/* 60 - 0xF0 */	false, true, false, false,
+/* 61 - 0xF4 */	true, true, true, true,
+/* 62 - 0xF8 */	false, true, false, false,
+/* 63 - 0xFC */	false, true, false, false
+};
+
+/*
+ * Table for determination presence of valid block
+ * state in provided byte. Checking byte is used
+ * as index in array.
+ */
+const bool detect_valid_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, false, false, true,
+/* 01 - 0x04 */	false, false, false, true,
+/* 02 - 0x08 */	false, false, false, true,
+/* 03 - 0x0C */	true, true, true, true,
+/* 04 - 0x10 */	false, false, false, true,
+/* 05 - 0x14 */	false, false, false, true,
+/* 06 - 0x18 */	false, false, false, true,
+/* 07 - 0x1C */	true, true, true, true,
+/* 08 - 0x20 */	false, false, false, true,
+/* 09 - 0x24 */	false, false, false, true,
+/* 10 - 0x28 */	false, false, false, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	true, true, true, true,
+/* 13 - 0x34 */	true, true, true, true,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	true, true, true, true,
+/* 16 - 0x40 */	false, false, false, true,
+/* 17 - 0x44 */	false, false, false, true,
+/* 18 - 0x48 */	false, false, false, true,
+/* 19 - 0x4C */	true, true, true, true,
+/* 20 - 0x50 */	false, false, false, true,
+/* 21 - 0x54 */	false, false, false, true,
+/* 22 - 0x58 */	false, false, false, true,
+/* 23 - 0x5C */	true, true, true, true,
+/* 24 - 0x60 */	false, false, false, true,
+/* 25 - 0x64 */	false, false, false, true,
+/* 26 - 0x68 */	false, false, false, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	true, true, true, true,
+/* 29 - 0x74 */	true, true, true, true,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	true, true, true, true,
+/* 32 - 0x80 */	false, false, false, true,
+/* 33 - 0x84 */	false, false, false, true,
+/* 34 - 0x88 */	false, false, false, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	false, false, false, true,
+/* 37 - 0x94 */	false, false, false, true,
+/* 38 - 0x98 */	false, false, false, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	false, false, false, true,
+/* 41 - 0xA4 */	false, false, false, true,
+/* 42 - 0xA8 */	false, false, false, true,
+/* 43 - 0xAC */	true, true, true, true,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	true, true, true, true,
+/* 47 - 0xBC */	true, true, true, true,
+/* 48 - 0xC0 */	true, true, true, true,
+/* 49 - 0xC4 */	true, true, true, true,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	true, true, true, true,
+/* 52 - 0xD0 */	true, true, true, true,
+/* 53 - 0xD4 */	true, true, true, true,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	true, true, true, true,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	true, true, true, true,
+/* 59 - 0xEC */	true, true, true, true,
+/* 60 - 0xF0 */	true, true, true, true,
+/* 61 - 0xF4 */	true, true, true, true,
+/* 62 - 0xF8 */	true, true, true, true,
+/* 63 - 0xFC */	true, true, true, true
+};
+
+/*
+ * Table for determination presence of invalid block
+ * state in provided byte. Checking byte is used
+ * as index in array.
+ */
+const bool detect_invalid_blk[U8_MAX + 1] = {
+/* 00 - 0x00 */	false, false, true, false,
+/* 01 - 0x04 */	false, false, true, false,
+/* 02 - 0x08 */	true, true, true, true,
+/* 03 - 0x0C */	false, false, true, false,
+/* 04 - 0x10 */	false, false, true, false,
+/* 05 - 0x14 */	false, false, true, false,
+/* 06 - 0x18 */	true, true, true, true,
+/* 07 - 0x1C */	false, false, true, false,
+/* 08 - 0x20 */	true, true, true, true,
+/* 09 - 0x24 */	true, true, true, true,
+/* 10 - 0x28 */	true, true, true, true,
+/* 11 - 0x2C */	true, true, true, true,
+/* 12 - 0x30 */	false, false, true, false,
+/* 13 - 0x34 */	false, false, true, false,
+/* 14 - 0x38 */	true, true, true, true,
+/* 15 - 0x3C */	false, false, true, false,
+/* 16 - 0x40 */	false, false, true, false,
+/* 17 - 0x44 */	false, false, true, false,
+/* 18 - 0x48 */	true, true, true, true,
+/* 19 - 0x4C */	false, false, true, false,
+/* 20 - 0x50 */	false, false, true, false,
+/* 21 - 0x54 */	false, false, true, false,
+/* 22 - 0x58 */	true, true, true, true,
+/* 23 - 0x5C */	false, false, true, false,
+/* 24 - 0x60 */	true, true, true, true,
+/* 25 - 0x64 */	true, true, true, true,
+/* 26 - 0x68 */	true, true, true, true,
+/* 27 - 0x6C */	true, true, true, true,
+/* 28 - 0x70 */	false, false, true, false,
+/* 29 - 0x74 */	false, false, true, false,
+/* 30 - 0x78 */	true, true, true, true,
+/* 31 - 0x7C */	false, false, true, false,
+/* 32 - 0x80 */	true, true, true, true,
+/* 33 - 0x84 */	true, true, true, true,
+/* 34 - 0x88 */	true, true, true, true,
+/* 35 - 0x8C */	true, true, true, true,
+/* 36 - 0x90 */	true, true, true, true,
+/* 37 - 0x94 */	true, true, true, true,
+/* 38 - 0x98 */	true, true, true, true,
+/* 39 - 0x9C */	true, true, true, true,
+/* 40 - 0xA0 */	true, true, true, true,
+/* 41 - 0xA4 */	true, true, true, true,
+/* 42 - 0xA8 */	true, true, true, true,
+/* 43 - 0xAC */	true, true, true, true,
+/* 44 - 0xB0 */	true, true, true, true,
+/* 45 - 0xB4 */	true, true, true, true,
+/* 46 - 0xB8 */	true, true, true, true,
+/* 47 - 0xBC */	true, true, true, true,
+/* 48 - 0xC0 */	false, false, true, false,
+/* 49 - 0xC4 */	false, false, true, false,
+/* 50 - 0xC8 */	true, true, true, true,
+/* 51 - 0xCC */	false, false, true, false,
+/* 52 - 0xD0 */	false, false, true, false,
+/* 53 - 0xD4 */	false, false, true, false,
+/* 54 - 0xD8 */	true, true, true, true,
+/* 55 - 0xDC */	false, false, true, false,
+/* 56 - 0xE0 */	true, true, true, true,
+/* 57 - 0xE4 */	true, true, true, true,
+/* 58 - 0xE8 */	true, true, true, true,
+/* 59 - 0xEC */	true, true, true, true,
+/* 60 - 0xF0 */	false, false, true, false,
+/* 61 - 0xF4 */	false, false, true, false,
+/* 62 - 0xF8 */	true, true, true, true,
+/* 63 - 0xFC */	false, false, true, false
+};
diff --git a/fs/ssdfs/common_bitmap.h b/fs/ssdfs/common_bitmap.h
new file mode 100644
index 000000000000..3aa978d0541b
--- /dev/null
+++ b/fs/ssdfs/common_bitmap.h
@@ -0,0 +1,230 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/common_bitmap.h - shared declarations for all bitmaps.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_COMMON_BITMAP_H
+#define _SSDFS_COMMON_BITMAP_H
+
+#define SSDFS_ITEMS_PER_BYTE(item_bits) ({ \
+	BUG_ON(item_bits > BITS_PER_BYTE); \
+	BITS_PER_BYTE / item_bits; \
+})
+
+#define SSDFS_ITEMS_PER_LONG(item_bits) ({ \
+	BUG_ON(item_bits > BITS_PER_BYTE); \
+	BITS_PER_LONG / item_bits; \
+})
+
+#define ALIGNED_START_ITEM(item, state_bits) ({ \
+	u64 aligned_start; \
+	aligned_start = div_u64(item, SSDFS_ITEMS_PER_BYTE(state_bits)); \
+	aligned_start *= SSDFS_ITEMS_PER_BYTE(state_bits); \
+	aligned_start; \
+})
+
+#define ALIGNED_END_ITEM(item, state_bits) ({ \
+	u64 aligned_end; \
+	aligned_end = item + SSDFS_ITEMS_PER_BYTE(state_bits) - 1; \
+	aligned_end = div_u64(aligned_end, SSDFS_ITEMS_PER_BYTE(state_bits)); \
+	aligned_end *= SSDFS_ITEMS_PER_BYTE(state_bits); \
+	aligned_end; \
+})
+
+typedef bool (*byte_check_func)(u8 *value, int state);
+typedef u8 (*byte_op_func)(u8 *value, int state, u8 start_off, u8 state_bits,
+			   int state_mask);
+
+/*
+ * FIRST_STATE_IN_BYTE() - determine first item's offset for requested state
+ * @value: pointer on analysed byte
+ * @state: requested state
+ * @start_offset: starting item's offset to begin the analysis from
+ * @state_bits: bits per state
+ * @state_mask: mask of a bitmap's state
+ *
+ * This function tries to determine an item with @state in
+ * @value starting from @start_off.
+ *
+ * RETURN:
+ * [success] - found item's offset.
+ * [failure] - SSDFS_ITEMS_PER_BYTE(state_bits).
+ */
+static inline
+u8 FIRST_STATE_IN_BYTE(u8 *value, int state,
+			u8 start_offset, u8 state_bits,
+			int state_mask)
+{
+	u8 i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!value);
+	BUG_ON(state_bits == 0);
+	BUG_ON(state_bits > BITS_PER_BYTE);
+	BUG_ON(state_bits > 1 && (state_bits % 2) != 0);
+	BUG_ON(start_offset > SSDFS_ITEMS_PER_BYTE(state_bits));
+
+	SSDFS_DBG("value %#x, state %#x, "
+		  "start_offset %u, state_bits %u\n",
+		  *value, state, start_offset, state_bits);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	i = start_offset * state_bits;
+	for (; i < BITS_PER_BYTE; i += state_bits) {
+		if (((*value >> i) & state_mask) == state) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("found bit %u, found item %u\n",
+				  i, i / state_bits);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return i / state_bits;
+		}
+	}
+
+	return SSDFS_ITEMS_PER_BYTE(state_bits);
+}
+
+/*
+ * FIND_FIRST_ITEM_IN_BYTE() - find item in byte value
+ * @value: pointer on analysed byte
+ * @state: requested state or mask
+ * @state_bits: bits per state
+ * @state_mask: mask of a bitmap's state
+ * @start_offset: starting item's offset for search
+ * @check: pointer on check function
+ * @op: pointer on concrete operation function
+ * @found_offset: pointer on found item's offset into byte for state [out]
+ *
+ * This function tries to find in byte items with @state starting from
+ * @start_offset.
+ *
+ * RETURN:
+ * [success] - @found_offset contains found item's offset into byte.
+ * [failure] - error code:
+ *
+ * %-ENODATA    - analyzed @value doesn't contain any item for @state.
+ */
+static inline
+int FIND_FIRST_ITEM_IN_BYTE(u8 *value, int state, u8 state_bits,
+			    int state_mask,
+			    u8 start_offset, byte_check_func check,
+			    byte_op_func op, u8 *found_offset)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!value || !found_offset || !check || !op);
+	BUG_ON(state_bits == 0);
+	BUG_ON(state_bits > BITS_PER_BYTE);
+	BUG_ON(state_bits > 1 && (state_bits % 2) != 0);
+	BUG_ON(start_offset > SSDFS_ITEMS_PER_BYTE(state_bits));
+
+	SSDFS_DBG("value %#x, state %#x, state_bits %u, "
+		  "start_offset %u, found_offset %p\n",
+		  *value, state, state_bits,
+		  start_offset, found_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*found_offset = U8_MAX;
+
+	if (check(value, state)) {
+		u8 offset = op(value, state, start_offset, state_bits,
+				state_mask);
+
+		if (offset < SSDFS_ITEMS_PER_BYTE(state_bits)) {
+			*found_offset = offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("item's offset %u for state %#x\n",
+				  *found_offset, state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			return 0;
+		}
+	}
+
+	return -ENODATA;
+}
+
+/*
+ * ssdfs_find_first_dirty_fragment() - find first dirty fragment
+ * @addr: start address
+ * @max_fragment: upper bound for search
+ * @found_addr: found address with dirty fragments [out]
+ *
+ * This method tries to find the address of the first bitmap
+ * part that contains dirty fragments.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENODATA     - nothing found.
+ */
+static inline
+int ssdfs_find_first_dirty_fragment(unsigned long *addr,
+				    unsigned long max_fragment,
+				    unsigned long **found_addr)
+{
+	unsigned long found;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("addr %p, max_fragment %lu, found_addr %p\n",
+		  addr, max_fragment, found_addr);
+
+	BUG_ON(!addr || !found_addr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	found = find_first_bit(addr, max_fragment);
+
+	if (found >= max_fragment) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to find fragment: "
+			  "found %lu, max_fragment %lu\n",
+			  found, max_fragment);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENODATA;
+	}
+
+	*found_addr = addr + (found / BITS_PER_LONG);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found %lu, addr %p, found_addr %p\n",
+		  found, addr, *found_addr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * ssdfs_clear_dirty_state() - clear all dirty states for address
+ * @addr: pointer on unsigned long value
+ */
+static inline
+int ssdfs_clear_dirty_state(unsigned long *addr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!addr);
+
+	SSDFS_DBG("addr %p\n", addr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	memset(addr, 0, sizeof(unsigned long));
+	return 0;
+}
+
+#endif /* _SSDFS_COMMON_BITMAP_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 17/79] ssdfs: implement support of migration scheme in PEB bitmap
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (7 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 15/79] ssdfs: introduce PEB's block bitmap Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 19/79] ssdfs: introduce segment block bitmap Viacheslav Dubeyko
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS implements a migration scheme. The migration scheme is
a fundamental technique of GC overhead management. Its key
responsibility is to guarantee the presence of data in the same
segment across any update operations. Generally speaking, the
migration scheme's model is implemented by associating an
exhausted "Physical" Erase Block (PEB) with a clean one. The
goal of such an association of two PEBs is to implement gradual
migration of data by means of the regular update operations in
the initial (exhausted) PEB. As a result, the old, exhausted PEB
becomes completely invalidated after the data migration finishes,
and the erase operation can be applied to convert it into the
clean state. Moreover, the destination PEB of the association
replaces the initial PEB at its index in the segment and,
finally, becomes the only PEB for this position. This technique
implements the concept of a logical extent with the goal of
decreasing write amplification and managing GC overhead, because
the logical extent concept removes the necessity to update the
metadata that tracks the position of user data on the file
system's volume. Generally speaking, the migration scheme is
capable of decreasing GC activity significantly by excluding
metadata updates and by letting the regular update operations
trigger self-migration of data between PEBs.
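
The migration flow described above can be sketched as a minimal
user-space toy model. This is not SSDFS code; all toy_* names are
hypothetical and only illustrate the association of an exhausted
source PEB with a clean destination PEB:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: a segment slot associating an exhausted source PEB
 * with a clean destination PEB. All names are hypothetical.
 */
struct toy_peb {
	int valid_blks;		/* blocks still holding live data */
	bool clean;		/* true after an erase operation */
};

struct toy_slot {
	struct toy_peb *src;	/* exhausted PEB under migration */
	struct toy_peb *dst;	/* clean PEB receiving updates */
};

/*
 * A regular update operation moves one logical block: the new
 * content is written into the destination PEB and the old copy
 * is invalidated in the source PEB.
 */
static void toy_update_block(struct toy_slot *slot)
{
	slot->src->valid_blks--;
	slot->dst->valid_blks++;
}

/*
 * Once the source PEB holds no valid blocks, erase it and let
 * the destination PEB become the only PEB for this position.
 */
static bool toy_try_finish_migration(struct toy_slot *slot)
{
	if (slot->src->valid_blks > 0)
		return false;
	slot->src->clean = true;
	slot->src = slot->dst;
	slot->dst = NULL;
	return true;
}
```

The point of the sketch is that ordinary updates alone drain the
source PEB, so no dedicated GC pass has to copy the data out.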

To implement the migration scheme concept, SSDFS introduces a
PEB container that includes source and destination erase blocks.
As a result, the PEB block bitmap object represents an aggregation
of the source PEB's block bitmap and the destination PEB's block
bitmap. The PEB block bitmap implements the following API:
(1) create - create PEB block bitmap
(2) destroy - destroy PEB block bitmap
(3) init - initialize PEB block bitmap by metadata from a log
(4) get_free_pages - get free pages in aggregation of block bitmaps
(5) get_used_pages - get used pages in aggregation of block bitmaps
(6) get_invalid_pages - get invalid pages in aggregation of block bitmaps
(7) pre_allocate - pre_allocate page/range in aggregation of block bitmaps
(8) allocate - allocate page/range in aggregation of block bitmaps
(9) invalidate - invalidate page/range in aggregation of block bitmaps
(10) update_range - change the state of range in aggregation of block bitmaps
(11) collect_garbage - find contiguous range for requested state
(12) start_migration - prepare PEB's environment for migration
(13) migrate - move range from source block bitmap into destination one
(14) finish_migration - clean source block bitmap and swap block bitmaps
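
A minimal toy sketch of the aggregation idea behind items (4)-(6):
queries sum over the source bitmap and, when migration is running,
the destination bitmap too. This is illustrative user-space code,
not the driver's implementation; all toy_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a single block bitmap's counters. */
struct toy_blk_bmap {
	int free_pages;
	int used_pages;
	int invalid_pages;
};

/* Pair of bitmaps aggregated by the PEB block bitmap object. */
struct toy_peb_blk_bmap {
	struct toy_blk_bmap *src;
	struct toy_blk_bmap *dst;	/* NULL when no migration runs */
};

static int toy_get_free_pages(const struct toy_peb_blk_bmap *bmap)
{
	int count = bmap->src ? bmap->src->free_pages : 0;

	if (bmap->dst)
		count += bmap->dst->free_pages;
	return count;
}

static int toy_get_invalid_pages(const struct toy_peb_blk_bmap *bmap)
{
	int count = bmap->src ? bmap->src->invalid_pages : 0;

	if (bmap->dst)
		count += bmap->dst->invalid_pages;
	return count;
}
```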

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/peb_block_bitmap.h | 179 ++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 fs/ssdfs/peb_block_bitmap.h

diff --git a/fs/ssdfs/peb_block_bitmap.h b/fs/ssdfs/peb_block_bitmap.h
new file mode 100644
index 000000000000..4d38cfb1aefb
--- /dev/null
+++ b/fs/ssdfs/peb_block_bitmap.h
@@ -0,0 +1,179 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_block_bitmap.h - PEB's block bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_BLOCK_BITMAP_H
+#define _SSDFS_PEB_BLOCK_BITMAP_H
+
+#include "block_bitmap.h"
+
+/* PEB's block bitmap indexes */
+enum {
+	SSDFS_PEB_BLK_BMAP1,
+	SSDFS_PEB_BLK_BMAP2,
+	SSDFS_PEB_BLK_BMAP_ITEMS_MAX
+};
+
+/*
+ * struct ssdfs_peb_blk_bmap - PEB container's block bitmap object
+ * @state: PEB container's block bitmap's state
+ * @peb_index: PEB index in array
+ * @pages_per_peb: pages per physical erase block
+ * @modification_lock: lock for modification operations
+ * @peb_valid_blks: PEB container's valid logical blocks count
+ * @peb_invalid_blks: PEB container's invalid logical blocks count
+ * @peb_free_blks: PEB container's free logical blocks count
+ * @peb_blks_capacity: PEB container's logical blocks capacity
+ * @buffers_state: buffers state
+ * @lock: buffers lock
+ * @init_cno: initialization checkpoint
+ * @src: source PEB's block bitmap object's pointer
+ * @dst: destination PEB's block bitmap object's pointer
+ * @buffer: array of block bitmap buffers
+ * @init_end: wait of init ending
+ * @parent: pointer on parent segment block bitmap
+ */
+struct ssdfs_peb_blk_bmap {
+	atomic_t state;
+
+	u16 peb_index;
+	u32 pages_per_peb;
+
+	struct rw_semaphore modification_lock;
+	atomic_t peb_valid_blks;
+	atomic_t peb_invalid_blks;
+	atomic_t peb_free_blks;
+	atomic_t peb_blks_capacity;
+
+	atomic_t buffers_state;
+	struct rw_semaphore lock;
+	u64 init_cno;
+	struct ssdfs_block_bmap *src;
+	struct ssdfs_block_bmap *dst;
+	struct ssdfs_block_bmap buffer[SSDFS_PEB_BLK_BMAP_ITEMS_MAX];
+	struct completion init_end;
+
+	struct ssdfs_segment_blk_bmap *parent;
+};
+
+/* PEB container's block bitmap's possible states */
+enum {
+	SSDFS_PEB_BLK_BMAP_STATE_UNKNOWN,
+	SSDFS_PEB_BLK_BMAP_CREATED,
+	SSDFS_PEB_BLK_BMAP_HAS_CLEAN_DST,
+	SSDFS_PEB_BLK_BMAP_INITIALIZED,
+	SSDFS_PEB_BLK_BMAP_STATE_MAX,
+};
+
+/* PEB's buffer array possible states */
+enum {
+	SSDFS_PEB_BMAP_BUFFERS_EMPTY,
+	SSDFS_PEB_BMAP1_SRC,
+	SSDFS_PEB_BMAP1_SRC_PEB_BMAP2_DST,
+	SSDFS_PEB_BMAP2_SRC,
+	SSDFS_PEB_BMAP2_SRC_PEB_BMAP1_DST,
+	SSDFS_PEB_BMAP_BUFFERS_STATE_MAX
+};
+
+/* PEB's block bitmap operation destination */
+enum {
+	SSDFS_PEB_BLK_BMAP_SOURCE,
+	SSDFS_PEB_BLK_BMAP_DESTINATION,
+	SSDFS_PEB_BLK_BMAP_INDEX_MAX
+};
+
+/*
+ * PEB block bitmap API
+ */
+int ssdfs_peb_blk_bmap_create(struct ssdfs_segment_blk_bmap *parent,
+			      u16 peb_index, u32 items_count,
+			      int init_flag, int init_state);
+int ssdfs_peb_blk_bmap_destroy(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_init(struct ssdfs_peb_blk_bmap *bmap,
+			    struct ssdfs_folio_vector *source,
+			    struct ssdfs_block_bitmap_fragment *hdr,
+			    u32 peb_free_pages,
+			    u64 cno);
+int ssdfs_peb_blk_bmap_clean_init(struct ssdfs_peb_blk_bmap *bmap);
+void ssdfs_peb_blk_bmap_init_failed(struct ssdfs_peb_blk_bmap *bmap);
+
+bool has_ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_wait_init_end(struct ssdfs_peb_blk_bmap *bmap);
+
+bool ssdfs_peb_blk_bmap_initialized(struct ssdfs_peb_blk_bmap *ptr);
+bool is_ssdfs_peb_blk_bmap_dirty(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_inflate(struct ssdfs_peb_blk_bmap *ptr,
+				u32 free_items);
+
+int ssdfs_peb_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_metadata_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_peb_blk_bmap_get_pages_capacity(struct ssdfs_peb_blk_bmap *ptr);
+
+int ssdfs_peb_define_reserved_pages_per_log(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_reserve_metapages(struct ssdfs_peb_blk_bmap *bmap,
+					 int bmap_index,
+					 u32 count);
+int ssdfs_peb_blk_bmap_free_metapages(struct ssdfs_peb_blk_bmap *bmap,
+				      int bmap_index,
+				      u32 count);
+int ssdfs_peb_blk_bmap_get_block_state(struct ssdfs_peb_blk_bmap *bmap,
+					int bmap_index,
+					u32 blk);
+int ssdfs_peb_blk_bmap_pre_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_allocate(struct ssdfs_peb_blk_bmap *bmap,
+				int bmap_index,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_invalidate(struct ssdfs_peb_blk_bmap *bmap,
+				  int bmap_index,
+				  struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_update_range(struct ssdfs_peb_blk_bmap *bmap,
+				    int bmap_index,
+				    int new_range_state,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_collect_garbage(struct ssdfs_peb_blk_bmap *bmap,
+					u32 start, u32 max_len,
+					int blk_state,
+					struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_start_migration(struct ssdfs_peb_blk_bmap *bmap);
+int ssdfs_peb_blk_bmap_migrate(struct ssdfs_peb_blk_bmap *bmap,
+				int new_range_state,
+				struct ssdfs_block_bmap_range *range);
+int ssdfs_peb_blk_bmap_finish_migration(struct ssdfs_peb_blk_bmap *bmap);
+
+/*
+ * PEB block bitmap internal API
+ */
+int ssdfs_src_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_metadata_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_src_blk_bmap_get_pages_capacity(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_free_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_used_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_invalid_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_metadata_pages(struct ssdfs_peb_blk_bmap *ptr);
+int ssdfs_dst_blk_bmap_get_pages_capacity(struct ssdfs_peb_blk_bmap *ptr);
+
+#endif /* _SSDFS_PEB_BLOCK_BITMAP_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 19/79] ssdfs: introduce segment block bitmap
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (8 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 17/79] ssdfs: implement support of migration scheme in PEB bitmap Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 22/79] ssdfs: introduce offset translation table Viacheslav Dubeyko
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS splits a partition/volume into a sequence of fixed-sized
segments. Every segment can include one or several Logical
Erase Blocks (LEB). A LEB can be mapped into a "Physical" Erase
Block (PEB). The PEB block bitmap object represents the
aggregation of the source PEB's block bitmap and the destination
PEB's block bitmap. Finally, the segment block bitmap implements
an array of PEB block bitmaps and provides the following API:
(1) create - create segment block bitmap
(2) destroy - destroy segment block bitmap
(3) partial_init - initialize by state of one PEB block bitmap
(4) get_free_pages - get free pages in segment block bitmap
(5) get_used_pages - get used pages in segment block bitmap
(6) get_invalid_pages - get invalid pages in segment block bitmap
(7) reserve_block - reserve a free block
(8) reserve_extent - reserve some number of free blocks
(9) pre_allocate - pre_allocate page/range in segment block bitmap
(10) allocate - allocate page/range in segment block bitmap
(11) update_range - change the state of range in segment block bitmap
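
The segment-level counters maintained behind this API obey a
simple balance invariant (checked in debug builds by
is_pages_balance_correct() in this patch): free + valid + invalid
blocks must never exceed the summed capacity of the per-PEB
bitmaps. A minimal toy sketch, with illustrative names only:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the segment block bitmap's counters. */
struct toy_seg_blk_bmap {
	int free_blks;
	int valid_blks;
	int invalid_blks;
	const int *peb_capacity;	/* capacity per PEB bitmap */
	int pebs_count;
};

/* Sum the capacity over the array of PEB block bitmaps. */
static int toy_seg_capacity(const struct toy_seg_blk_bmap *ptr)
{
	int capacity = 0;
	int i;

	for (i = 0; i < ptr->pebs_count; i++)
		capacity += ptr->peb_capacity[i];
	return capacity;
}

/* The debug-time balance check: counters must fit the capacity. */
static bool toy_balance_correct(const struct toy_seg_blk_bmap *ptr)
{
	int calculated = ptr->free_blks + ptr->valid_blks +
			 ptr->invalid_blks;

	return calculated <= toy_seg_capacity(ptr);
}
```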

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/segment_block_bitmap.h | 240 ++++++++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)
 create mode 100644 fs/ssdfs/segment_block_bitmap.h

diff --git a/fs/ssdfs/segment_block_bitmap.h b/fs/ssdfs/segment_block_bitmap.h
new file mode 100644
index 000000000000..7ee7e39972ef
--- /dev/null
+++ b/fs/ssdfs/segment_block_bitmap.h
@@ -0,0 +1,240 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment_block_bitmap.h - segment's block bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_BLOCK_BITMAP_H
+#define _SSDFS_SEGMENT_BLOCK_BITMAP_H
+
+#include "peb_block_bitmap.h"
+
+/*
+ * struct ssdfs_segment_blk_bmap - segment block bitmap object
+ * @state: segment block bitmap's state
+ * @pages_per_peb: pages per physical erase block
+ * @pages_per_seg: pages per segment
+ * @modification_lock: lock for modification operations
+ * @seg_valid_blks: segment's valid logical blocks count
+ * @seg_invalid_blks: segment's invalid logical blocks count
+ * @seg_free_blks: segment's free logical blocks count
+ * @seg_reserved_metapages: number of reserved metapages
+ * @peb: array of PEB block bitmap objects
+ * @pebs_count: PEBs count in segment
+ * @parent_si: pointer on parent segment object
+ */
+struct ssdfs_segment_blk_bmap {
+	atomic_t state;
+
+	u32 pages_per_peb;
+	u32 pages_per_seg;
+
+	struct rw_semaphore modification_lock;
+	atomic_t seg_valid_blks;
+	atomic_t seg_invalid_blks;
+	atomic_t seg_free_blks;
+	atomic_t seg_reserved_metapages;
+
+	struct ssdfs_peb_blk_bmap *peb;
+	u16 pebs_count;
+
+	struct ssdfs_segment_info *parent_si;
+};
+
+/* Segment block bitmap's possible states */
+enum {
+	SSDFS_SEG_BLK_BMAP_STATE_UNKNOWN,
+	SSDFS_SEG_BLK_BMAP_CREATED,
+	SSDFS_SEG_BLK_BMAP_STATE_MAX,
+};
+
+/*
+ * Segment block bitmap API
+ */
+int ssdfs_segment_blk_bmap_create(struct ssdfs_segment_info *si,
+				  int init_flag, int init_state);
+void ssdfs_segment_blk_bmap_destroy(struct ssdfs_segment_blk_bmap *ptr);
+int ssdfs_segment_blk_bmap_partial_init(struct ssdfs_segment_blk_bmap *bmap,
+				    u16 peb_index,
+				    struct ssdfs_folio_vector *source,
+				    struct ssdfs_block_bitmap_fragment *hdr,
+				    u32 peb_free_pages, u64 cno);
+int ssdfs_segment_blk_bmap_partial_inflate(struct ssdfs_segment_blk_bmap *bmap,
+					   u16 peb_index, u32 free_items);
+int
+ssdfs_segment_blk_bmap_partial_clean_init(struct ssdfs_segment_blk_bmap *bmap,
+					  u16 peb_index);
+void ssdfs_segment_blk_bmap_init_failed(struct ssdfs_segment_blk_bmap *bmap,
+					u16 peb_index);
+
+bool is_ssdfs_segment_blk_bmap_dirty(struct ssdfs_segment_blk_bmap *bmap,
+					u16 peb_index);
+
+bool has_ssdfs_segment_blk_bmap_initialized(struct ssdfs_segment_blk_bmap *ptr,
+					    struct ssdfs_peb_container *pebc);
+int ssdfs_segment_blk_bmap_wait_init_end(struct ssdfs_segment_blk_bmap *ptr,
+					 struct ssdfs_peb_container *pebc);
+
+int ssdfs_segment_blk_bmap_get_block_state(struct ssdfs_segment_blk_bmap *ptr,
+					   struct ssdfs_peb_info *pebi,
+					   u32 blk);
+int ssdfs_segment_blk_bmap_reserve_metapages(struct ssdfs_segment_blk_bmap *ptr,
+					     struct ssdfs_peb_info *pebi,
+					     u32 count);
+int ssdfs_segment_blk_bmap_free_metapages(struct ssdfs_segment_blk_bmap *ptr,
+					  struct ssdfs_peb_info *pebi,
+					  u32 count);
+int ssdfs_segment_blk_bmap_reserve_block(struct ssdfs_segment_blk_bmap *ptr);
+int ssdfs_segment_blk_bmap_reserve_extent(struct ssdfs_segment_blk_bmap *ptr,
+					  u32 count, u32 *reserved_blks);
+int ssdfs_segment_blk_bmap_release_extent(struct ssdfs_segment_blk_bmap *ptr,
+					  u32 count);
+int ssdfs_segment_blk_bmap_pre_allocate(struct ssdfs_segment_blk_bmap *ptr,
+					struct ssdfs_peb_info *pebi,
+					struct ssdfs_block_bmap_range *range);
+int ssdfs_segment_blk_bmap_allocate(struct ssdfs_segment_blk_bmap *ptr,
+				    struct ssdfs_peb_info *pebi,
+				    struct ssdfs_block_bmap_range *range);
+int ssdfs_segment_blk_bmap_update_range(struct ssdfs_segment_blk_bmap *ptr,
+				    struct ssdfs_peb_info *pebi,
+				    u8 peb_migration_id,
+				    int range_state,
+				    struct ssdfs_block_bmap_range *range);
+
+/* Inline methods */
+
+static inline
+int __ssdfs_segment_blk_bmap_get_capacity(struct ssdfs_segment_blk_bmap *ptr)
+{
+	int capacity = 0;
+	int i;
+
+	for (i = 0; i < ptr->pebs_count; i++) {
+		capacity += atomic_read(&ptr->peb[i].peb_blks_capacity);
+	}
+
+	return capacity;
+}
+
+static inline
+int ssdfs_segment_blk_bmap_get_capacity(struct ssdfs_segment_blk_bmap *ptr)
+{
+	int capacity;
+
+	down_read(&ptr->modification_lock);
+	capacity = __ssdfs_segment_blk_bmap_get_capacity(ptr);
+	up_read(&ptr->modification_lock);
+
+	return capacity;
+}
+
+static inline
+bool is_pages_balance_correct(struct ssdfs_segment_blk_bmap *ptr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int free_blks;
+	int valid_blks;
+	int invalid_blks;
+	int capacity;
+	int calculated;
+
+	BUG_ON(!ptr);
+
+	down_read(&ptr->modification_lock);
+	free_blks = atomic_read(&ptr->seg_free_blks);
+	valid_blks = atomic_read(&ptr->seg_valid_blks);
+	invalid_blks = atomic_read(&ptr->seg_invalid_blks);
+	calculated = free_blks + valid_blks + invalid_blks;
+	capacity = __ssdfs_segment_blk_bmap_get_capacity(ptr);
+	up_read(&ptr->modification_lock);
+
+	BUG_ON(free_blks < 0);
+	BUG_ON(valid_blks < 0);
+	BUG_ON(invalid_blks < 0);
+	BUG_ON(capacity < 0);
+
+	SSDFS_DBG("free_logical_blks %d, valid_logical_blks %d, "
+		  "invalid_logical_blks %d, capacity %d\n",
+		  free_blks, valid_blks, invalid_blks,
+		  capacity);
+
+	if (calculated > capacity) {
+		SSDFS_ERR("free_logical_blks %d, valid_logical_blks %d, "
+			  "invalid_logical_blks %d, calculated %d, "
+			  "capacity %d\n",
+			  free_blks, valid_blks, invalid_blks,
+			  calculated, capacity);
+		return false;
+	}
+
+	return true;
+#else
+	return true;
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+int ssdfs_segment_blk_bmap_get_free_pages(struct ssdfs_segment_blk_bmap *ptr)
+{
+	int free_blks;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	WARN_ON(!is_pages_balance_correct(ptr));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&ptr->modification_lock);
+	free_blks = atomic_read(&ptr->seg_free_blks);
+	up_read(&ptr->modification_lock);
+
+	return free_blks;
+}
+
+static inline
+int ssdfs_segment_blk_bmap_get_used_pages(struct ssdfs_segment_blk_bmap *ptr)
+{
+	int valid_blks;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	WARN_ON(!is_pages_balance_correct(ptr));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&ptr->modification_lock);
+	valid_blks = atomic_read(&ptr->seg_valid_blks);
+	up_read(&ptr->modification_lock);
+
+	return valid_blks;
+}
+
+static inline
+int ssdfs_segment_blk_bmap_get_invalid_pages(struct ssdfs_segment_blk_bmap *ptr)
+{
+	int invalid_blks;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	WARN_ON(!is_pages_balance_correct(ptr));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&ptr->modification_lock);
+	invalid_blks = atomic_read(&ptr->seg_invalid_blks);
+	up_read(&ptr->modification_lock);
+
+	return invalid_blks;
+}
+
+#endif /* _SSDFS_SEGMENT_BLOCK_BITMAP_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 22/79] ssdfs: introduce offset translation table
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (9 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 19/79] ssdfs: introduce segment block bitmap Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 24/79] ssdfs: introduce PEB object Viacheslav Dubeyko
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

One of the key goals of SSDFS is to decrease the write
amplification factor. The logical extent concept is the
fundamental technique for achieving this goal. A logical extent
describes any volume extent on the basis of a segment ID,
a logical block ID, and a length. The migration scheme
guarantees that the segment ID always stays the same for any
logical extent. The offset translation table provides the way
to convert a logical block ID into an offset inside a log of
a particular "Physical" Erase Block (PEB). As a result, the
extents b-tree never needs to be updated, because a logical
extent never changes until data is intentionally moved from
one segment into another one.

The offset translation table is a metadata structure that
is stored in every log. Its responsibility is to keep the
knowledge of which particular logical blocks are stored in the
log's payload and which offset in the payload should be used
to access and retrieve the content of every logical block.

The offset translation table can be imagined as a sequence of
fragments. Every fragment contains an array of physical offset
descriptors that provide the way to convert a logical block ID
into a physical offset in the log. The flush logic identifies
dirty fragments and stores them as part of the log's metadata
during the log commit operation.

In other words, the offset translation table implements the
mechanism of converting a logical block ID into a physical
offset in the log. The offset translation table implements the
following API:
(1) create - create empty offset translation table
(2) destroy - destroy offset translation table
(3) partial_init - initialize offset translation table by one fragment
(4) store_offsets_table - flush dirty fragments
(5) convert - convert logical block ID into offset descriptor
(6) allocate_block - allocate logical block
(7) allocate_extent - allocate logical extent
(8) change_offset - initialize offset of allocated logical block
(9) free_block - free logical block
(10) free_extent - free logical extent
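
The lookup path behind convert (5) can be sketched as a toy model:
a logical block ID is first mapped to a physical offset ID, and the
fragment whose [start_id, start_id + id_count) range covers that ID
yields the descriptor index. This is illustrative user-space code
with hypothetical toy_* names, not the driver's data structures:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified fragment: a contiguous range of physical offset ids. */
struct toy_frag {
	uint16_t start_id;	/* first physical offset id covered */
	uint16_t id_count;	/* number of ids in this fragment */
};

/* Return the index of the fragment covering @id, or -1 if none. */
static int toy_find_fragment(const struct toy_frag *frags,
			     int frag_count, uint16_t id)
{
	int i;

	for (i = 0; i < frag_count; i++) {
		uint16_t start = frags[i].start_id;

		if (id >= start && id < start + frags[i].id_count)
			return i;
	}
	return -1;
}

/* Index of @id inside its fragment's descriptor array. */
static uint16_t toy_offset_index(const struct toy_frag *frag,
				 uint16_t id)
{
	return id - frag->start_id;
}
```

The descriptor found this way holds the byte offset into the log's
payload, so the extents b-tree itself never has to change.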

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/offset_translation_table.h | 459 ++++++++++++++++++++++++++++
 1 file changed, 459 insertions(+)
 create mode 100644 fs/ssdfs/offset_translation_table.h

diff --git a/fs/ssdfs/offset_translation_table.h b/fs/ssdfs/offset_translation_table.h
new file mode 100644
index 000000000000..7bc5a4764679
--- /dev/null
+++ b/fs/ssdfs/offset_translation_table.h
@@ -0,0 +1,459 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/offset_translation_table.h - offset table declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ *                  Cong Wang
+ */
+
+#ifndef _SSDFS_OFFSET_TRANSLATION_TABLE_H
+#define _SSDFS_OFFSET_TRANSLATION_TABLE_H
+
+#include <linux/pagevec.h>
+
+#include "request_queue.h"
+#include "sequence_array.h"
+#include "dynamic_array.h"
+
+/*
+ * struct ssdfs_phys_offset_table_fragment - fragment of phys offsets table
+ * @lock: table fragment lock
+ * @start_id: starting physical offset id number in fragment
+ * @sequence_id: fragment's sequence_id in PEB
+ * @id_count: count of id numbers in sequence
+ * @state: fragment state
+ * @peb_id: PEB ID containing the fragment
+ * @migrating_blocks: number of logical blocks under migration
+ * @actual_records: number of actual records in fragment
+ * @hdr: pointer on fragment's header
+ * @phys_offs: array of physical offsets in fragment
+ * @buf: buffer of fragment
+ * @buf_size: size of buffer in bytes
+ *
+ * One fragment can serve one PEB's log, but a single log can
+ * contain several fragments too. The same number of fragments
+ * exists in memory as on the volume.
+ */
+struct ssdfs_phys_offset_table_fragment {
+	struct rw_semaphore lock;
+	u16 start_id;
+	u16 sequence_id;
+	atomic_t id_count;
+	atomic_t state;
+	u64 peb_id;
+	atomic_t migrating_blocks;
+	atomic_t actual_records;
+
+	struct ssdfs_phys_offset_table_header *hdr;
+	struct ssdfs_phys_offset_descriptor *phys_offs;
+	unsigned char *buf;
+	size_t buf_size;
+};
+
+enum {
+	SSDFS_BLK2OFF_FRAG_UNDEFINED,
+	SSDFS_BLK2OFF_FRAG_CREATED,
+	SSDFS_BLK2OFF_FRAG_INITIALIZED,
+	SSDFS_BLK2OFF_FRAG_DIRTY,
+	SSDFS_BLK2OFF_FRAG_UNDER_COMMIT,
+	SSDFS_BLK2OFF_FRAG_COMMITED,
+	SSDFS_BLK2OFF_PLEASE_DELETE_FRAGMENT,
+	SSDFS_BLK2OFF_FRAG_STATE_MAX,
+};
+
+#define SSDFS_INVALID_FRAG_ID				U16_MAX
+#define SSDFS_BLK2OFF_TBL_REVERT_THRESHOLD		(U16_MAX - 1)
+#define SSDFS_BLK2OFF_TBL_REVERT_PIVOT			(U16_MAX / 2)
+#define SSDFS_BLK2OFF_TBL_LBLK2OFF_CAPACITY_MAX		U16_MAX
+
+/*
+ * struct ssdfs_phys_offset_table_array - array of log's fragments in PEB
+ * @state: PEB's translation table state
+ * @fragment_count: fragments count
+ * @sequence: sequence array of fragments
+ */
+struct ssdfs_phys_offset_table_array {
+	atomic_t state;
+	atomic_t fragment_count;
+	struct ssdfs_sequence_array *sequence;
+};
+
+enum {
+	SSDFS_BLK2OFF_TABLE_UNDEFINED,
+	SSDFS_BLK2OFF_TABLE_CREATED,
+	SSDFS_BLK2OFF_TABLE_PARTIAL_INIT,
+	SSDFS_BLK2OFF_TABLE_DIRTY_PARTIAL_INIT,
+	SSDFS_BLK2OFF_TABLE_COMPLETE_INIT,
+	SSDFS_BLK2OFF_TABLE_DIRTY,
+	SSDFS_BLK2OFF_TABLE_STATE_MAX,
+};
+
+#define SSDFS_BLK2OFF_TABLE_INVALID_ID		U16_MAX
+
+/*
+ * struct ssdfs_block_descriptor_state - block descriptor state
+ * @status: state of block descriptor buffer
+ * @buf: block descriptor buffer
+ */
+struct ssdfs_block_descriptor_state {
+	u32 status;
+	struct ssdfs_block_descriptor buf;
+};
+
+/*
+ * Block descriptor buffer state
+ */
+enum {
+	SSDFS_BLK_DESC_BUF_UNKNOWN_STATE,
+	SSDFS_BLK_DESC_BUF_INITIALIZED,
+	SSDFS_BLK_DESC_BUF_STATE_MAX,
+	SSDFS_BLK_DESC_BUF_ALLOCATED = U32_MAX,
+};
+
+/*
+ * struct ssdfs_offset_position - defines offset id and position
+ * @cno: checkpoint of change
+ * @id: physical offset ID
+ * @peb_index: PEB's index
+ * @sequence_id: sequence ID of physical offset table's fragment
+ * @offset_index: offset index inside of fragment
+ * @blk_desc: logical block descriptor
+ */
+struct ssdfs_offset_position {
+	u64 cno;
+	u16 id;
+	u16 peb_index;
+	u16 sequence_id;
+	u16 offset_index;
+
+	struct ssdfs_block_descriptor_state blk_desc;
+};
+
+/*
+ * struct ssdfs_migrating_block - migrating block state
+ * @state: logical block's state
+ * @peb_index: PEB's index
+ * @sequence_id: fragment's sequence_id in PEB
+ * @batch: copy of logical block's content (under migration only)
+ */
+struct ssdfs_migrating_block {
+	int state;
+	u16 peb_index;
+	u16 sequence_id;
+	struct folio_batch batch;
+};
+
+/*
+ * Migrating block's states
+ */
+enum {
+	SSDFS_LBLOCK_UNKNOWN_STATE,
+	SSDFS_LBLOCK_UNDER_MIGRATION,
+	SSDFS_LBLOCK_UNDER_COMMIT,
+	SSDFS_LBLOCK_STATE_MAX
+};
+
+enum {
+	SSDFS_LBMAP_INIT_INDEX,
+	SSDFS_LBMAP_STATE_INDEX,
+	SSDFS_LBMAP_MODIFICATION_INDEX,
+	SSDFS_LBMAP_ARRAY_MAX,
+};
+
+/*
+ * struct ssdfs_bitmap_array - bitmap array
+ * @bits_count: number of available bits in every bitmap
+ * @bytes_count: number of allocated bytes in every bitmap
+ * @array: array of bitmaps
+ */
+struct ssdfs_bitmap_array {
+	u32 bits_count;
+	u32 bytes_count;
+	unsigned long *array[SSDFS_LBMAP_ARRAY_MAX];
+};
+
+/*
+ * struct ssdfs_blk2off_table - in-core translation table
+ * @flags: flags of translation table
+ * @state: translation table object state
+ * @pages_per_peb: pages per physical erase block
+ * @pages_per_seg: pages per segment
+ * @type: translation table type
+ * @translation_lock: lock of translation operation
+ * @init_cno: last actual checkpoint
+ * @used_logical_blks: count of used logical blocks
+ * @free_logical_blks: count of free logical blocks
+ * @last_allocated_blk: last allocated block (hint for allocation)
+ * @lbmap: array of block bitmaps
+ * @lblk2off: array of correspondence between logical numbers and phys off ids
+ * @migrating_blks: array of migrating blocks
+ * @lblk2off_capacity: capacity of correspondence array
+ * @peb: sequence of physical offset arrays
+ * @pebs_count: count of PEBs in segment
+ * @partial_init_end: completion of partial init
+ * @full_init_end: completion of full init
+ * @wait_queue: wait queue of blk2off table
+ * @fsi: pointer to shared file system object
+ */
+struct ssdfs_blk2off_table {
+	atomic_t flags;
+	atomic_t state;
+
+	u32 pages_per_peb;
+	u32 pages_per_seg;
+	u8 type;
+
+	struct rw_semaphore translation_lock;
+	u64 init_cno;
+	u16 used_logical_blks;
+	u16 free_logical_blks;
+	u16 last_allocated_blk;
+	struct ssdfs_bitmap_array lbmap;
+	struct ssdfs_dynamic_array lblk2off;
+	struct ssdfs_dynamic_array migrating_blks;
+	u16 lblk2off_capacity;
+
+	struct ssdfs_phys_offset_table_array *peb;
+	u16 pebs_count;
+
+	struct completion partial_init_end;
+	struct completion full_init_end;
+	wait_queue_head_t wait_queue;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+#define SSDFS_OFF_POS(ptr) \
+	((struct ssdfs_offset_position *)(ptr))
+#define SSDFS_MIGRATING_BLK(ptr) \
+	((struct ssdfs_migrating_block *)(ptr))
+#define SSDFS_TRANS_EXT(ptr) \
+	((struct ssdfs_translation_extent *)(ptr))
+
+enum {
+	SSDFS_BLK2OFF_OBJECT_UNKNOWN,
+	SSDFS_BLK2OFF_OBJECT_CREATED,
+	SSDFS_BLK2OFF_OBJECT_PARTIAL_INIT,
+	SSDFS_BLK2OFF_OBJECT_COMPLETE_INIT,
+	SSDFS_BLK2OFF_OBJECT_STATE_MAX,
+};
+
+/*
+ * struct ssdfs_blk2off_table_snapshot - table state snapshot
+ * @cno: checkpoint of snapshot
+ * @bmap_copy: copy of modification bitmap
+ * @tbl_copy: copy of translation table
+ * @capacity: capacity of table
+ * @used_logical_blks: count of used logical blocks
+ * @free_logical_blks: count of free logical blocks
+ * @last_allocated_blk: last allocated block (hint for allocation)
+ * @peb_index: PEB index
+ * @start_sequence_id: sequence ID of the first dirty fragment
+ * @new_sequence_id: sequence ID of the first newly added fragment
+ * @start_offset_id: starting offset ID
+ * @end_offset_id: ending offset ID
+ * @dirty_fragments: count of dirty fragments
+ * @fragments_count: total count of fragments
+ *
+ * The @bmap_copy and @tbl_copy buffers are allocated by the
+ * snapshot function. The caller is responsible for freeing
+ * the allocated memory.
+ */
+struct ssdfs_blk2off_table_snapshot {
+	u64 cno;
+
+	unsigned long *bmap_copy;
+	struct ssdfs_offset_position *tbl_copy;
+	u16 capacity;
+
+	u16 used_logical_blks;
+	u16 free_logical_blks;
+	u16 last_allocated_blk;
+
+	u16 peb_index;
+	u16 start_sequence_id;
+	u16 new_sequence_id;
+	u16 start_offset_id;
+	u16 end_offset_id;
+	u16 dirty_fragments;
+	u32 fragments_count;
+};
+
+/*
+ * Inline functions
+ */
+
+/*
+ * ssdfs_blk2off_table_bmap_bytes() - calculate bmap bytes count
+ * @items_count: bits count in bitmap
+ */
+static inline
+size_t ssdfs_blk2off_table_bmap_bytes(size_t items_count)
+{
+	size_t bytes;
+
+	bytes = (items_count + BITS_PER_LONG - 1) / BITS_PER_LONG;
+	bytes *= sizeof(unsigned long);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %zu, bmap_bytes %zu\n",
+		  items_count, bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return bytes;
+}
+
+static inline
+bool is_ssdfs_logical_block_migrating(int blk_state)
+{
+	bool is_migrating = false;
+
+	switch (blk_state) {
+	case SSDFS_LBLOCK_UNDER_MIGRATION:
+	case SSDFS_LBLOCK_UNDER_COMMIT:
+		is_migrating = true;
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return is_migrating;
+}
+
+/* Function prototypes */
+struct ssdfs_blk2off_table *
+ssdfs_blk2off_table_create(struct ssdfs_fs_info *fsi,
+			   u16 items_count, u8 type,
+			   int state);
+void ssdfs_blk2off_table_destroy(struct ssdfs_blk2off_table *table);
+int ssdfs_blk2off_table_partial_clean_init(struct ssdfs_blk2off_table *table,
+					   u16 peb_index);
+int ssdfs_blk2off_table_partial_init(struct ssdfs_blk2off_table *table,
+				     struct ssdfs_read_init_env *env,
+				     u16 peb_index, u64 peb_id, u64 cno);
+int ssdfs_blk2off_table_blk_desc_init(struct ssdfs_blk2off_table *table,
+					u16 logical_blk,
+					struct ssdfs_offset_position *pos);
+int ssdfs_blk2off_table_resize(struct ssdfs_blk2off_table *table,
+				u16 new_items_count);
+int ssdfs_blk2off_table_snapshot(struct ssdfs_blk2off_table *table,
+				 u16 peb_index, u64 peb_id,
+				 struct ssdfs_blk2off_table_snapshot *snapshot);
+void ssdfs_blk2off_table_free_snapshot(struct ssdfs_blk2off_table_snapshot *sp);
+int ssdfs_blk2off_table_extract_extents(struct ssdfs_blk2off_table_snapshot *sp,
+					struct ssdfs_dynamic_array *array,
+					u16 capacity, u16 *extent_count);
+
+int
+ssdfs_blk2off_table_prepare_for_commit(struct ssdfs_blk2off_table *table,
+				       u16 peb_index, u16 sequence_id,
+				       struct ssdfs_blk2off_table_snapshot *sp);
+int ssdfs_peb_store_offsets_table_header(struct ssdfs_peb_info *pebi,
+					struct ssdfs_blk2off_table_header *hdr,
+					struct ssdfs_peb_log_offset *log_offset);
+int
+ssdfs_peb_store_offsets_table_extents(struct ssdfs_peb_info *pebi,
+				      struct ssdfs_dynamic_array *array,
+				      u16 extent_count,
+				      struct ssdfs_peb_log_offset *log_offset);
+int ssdfs_peb_store_offsets_table_fragment(struct ssdfs_peb_info *pebi,
+					struct ssdfs_blk2off_table *table,
+					u16 peb_index, u16 sequence_id,
+					struct ssdfs_peb_log_offset *log_offset);
+int ssdfs_blk2off_table_pre_delete_fragment(void *item, u64 peb_id);
+int ssdfs_peb_store_offsets_table(struct ssdfs_peb_info *pebi,
+				  struct ssdfs_metadata_descriptor *desc,
+				  struct ssdfs_peb_log_offset *log_offset);
+int
+ssdfs_blk2off_table_forget_snapshot(struct ssdfs_blk2off_table *table,
+				    struct ssdfs_blk2off_table_snapshot *sp,
+				    struct ssdfs_dynamic_array *array,
+				    u16 extent_count);
+
+bool ssdfs_blk2off_table_dirtied(struct ssdfs_blk2off_table *table,
+				 u16 peb_index);
+bool ssdfs_blk2off_table_initialized(struct ssdfs_blk2off_table *table,
+				     u16 peb_index);
+
+int ssdfs_blk2off_table_get_used_logical_blks(struct ssdfs_blk2off_table *tbl,
+						u16 *used_blks);
+int ssdfs_blk2off_table_get_offset_position(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    struct ssdfs_offset_position *pos);
+struct ssdfs_phys_offset_descriptor *
+ssdfs_blk2off_table_convert(struct ssdfs_blk2off_table *table,
+			    u16 logical_blk, u16 *peb_index,
+			    int *migration_state,
+			    struct ssdfs_offset_position *pos);
+int ssdfs_blk2off_table_allocate_block(struct ssdfs_blk2off_table *table,
+					u32 max_blk,
+					u16 *logical_blk);
+int ssdfs_blk2off_table_allocate_extent(struct ssdfs_blk2off_table *table,
+					u16 len, u32 max_blk,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_blk2off_table_change_offset(struct ssdfs_blk2off_table *table,
+				      u16 logical_blk,
+				      u16 peb_index,
+				      u64 peb_id,
+				      struct ssdfs_block_descriptor *blk_desc,
+				      struct ssdfs_phys_offset_descriptor *off);
+int ssdfs_blk2off_table_free_block(struct ssdfs_blk2off_table *table,
+				   u16 peb_index, u64 peb_id, u16 logical_blk);
+int ssdfs_blk2off_table_free_extent(struct ssdfs_blk2off_table *table,
+				    u16 peb_index, u64 peb_id,
+				    struct ssdfs_blk2off_range *extent);
+bool is_ssdfs_blk2off_table_extent_modified(struct ssdfs_blk2off_table *table,
+					    u16 peb_index,
+					    struct ssdfs_blk2off_range *extent);
+
+int ssdfs_blk2off_table_get_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index);
+int ssdfs_blk2off_table_set_block_migration(struct ssdfs_blk2off_table *table,
+					    u16 logical_blk,
+					    u16 peb_index,
+					    struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_get_block_state(struct ssdfs_blk2off_table *table,
+					struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_update_block_state(struct ssdfs_blk2off_table *table,
+					   struct ssdfs_segment_request *req);
+int ssdfs_blk2off_table_set_block_commit(struct ssdfs_blk2off_table *table,
+					 u16 logical_blk,
+					 u16 peb_index);
+int ssdfs_blk2off_table_revert_migration_state(struct ssdfs_blk2off_table *tbl,
+						u16 peb_index);
+
+#ifdef CONFIG_SSDFS_TESTING
+int ssdfs_blk2off_table_fragment_set_clean(struct ssdfs_blk2off_table *table,
+					   u16 peb_index, u16 sequence_id);
+#else
+static inline
+int ssdfs_blk2off_table_fragment_set_clean(struct ssdfs_blk2off_table *table,
+					   u16 peb_index, u16 sequence_id)
+{
+	SSDFS_ERR("set fragment clean is not supported\n");
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_SSDFS_TESTING */
+
+#endif /* _SSDFS_OFFSET_TRANSLATION_TABLE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 24/79] ssdfs: introduce PEB object
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (10 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 22/79] ssdfs: introduce offset translation table Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 27/79] ssdfs: introduce PEB container Viacheslav Dubeyko
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS splits a partition/volume into a sequence of fixed-sized
segments. Every segment can include one or several Logical
Erase Blocks (LEBs). A LEB can be mapped into a "Physical" Erase
Block (PEB). A PEB represents the concept of an erase block or
zone that can be allocated, filled by logs, and erased.
The PEB object keeps knowledge about the PEB ID, the index in
the segment, and the current log details.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/peb.h      | 600 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/peb_init.h | 364 +++++++++++++++++++++++++++
 2 files changed, 964 insertions(+)
 create mode 100644 fs/ssdfs/peb.h
 create mode 100644 fs/ssdfs/peb_init.h

diff --git a/fs/ssdfs/peb.h b/fs/ssdfs/peb.h
new file mode 100644
index 000000000000..5f1d81683e8d
--- /dev/null
+++ b/fs/ssdfs/peb.h
@@ -0,0 +1,600 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb.h - Physical Erase Block (PEB) object declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_H
+#define _SSDFS_PEB_H
+
+#include "request_queue.h"
+#include "fingerprint_array.h"
+#include "peb_init.h"
+
+/*
+ * struct ssdfs_protection_window - protection window length
+ * @cno_lock: lock of checkpoints set
+ * @create_cno: creation checkpoint
+ * @last_request_cno: last request checkpoint
+ * @reqs_count: current number of active requests
+ * @protected_range: last measured protected range length
+ * @future_request_cno: expected checkpoint of the next request
+ */
+struct ssdfs_protection_window {
+	spinlock_t cno_lock;
+	u64 create_cno;
+	u64 last_request_cno;
+	u32 reqs_count;
+	u64 protected_range;
+	u64 future_request_cno;
+};
+
+/*
+ * struct ssdfs_peb_diffs_area_metadata - diffs area's metadata
+ * @hdr: diffs area's table header
+ */
+struct ssdfs_peb_diffs_area_metadata {
+	struct ssdfs_block_state_descriptor hdr;
+};
+
+/*
+ * struct ssdfs_peb_journal_area_metadata - journal area's metadata
+ * @hdr: journal area's table header
+ */
+struct ssdfs_peb_journal_area_metadata {
+	struct ssdfs_block_state_descriptor hdr;
+};
+
+/*
+ * struct ssdfs_peb_area_metadata - descriptor of area's items chain
+ * @area.blk_desc.table: block descriptors area table
+ * @area.blk_desc.flush_buf: write block descriptors buffer (compression case)
+ * @area.blk_desc.capacity: max number of block descriptors in reserved space
+ * @area.blk_desc.items_count: number of items in the whole table
+ * @area.diffs.table: diffs area's table
+ * @area.journal.table: journal area's table
+ * @area.main.desc: main area's descriptor
+ * @reserved_offset: reserved write offset of table
+ * @sequence_id: fragment's sequence number
+ */
+struct ssdfs_peb_area_metadata {
+	union {
+		struct {
+			struct ssdfs_area_block_table table;
+			struct ssdfs_peb_temp_buffer flush_buf;
+			int capacity;
+			int items_count;
+		} blk_desc;
+
+		struct {
+			struct ssdfs_peb_diffs_area_metadata table;
+		} diffs;
+
+		struct {
+			struct ssdfs_peb_journal_area_metadata table;
+		} journal;
+
+		struct {
+			struct ssdfs_block_state_descriptor desc;
+		} main;
+	} area;
+
+	u32 reserved_offset;
+	u8 sequence_id;
+};
+
+/*
+ * struct ssdfs_peb_area - log's area descriptor
+ * @has_metadata: does area contain metadata?
+ * @metadata: descriptor of area's items chain
+ * @write_offset: current write offset
+ * @compressed_offset: current write offset for compressed data
+ * @frag_offset: offset of current fragment
+ * @array: area's memory folios
+ */
+struct ssdfs_peb_area {
+	bool has_metadata;
+	struct ssdfs_peb_area_metadata metadata;
+
+	u32 write_offset;
+	u32 compressed_offset;
+	u32 frag_offset;
+	struct ssdfs_folio_array array;
+};
+
+/*
+ * struct ssdfs_blk2off_table_area - blk2off table descriptor
+ * @hdr: offset descriptors area table's header
+ * @reserved_offset: reserved header offset
+ * @compressed_offset: current write offset for compressed data
+ * @sequence_id: fragment's sequence number
+ */
+struct ssdfs_blk2off_table_area {
+	struct ssdfs_blk2off_table_header hdr;
+
+	u32 reserved_offset;
+	u32 compressed_offset;
+	u8 sequence_id;
+};
+
+/* Log possible states */
+enum {
+	SSDFS_LOG_UNKNOWN,
+	SSDFS_LOG_PREPARED,
+	SSDFS_LOG_INITIALIZED,
+	SSDFS_LOG_CREATED,
+	SSDFS_LOG_COMMITTED,
+	SSDFS_LOG_STATE_MAX,
+};
+
+/*
+ * struct ssdfs_peb_prev_log - previous log's details
+ * @bmap_bytes: bytes count in block bitmap of previous log
+ * @blk2off_bytes: bytes count in blk2off table of previous log
+ * @blk_desc_bytes: bytes count in blk desc table of previous log
+ */
+struct ssdfs_peb_prev_log {
+	u32 bmap_bytes;
+	u32 blk2off_bytes;
+	u32 blk_desc_bytes;
+};
+
+/*
+ * struct ssdfs_peb_log - current log
+ * @lock: exclusive lock of current log
+ * @state: current log's state
+ * @sequence_id: index of partial log in the sequence
+ * @start_block: current log's start block index
+ * @reserved_blocks: metadata blocks in the log
+ * @free_data_blocks: free data blocks capacity
+ * @seg_flags: segment header's flags for the log
+ * @prev_log: previous log's details
+ * @last_log_time: creation timestamp of last log
+ * @last_log_cno: last log checkpoint
+ * @bmap_snapshot: snapshot of block bitmap
+ * @blk2off_tbl: blk2off table descriptor
+ * @area: log's areas (main, diff updates, journal)
+ */
+struct ssdfs_peb_log {
+	struct mutex lock;
+	atomic_t state;
+	atomic_t sequence_id;
+	u32 start_block;
+	u32 reserved_blocks; /* metadata blocks in the log */
+	u32 free_data_blocks; /* free data blocks capacity */
+	u32 seg_flags;
+	struct ssdfs_peb_prev_log prev_log;
+	u64 last_log_time;
+	u64 last_log_cno;
+	struct ssdfs_folio_vector bmap_snapshot;
+	struct ssdfs_blk2off_table_area blk2off_tbl;
+	struct ssdfs_peb_area area[SSDFS_LOG_AREA_MAX];
+};
+
+/*
+ * struct ssdfs_peb_log_offset - current log offset
+ * @blocksize_shift: log2(block size)
+ * @log_blocks: count of blocks in full partial log
+ * @start_block: current log's start block index
+ * @cur_block: current block in the log
+ * @offset_into_block: current offset into block
+ */
+struct ssdfs_peb_log_offset {
+	u32 blocksize_shift;
+	u32 log_blocks;
+	pgoff_t start_block;
+	pgoff_t cur_block;
+	u32 offset_into_block;
+};
+
+/*
+ * struct ssdfs_peb_deduplication - PEB deduplication environment
+ * @shash_tfm: message digest handle
+ * @fingerprints: fingeprints array
+ */
+struct ssdfs_peb_deduplication {
+	struct crypto_shash *shash_tfm;
+	struct ssdfs_fingerprint_array fingerprints;
+};
+
+/*
+ * struct ssdfs_peb_info - Physical Erase Block (PEB) description
+ * @peb_id: PEB number
+ * @peb_index: PEB index
+ * @log_blocks: count of blocks in full partial log
+ * @peb_create_time: PEB creation timestamp
+ * @peb_migration_id: identification number of PEB in migration sequence
+ * @state: PEB object state
+ * @init_end: wait of full init ending
+ * @peb_state: current PEB state
+ * @reserved_bytes.blk_bmap: reserved bytes for block bitmap
+ * @reserved_bytes.blk2off_tbl: reserved bytes for blk2off table
+ * @reserved_bytes.blk_desc_tbl: reserved bytes for block descriptor table
+ * @current_log: PEB's current log
+ * @dedup: PEB's deduplication environment
+ * @read_buffer: temporary read buffers (compression case)
+ * @env: init environment
+ * @cache: PEB's memory folios
+ * @pebc: pointer on parent container
+ */
+struct ssdfs_peb_info {
+	/* Static data */
+	u64 peb_id;
+	u16 peb_index;
+	u32 log_blocks;
+
+	u64 peb_create_time;
+
+	/*
+	 * The peb_migration_id is stored in two places:
+	 * (1) struct ssdfs_segment_header;
+	 * (2) struct ssdfs_blk_state_offset.
+	 *
+	 * The goal of peb_migration_id is to distinguish PEB
+	 * objects during a PEB object's migration. Every
+	 * destination PEB receives a migration_id that is the
+	 * incremented migration_id value of the source PEB
+	 * object. If peb_migration_id reaches the value
+	 * SSDFS_PEB_MIGRATION_ID_MAX, then peb_migration_id
+	 * starts from SSDFS_PEB_MIGRATION_ID_START again.
+	 *
+	 * A PEB object receives its peb_migration_id value
+	 * during the PEB object creation operation. A "clean"
+	 * PEB object receives the SSDFS_PEB_MIGRATION_ID_START
+	 * value. The destination PEB object receives the
+	 * incremented peb_migration_id value of the source PEB
+	 * object during the creation operation. Otherwise, the
+	 * real peb_migration_id value is set during the PEB's
+	 * initialization by extracting the actual value from
+	 * the segment header.
+	 */
+	atomic_t peb_migration_id;
+
+	atomic_t state;
+	struct completion init_end;
+
+	atomic_t peb_state;
+
+	/* Reserved bytes */
+	struct {
+		atomic_t blk_bmap;
+		atomic_t blk2off_tbl;
+		atomic_t blk_desc_tbl;
+	} reserved_bytes;
+
+	/* Current log */
+	struct ssdfs_peb_log current_log;
+
+	/* Fingerprints array */
+#ifdef CONFIG_SSDFS_PEB_DEDUPLICATION
+	struct ssdfs_peb_deduplication dedup;
+#endif /* CONFIG_SSDFS_PEB_DEDUPLICATION */
+
+	/* Read buffer */
+	struct ssdfs_peb_temp_read_buffers read_buffer;
+
+	/* Init environment */
+	struct ssdfs_read_init_env env;
+
+	/* PEB's memory folios */
+	struct ssdfs_folio_array cache;
+
+	/* Parent container */
+	struct ssdfs_peb_container *pebc;
+};
+
+/* PEB object states */
+enum {
+	SSDFS_PEB_OBJECT_UNKNOWN_STATE,
+	SSDFS_PEB_OBJECT_CREATED,
+	SSDFS_PEB_OBJECT_INITIALIZED,
+	SSDFS_PEB_OBJECT_STATE_MAX
+};
+
+#define SSDFS_AREA_TYPE2INDEX(type)({ \
+	int index; \
+	switch (type) { \
+	case SSDFS_LOG_BLK_DESC_AREA: \
+		index = SSDFS_BLK_DESC_AREA_INDEX; \
+		break; \
+	case SSDFS_LOG_MAIN_AREA: \
+		index = SSDFS_COLD_PAYLOAD_AREA_INDEX; \
+		break; \
+	case SSDFS_LOG_DIFFS_AREA: \
+		index = SSDFS_WARM_PAYLOAD_AREA_INDEX; \
+		break; \
+	case SSDFS_LOG_JOURNAL_AREA: \
+		index = SSDFS_HOT_PAYLOAD_AREA_INDEX; \
+		break; \
+	default: \
+		BUG(); \
+	} \
+	index; \
+})
+
+#define SSDFS_AREA_TYPE2FLAG(type)({ \
+	int flag; \
+	switch (type) { \
+	case SSDFS_LOG_BLK_DESC_AREA: \
+		flag = SSDFS_LOG_HAS_BLK_DESC_CHAIN; \
+		break; \
+	case SSDFS_LOG_MAIN_AREA: \
+		flag = SSDFS_LOG_HAS_COLD_PAYLOAD; \
+		break; \
+	case SSDFS_LOG_DIFFS_AREA: \
+		flag = SSDFS_LOG_HAS_WARM_PAYLOAD; \
+		break; \
+	case SSDFS_LOG_JOURNAL_AREA: \
+		flag = SSDFS_LOG_HAS_HOT_PAYLOAD; \
+		break; \
+	default: \
+		BUG(); \
+	} \
+	flag; \
+})
+
+/*
+ * Inline functions
+ */
+
+/*
+ * ssdfs_peb_current_log_state() - check current log's state
+ * @pebi: pointer to PEB object
+ * @state: checked state
+ */
+static inline
+bool ssdfs_peb_current_log_state(struct ssdfs_peb_info *pebi,
+				 int state)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(state < SSDFS_LOG_UNKNOWN || state >= SSDFS_LOG_STATE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&pebi->current_log.state) >= state;
+}
+
+/*
+ * ssdfs_peb_set_current_log_state() - set current log's state
+ * @pebi: pointer to PEB object
+ * @state: new log's state
+ */
+static inline
+void ssdfs_peb_set_current_log_state(struct ssdfs_peb_info *pebi,
+				     int state)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(state < SSDFS_LOG_UNKNOWN || state >= SSDFS_LOG_STATE_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_set(&pebi->current_log.state, state);
+}
+
+/*
+ * IS_SSDFS_BLK_STATE_OFFSET_INVALID() - check whether block state offset is invalid
+ * @desc: block state offset
+ */
+static inline
+bool IS_SSDFS_BLK_STATE_OFFSET_INVALID(struct ssdfs_blk_state_offset *desc)
+{
+	if (!desc)
+		return true;
+
+	if (le16_to_cpu(desc->log_start_page) == U16_MAX &&
+	    desc->log_area == U8_MAX &&
+	    desc->peb_migration_id == U8_MAX &&
+	    le32_to_cpu(desc->byte_offset) == U32_MAX) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("log_start_page %u, log_area %u, "
+			  "peb_migration_id %u, byte_offset %u\n",
+			  le16_to_cpu(desc->log_start_page),
+			  desc->log_area,
+			  desc->peb_migration_id,
+			  le32_to_cpu(desc->byte_offset));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return true;
+	}
+
+	if (desc->peb_migration_id == SSDFS_PEB_UNKNOWN_MIGRATION_ID) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("log_start_page %u, log_area %u, "
+			  "peb_migration_id %u, byte_offset %u\n",
+			  le16_to_cpu(desc->log_start_page),
+			  desc->log_area,
+			  desc->peb_migration_id,
+			  le32_to_cpu(desc->byte_offset));
+#endif /* CONFIG_SSDFS_DEBUG */
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * SSDFS_BLK_DESC_INIT() - init block descriptor
+ * @blk_desc: block descriptor
+ */
+static inline
+void SSDFS_BLK_DESC_INIT(struct ssdfs_block_descriptor *blk_desc)
+{
+	if (!blk_desc) {
+		SSDFS_WARN("block descriptor pointer is NULL\n");
+		return;
+	}
+
+	memset(blk_desc, 0xFF, sizeof(struct ssdfs_block_descriptor));
+}
+
+/*
+ * IS_SSDFS_BLK_DESC_EXHAUSTED() - check that block descriptor is exhausted
+ * @blk_desc: block descriptor
+ */
+static inline
+bool IS_SSDFS_BLK_DESC_EXHAUSTED(struct ssdfs_block_descriptor *blk_desc)
+{
+	struct ssdfs_blk_state_offset *offset = NULL;
+
+	if (!blk_desc)
+		return true;
+
+	offset = &blk_desc->state[SSDFS_BLK_STATE_OFF_MAX - 1];
+
+	if (!IS_SSDFS_BLK_STATE_OFFSET_INVALID(offset))
+		return true;
+
+	return false;
+}
+
+static inline
+bool IS_SSDFS_BLK_DESC_READY_FOR_DIFF(struct ssdfs_block_descriptor *blk_desc)
+{
+	return !IS_SSDFS_BLK_STATE_OFFSET_INVALID(&blk_desc->state[0]);
+}
+
+static inline
+u8 SSDFS_GET_BLK_DESC_MIGRATION_ID(struct ssdfs_block_descriptor *blk_desc)
+{
+	if (IS_SSDFS_BLK_STATE_OFFSET_INVALID(&blk_desc->state[0]))
+		return U8_MAX;
+
+	return blk_desc->state[0].peb_migration_id;
+}
+
+static inline
+void DEBUG_BLOCK_DESCRIPTOR(u64 seg_id, u64 peb_id,
+			    struct ssdfs_block_descriptor *blk_desc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int i;
+
+	SSDFS_DBG("seg_id %llu, peb_id %llu, ino %llu, "
+		  "logical_offset %u, peb_index %u, peb_page %u\n",
+		  seg_id, peb_id,
+		  le64_to_cpu(blk_desc->ino),
+		  le32_to_cpu(blk_desc->logical_offset),
+		  le16_to_cpu(blk_desc->peb_index),
+		  le16_to_cpu(blk_desc->peb_page));
+
+	for (i = 0; i < SSDFS_BLK_STATE_OFF_MAX; i++) {
+		SSDFS_DBG("BLK STATE OFFSET %d: "
+			  "log_start_page %u, log_area %#x, "
+			  "byte_offset %u, peb_migration_id %u\n",
+			  i,
+			  le16_to_cpu(blk_desc->state[i].log_start_page),
+			  blk_desc->state[i].log_area,
+			  le32_to_cpu(blk_desc->state[i].byte_offset),
+			  blk_desc->state[i].peb_migration_id);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * PEB object's API
+ */
+int ssdfs_peb_object_create(struct ssdfs_peb_info *pebi,
+			    struct ssdfs_peb_container *pebc,
+			    u64 peb_id, int peb_state,
+			    u8 peb_migration_id);
+int ssdfs_peb_object_destroy(struct ssdfs_peb_info *pebi);
+void ssdfs_peb_current_log_init(struct ssdfs_peb_info *pebi,
+				u32 free_blocks,
+				u32 start_block,
+				int sequence_id,
+				struct ssdfs_peb_prev_log *prev_log);
+u64 ssdfs_get_leb_id_for_peb_index(struct ssdfs_fs_info *fsi,
+				   u64 seg, u32 peb_index);
+u64 ssdfs_get_seg_id_for_leb_id(struct ssdfs_fs_info *fsi,
+				u64 leb_id);
+int ssdfs_get_peb_migration_id(struct ssdfs_peb_info *pebi);
+bool is_peb_migration_id_valid(int peb_migration_id);
+int ssdfs_get_peb_migration_id_checked(struct ssdfs_peb_info *pebi);
+void ssdfs_set_peb_migration_id(struct ssdfs_peb_info *pebi,
+				int id);
+int __ssdfs_define_next_peb_migration_id(int prev_id);
+int ssdfs_define_next_peb_migration_id(struct ssdfs_peb_info *src_peb);
+int ssdfs_define_prev_peb_migration_id(struct ssdfs_peb_info *pebi);
+
+#ifdef CONFIG_SSDFS_PEB_DEDUPLICATION
+bool is_ssdfs_block_duplicated(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_fingerprint_pair *pair);
+int ssdfs_peb_deduplicate_logical_block(struct ssdfs_peb_info *pebi,
+					struct ssdfs_segment_request *req,
+					struct ssdfs_fingerprint_pair *pair,
+					struct ssdfs_block_descriptor *blk_desc);
+bool should_ssdfs_save_fingerprint(struct ssdfs_peb_info *pebi,
+				   struct ssdfs_segment_request *req);
+int ssdfs_peb_save_fingerprint(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_block_descriptor *blk_desc,
+				struct ssdfs_fingerprint_pair *pair);
+#else
+static inline
+bool is_ssdfs_block_duplicated(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_fingerprint_pair *pair)
+{
+	return false;
+}
+static inline
+int ssdfs_peb_deduplicate_logical_block(struct ssdfs_peb_info *pebi,
+					struct ssdfs_segment_request *req,
+					struct ssdfs_fingerprint_pair *pair,
+					struct ssdfs_block_descriptor *blk_desc)
+{
+	return -EOPNOTSUPP;
+}
+static inline
+bool should_ssdfs_save_fingerprint(struct ssdfs_peb_info *pebi,
+				   struct ssdfs_segment_request *req)
+{
+	return false;
+}
+static inline
+int ssdfs_peb_save_fingerprint(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_block_descriptor *blk_desc,
+				struct ssdfs_fingerprint_pair *pair)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_SSDFS_PEB_DEDUPLICATION */
+
+/*
+ * PEB internal functions declaration
+ */
+int ssdfs_unaligned_read_cache(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				u32 area_offset, u32 area_size,
+				void *buf);
+int ssdfs_peb_read_log_hdr_desc_array(struct ssdfs_peb_info *pebi,
+				      struct ssdfs_segment_request *req,
+				      u16 log_start_block,
+				      struct ssdfs_metadata_descriptor *array,
+				      size_t array_size);
+u16 ssdfs_peb_estimate_min_partial_log_pages(struct ssdfs_peb_info *pebi);
+u32 ssdfs_request_rest_bytes(struct ssdfs_peb_info *pebi,
+			     struct ssdfs_segment_request *req);
+bool is_ssdfs_peb_exhausted(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_peb_info *pebi);
+bool is_ssdfs_peb_ready_to_exhaust(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_peb_info *pebi);
+
+#endif /* _SSDFS_PEB_H */
diff --git a/fs/ssdfs/peb_init.h b/fs/ssdfs/peb_init.h
new file mode 100644
index 000000000000..658f5f73e6bc
--- /dev/null
+++ b/fs/ssdfs/peb_init.h
@@ -0,0 +1,364 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_init.h - PEB init structures' declarations.
+ *
+ * Copyright (c) 2024-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_PEB_INIT_H
+#define _SSDFS_PEB_INIT_H
+
+/*
+ * struct ssdfs_contigous_bytes - contiguous sequence of bytes
+ * @offset: offset of sequence in bytes
+ * @size: length of sequence in bytes
+ */
+struct ssdfs_contigous_bytes {
+	u32 offset;
+	u32 size;
+};
+
+/*
+ * struct ssdfs_compressed_area - compressed area descriptor
+ * @compressed: descriptor of compressed byte stream
+ * @meta_desc: copy of metadata area's descriptor
+ */
+struct ssdfs_compressed_area {
+	struct ssdfs_contigous_bytes compressed;
+	struct ssdfs_metadata_descriptor meta_desc;
+};
+
+/*
+ * struct ssdfs_compressed_portion - compressed portion descriptor
+ * @area: descriptor of area that contains portion
+ * @header_size: size of the portion header
+ * @compressed: descriptor of compressed state of portion
+ * @uncompressed: descriptor of decompressed state of portion
+ */
+struct ssdfs_compressed_portion {
+	struct ssdfs_compressed_area area;
+
+	size_t header_size;
+
+	struct ssdfs_contigous_bytes compressed;
+	struct ssdfs_contigous_bytes uncompressed;
+};
+
+/*
+ * struct ssdfs_compressed_fragment - compressed fragment descriptor
+ * @portion: portion descriptor that contains fragment
+ * @compressed: descriptor of compressed state of fragment
+ * @uncompressed: descriptor of decompressed state of fragment
+ * @frag_desc: fragment descriptor
+ */
+struct ssdfs_compressed_fragment {
+	struct ssdfs_compressed_portion portion;
+
+	struct ssdfs_contigous_bytes compressed;
+	struct ssdfs_contigous_bytes uncompressed;
+
+	struct ssdfs_fragment_desc frag_desc;
+};
+
+/*
+ * struct ssdfs_fragment_raw_iterator - raw fragment iterator
+ * @fragment_desc: compressed fragment descriptor
+ * @offset: current offset
+ * @bytes_count: total number of bytes
+ * @processed_bytes: number of processed bytes
+ * @fragments_count: total number of fragments
+ * @processed_fragments: number of processed fragments
+ */
+struct ssdfs_fragment_raw_iterator {
+	struct ssdfs_compressed_fragment fragment_desc;
+
+	u32 offset;
+	u32 bytes_count;
+	u32 processed_bytes;
+
+	u32 fragments_count;
+	u32 processed_fragments;
+};
+
+/*
+ * struct ssdfs_raw_iterator - raw stream iterator
+ * @start_offset: start offset in stream
+ * @current_offset: current offset in stream
+ * @bytes_count: total size of content in bytes
+ */
+struct ssdfs_raw_iterator {
+	u32 start_offset;
+	u32 current_offset;
+	u32 bytes_count;
+};
+
+/*
+ * struct ssdfs_content_stream - content stream
+ * @batch: folio vector with content's byte stream
+ * @write_off: current write offset in stream
+ * @bytes_count: total size of content in bytes
+ */
+struct ssdfs_content_stream {
+	struct ssdfs_folio_vector batch;
+
+	u32 write_off;
+	u32 bytes_count;
+};
+
+#define SSDFS_BLKBMAP_FRAG_HDR_CAPACITY \
+	(sizeof(struct ssdfs_block_bitmap_fragment) + \
+	 (sizeof(struct ssdfs_fragment_desc) * \
+	  SSDFS_BLK_BMAP_FRAGMENTS_CHAIN_MAX))
+
+#define SSDFS_BLKBMAP_HDR_CAPACITY \
+	(sizeof(struct ssdfs_block_bitmap_header) + \
+	 SSDFS_BLKBMAP_FRAG_HDR_CAPACITY)
+
+/*
+ * struct ssdfs_blk_bmap_init_env - block bitmap init environment
+ * @raw.content: folio vector that stores block bitmap content
+ * @raw.metadata: block bitmap fragment's metadata buffer
+ * @header.ptr: pointer on block bitmap header
+ * @fragment.index: index of block bitmap's fragment
+ * @fragment.header: block bitmap fragment's header
+ * @read_bytes: counter of all read bytes
+ */
+struct ssdfs_blk_bmap_init_env {
+	struct {
+		struct ssdfs_folio_vector content;
+		u8 metadata[SSDFS_BLKBMAP_HDR_CAPACITY];
+	} raw;
+
+	struct {
+		struct ssdfs_block_bitmap_header *ptr;
+	} header;
+
+	struct {
+		int index;
+		struct ssdfs_block_bitmap_fragment *header;
+	} fragment;
+
+	u32 read_bytes;
+};
+
+/*
+ * struct ssdfs_blk2off_table_init_env - blk2off table init environment
+ * @extents.stream: translation extents sequence
+ * @extents.count: count of extents in sequence
+ * @portion.header: blk2off table header
+ * @portion.fragments.stream: phys offset descriptors sequence
+ * @portion.area_offset: offset to the blk2off area
+ * @portion.read_off: current read offset
+ */
+struct ssdfs_blk2off_table_init_env {
+	struct {
+		struct ssdfs_content_stream stream;
+		u32 count;
+	} extents;
+
+	struct {
+		struct ssdfs_blk2off_table_header header;
+
+		struct {
+			struct ssdfs_content_stream stream;
+		} fragments;
+
+		u32 area_offset;
+		u32 read_off;
+	} portion;
+};
+
+/*
+ * struct ssdfs_blk_desc_table_init_env - blk desc table init environment
+ * @portion.header: blk desc table header
+ * @portion.raw.content: folio vector with blk desc table fragment
+ * @portion.area_offset: offset to the block descriptor area
+ * @portion.read_off: current read offset
+ * @portion.write_off: current write offset
+ */
+struct ssdfs_blk_desc_table_init_env {
+	struct {
+		struct ssdfs_area_block_table header;
+
+		struct {
+			struct ssdfs_folio_vector content;
+		} raw;
+
+		u32 area_offset;
+		u32 read_off;
+		u32 write_off;
+	} portion;
+};
+
+/*
+ * struct ssdfs_read_init_env - read operation init environment
+ * @peb.cur_migration_id: current PEB's migration ID
+ * @peb.prev_migration_id: previous PEB's migration ID
+ * @peb.free_pages: PEB's free pages
+ * @log.offset: offset in pages of the requested log
+ * @log.blocks: blocks count in every log of segment
+ * @log.bytes: number of bytes in the requested log
+ * @log.header.ptr: log header
+ * @log.header.of_full_log: is it full log header (segment header)?
+ * @log.footer.ptr: log footer
+ * @log.footer.is_present: does log have footer?
+ * @log.blk_bmap: block bitmap init environment
+ * @log.blk2off_tbl: blk2off table init environment
+ * @log.blk_desc_tbl: blk desc table init environment
+ */
+struct ssdfs_read_init_env {
+	struct {
+		int cur_migration_id;
+		int prev_migration_id;
+		int free_pages;
+	} peb;
+
+	struct {
+		u32 offset;
+		u32 blocks;
+		u32 bytes;
+
+		struct {
+			void *ptr;
+			bool of_full_log;
+		} header;
+
+		struct {
+			struct ssdfs_log_footer *ptr;
+			bool is_present;
+		} footer;
+
+		struct ssdfs_blk_bmap_init_env blk_bmap;
+		struct ssdfs_blk2off_table_init_env blk2off_tbl;
+		struct ssdfs_blk_desc_table_init_env blk_desc_tbl;
+	} log;
+};
+
+/*
+ * struct ssdfs_peb_read_buffer - read buffer
+ * @ptr: pointer on buffer
+ * @buf_size: buffer size in bytes
+ * @frag_desc: fragment descriptor
+ */
+struct ssdfs_peb_read_buffer {
+	void *ptr;
+	size_t buf_size;
+
+	struct ssdfs_compressed_fragment frag_desc;
+};
+
+/*
+ * struct ssdfs_peb_temp_read_buffers - read temporary buffers
+ * @lock: temporary buffers lock
+ * @blk_desc: block descriptor table's temp read buffer
+ */
+struct ssdfs_peb_temp_read_buffers {
+	struct rw_semaphore lock;
+	struct ssdfs_peb_read_buffer blk_desc;
+};
+
+/*
+ * struct ssdfs_peb_temp_buffer - temporary (write) buffer
+ * @ptr: pointer on buffer
+ * @write_offset: current write offset into buffer
+ * @granularity: size of one item in bytes
+ * @size: buffer size in bytes
+ */
+struct ssdfs_peb_temp_buffer {
+	void *ptr;
+	u32 write_offset;
+	size_t granularity;
+	size_t size;
+};
+
+struct ssdfs_peb_log_offset;
+
+/*
+ * PEB object init API
+ */
+void ssdfs_create_content_stream(struct ssdfs_content_stream *stream,
+				 u32 capacity);
+void ssdfs_reinit_content_stream(struct ssdfs_content_stream *stream);
+void ssdfs_destroy_content_stream(struct ssdfs_content_stream *stream);
+
+bool IS_SSDFS_CONTIGOUS_BYTES_DESC_INVALID(struct ssdfs_contigous_bytes *desc);
+void SSDFS_INIT_CONTIGOUS_BYTES_DESC(struct ssdfs_contigous_bytes *desc,
+				     u32 offset, u32 size);
+u32 SSDFS_AREA_COMPRESSED_OFFSET(struct ssdfs_compressed_area *area);
+u32 SSDFS_AREA_COMPRESSED_SIZE(struct ssdfs_compressed_area *area);
+bool IS_SSDFS_COMPRESSED_AREA_DESC_INVALID(struct ssdfs_compressed_area *desc);
+void SSDFS_INIT_COMPRESSED_AREA_DESC(struct ssdfs_compressed_area *desc,
+				     struct ssdfs_metadata_descriptor *meta_desc);
+u64 SSDFS_COMPRESSED_AREA_UPPER_BOUND(struct ssdfs_compressed_area *desc);
+
+bool IS_SSDFS_COMPRESSED_PORTION_INVALID(struct ssdfs_compressed_portion *desc);
+u32 SSDFS_PORTION_COMPRESSED_OFFSET(struct ssdfs_compressed_portion *portion);
+u32 SSDFS_PORTION_UNCOMPRESSED_OFFSET(struct ssdfs_compressed_portion *portion);
+bool IS_SSDFS_COMPRESSED_PORTION_IN_AREA(struct ssdfs_compressed_portion *desc);
+void SSDFS_INIT_COMPRESSED_PORTION_DESC(struct ssdfs_compressed_portion *desc,
+					struct ssdfs_metadata_descriptor *meta,
+					struct ssdfs_fragments_chain_header *hdr,
+					size_t header_size);
+int SSDFS_ADD_COMPRESSED_PORTION(struct ssdfs_compressed_portion *desc,
+				 struct ssdfs_fragments_chain_header *hdr);
+bool IS_OFFSET_INSIDE_UNCOMPRESSED_PORTION(struct ssdfs_compressed_portion *desc,
+					   u32 offset);
+u64 SSDFS_COMPRESSED_PORTION_UPPER_BOUND(struct ssdfs_compressed_portion *desc);
+u64 SSDFS_UNCOMPRESSED_PORTION_UPPER_BOUND(struct ssdfs_compressed_portion *desc);
+
+bool IS_SSDFS_COMPRESSED_FRAGMENT_INVALID(struct ssdfs_compressed_fragment *desc);
+bool
+IS_SSDFS_COMPRESSED_FRAGMENT_IN_PORTION(struct ssdfs_compressed_fragment *desc);
+int SSDFS_INIT_COMPRESSED_FRAGMENT_DESC(struct ssdfs_compressed_fragment *desc,
+					 struct ssdfs_fragment_desc *frag);
+int SSDFS_ADD_COMPRESSED_FRAGMENT(struct ssdfs_compressed_fragment *desc,
+				  struct ssdfs_fragment_desc *frag);
+u32 SSDFS_FRAGMENT_COMPRESSED_OFFSET(struct ssdfs_compressed_fragment *desc);
+u32 SSDFS_FRAGMENT_UNCOMPRESSED_OFFSET(struct ssdfs_compressed_fragment *desc);
+bool
+IS_OFFSET_INSIDE_UNCOMPRESSED_FRAGMENT(struct ssdfs_compressed_fragment *desc,
+					u32 offset);
+
+bool IS_SSDFS_FRAG_RAW_ITER_INVALID(struct ssdfs_fragment_raw_iterator *iter);
+void SSDFS_FRAG_RAW_ITER_CREATE(struct ssdfs_fragment_raw_iterator *iter);
+void SSDFS_FRAG_RAW_ITER_INIT(struct ssdfs_fragment_raw_iterator *iter,
+			      u32 offset, u32 bytes_count, u32 fragments_count);
+int SSDFS_FRAG_RAW_ITER_ADD_FRAGMENT(struct ssdfs_fragment_raw_iterator *iter,
+				     struct ssdfs_fragment_desc *frag);
+int SSDFS_FRAG_RAW_ITER_SHIFT_OFFSET(struct ssdfs_fragment_raw_iterator *iter,
+				     u32 shift);
+bool IS_SSDFS_FRAG_RAW_ITER_ENDED(struct ssdfs_fragment_raw_iterator *iter);
+
+void SSDFS_LOG_OFFSET_INIT(struct ssdfs_peb_log_offset *log,
+			   u32 block_size,
+			   u32 log_blocks,
+			   pgoff_t start_block);
+bool IS_SSDFS_LOG_OFFSET_VALID(struct ssdfs_peb_log_offset *log);
+u64 SSDFS_ABSOLUTE_LOG_OFFSET(struct ssdfs_peb_log_offset *log);
+u32 SSDFS_LOCAL_LOG_OFFSET(struct ssdfs_peb_log_offset *log);
+int SSDFS_SHIFT_LOG_OFFSET(struct ssdfs_peb_log_offset *log,
+			   u32 shift);
+bool IS_SSDFS_LOG_OFFSET_UNALIGNED(struct ssdfs_peb_log_offset *log);
+void SSDFS_ALIGN_LOG_OFFSET(struct ssdfs_peb_log_offset *log);
+u32 ssdfs_peb_correct_area_write_offset(u32 write_offset, u32 data_size);
+int SSDFS_CORRECT_LOG_OFFSET(struct ssdfs_peb_log_offset *log,
+			     u32 data_size);
+
+size_t ssdfs_peb_temp_buffer_default_size(u32 pagesize);
+int ssdfs_peb_realloc_read_buffer(struct ssdfs_peb_read_buffer *buf,
+				  size_t new_size);
+int ssdfs_peb_realloc_write_buffer(struct ssdfs_peb_temp_buffer *buf);
+
+#endif /* _SSDFS_PEB_INIT_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 27/79] ssdfs: introduce PEB container
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (11 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 24/79] ssdfs: introduce PEB object Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 33/79] ssdfs: introduce segment object Viacheslav Dubeyko
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS implements a migration scheme. The migration scheme is
a fundamental technique of GC overhead management. The key
responsibility of the migration scheme is to guarantee
the presence of data in the same segment for any update
operation. Generally speaking, the migration scheme's model
is implemented by associating an exhausted "Physical"
Erase Block (PEB) with a clean one. The goal of such an
association of two PEBs is to implement the gradual migration
of data by means of the update operations in the initial
(exhausted) PEB. As a result, the old, exhausted PEB becomes
invalidated after complete data migration, and the erase
operation can then be applied to convert it into the clean
state. To implement the migration scheme concept, SSDFS
introduces a PEB container that includes source and destination
erase blocks. The PEB container object keeps the pointers to
the source and destination PEB objects during migration logic
execution.

A "Physical" Erase Block object can be in one of several
possible states (clean, using, used, pre-dirty, dirty). It
means that the PEB container creation logic needs to define
the state of a particular erase block and detect whether it is
under migration or not. As a result, the creation logic
prepares a proper sequence of initialization requests, adds
these requests into the request queue, and starts the threads
that execute PEB container initialization.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/peb_container.h | 631 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 631 insertions(+)
 create mode 100644 fs/ssdfs/peb_container.h

diff --git a/fs/ssdfs/peb_container.h b/fs/ssdfs/peb_container.h
new file mode 100644
index 000000000000..9e7bc76cb043
--- /dev/null
+++ b/fs/ssdfs/peb_container.h
@@ -0,0 +1,631 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_container.h - PEB container declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_CONTAINER_H
+#define _SSDFS_PEB_CONTAINER_H
+
+#include "block_bitmap.h"
+#include "peb.h"
+
+/* PEB container's array indexes */
+enum {
+	SSDFS_SEG_PEB1,
+	SSDFS_SEG_PEB2,
+	SSDFS_SEG_PEB_ITEMS_MAX
+};
+
+/* PEB container possible states */
+enum {
+	SSDFS_PEB_CONTAINER_EMPTY,
+	SSDFS_PEB1_SRC_CONTAINER,
+	SSDFS_PEB1_DST_CONTAINER,
+	SSDFS_PEB1_SRC_PEB2_DST_CONTAINER,
+	SSDFS_PEB1_SRC_EXT_PTR_DST_CONTAINER,
+	SSDFS_PEB2_SRC_CONTAINER,
+	SSDFS_PEB2_DST_CONTAINER,
+	SSDFS_PEB2_SRC_PEB1_DST_CONTAINER,
+	SSDFS_PEB2_SRC_EXT_PTR_DST_CONTAINER,
+	SSDFS_PEB_CONTAINER_STATE_MAX
+};
+
+/*
+ * PEB migration state
+ */
+enum {
+	SSDFS_PEB_UNKNOWN_MIGRATION_STATE,
+	SSDFS_PEB_NOT_MIGRATING,
+	SSDFS_PEB_MIGRATION_PREPARATION,
+	SSDFS_PEB_RELATION_PREPARATION,
+	SSDFS_PEB_UNDER_MIGRATION,
+	SSDFS_PEB_FINISHING_MIGRATION,
+	SSDFS_PEB_MIGRATION_STATE_MAX
+};
+
+/*
+ * PEB migration phase
+ */
+enum {
+	SSDFS_PEB_MIGRATION_STATUS_UNKNOWN,
+	SSDFS_SRC_PEB_NOT_EXHAUSTED,
+	SSDFS_DST_PEB_RECEIVES_DATA,
+	SSDFS_SHARED_ZONE_RECEIVES_DATA,
+	SSDFS_PEB_MIGRATION_PHASE_MAX
+};
+
+/*
+ * struct ssdfs_thread_execution_point - execution point in thread logic
+ * @file: file name
+ * @function: function name
+ * @code_line: line number
+ */
+struct ssdfs_thread_execution_point {
+	const char *file;
+	const char *function;
+	u32 code_line;
+};
+
+/*
+ * struct ssdfs_thread_call_stack - thread's call stack
+ * @points: execution points array
+ * @count: current number of execution points in array
+ */
+struct ssdfs_thread_call_stack {
+#define SSDFS_CALL_STACK_CAPACITY	(16)
+	struct ssdfs_thread_execution_point points[SSDFS_CALL_STACK_CAPACITY];
+	u32 count;
+};
+
+/*
+ * struct ssdfs_thread_state - PEB container's thread state
+ * @state: current state of the thread
+ * @req: pointer on segment request
+ * @postponed_req: pointer on postponed segment request
+ * @has_extent_been_invalidated: has an extent been invalidated?
+ * @has_migration_check_requested: has a migration check been requested?
+ * @err: current error
+ * @call_stack: thread's call stack (CONFIG_SSDFS_DEBUG only)
+ * @unfinished_reqs: number of unfinished requests (CONFIG_SSDFS_DEBUG only)
+ */
+struct ssdfs_thread_state {
+#define SSDFS_THREAD_UNKNOWN_STATE	(-1)
+	int state;
+	struct ssdfs_segment_request *req;
+	struct ssdfs_segment_request *postponed_req;
+	bool has_extent_been_invalidated;
+	bool has_migration_check_requested;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	struct ssdfs_thread_call_stack call_stack;
+	int unfinished_reqs;
+#endif /* CONFIG_SSDFS_DEBUG */
+};
+
+/*
+ * struct ssdfs_peb_container - PEB container
+ * @peb_type: type of PEB
+ * @peb_index: index of PEB in the array
+ * @log_blocks: count of logical blocks in full log
+ * @peb_state: aggregated PEB container state
+ * @thread_state: per-thread state array
+ * @thread: PEB container's threads array
+ * @read_rq: read requests queue
+ * @update_rq: update requests queue
+ * @pending_lock: lock of pending updated user data pages counter
+ * @pending_updated_user_data_pages: number of pending updated user data pages
+ * @crq_ptr_lock: lock of pointer on create requests queue
+ * @create_rq: pointer on shared new page requests queue
+ * @fsck_rq: online FSCK requests queue
+ * @parent_si: pointer on parent segment object
+ * @migration_lock: migration lock
+ * @migration_state: PEB migration state
+ * @migration_phase: PEB migration phase
+ * @items_state: items array state
+ * @shared_free_dst_blks: count of blocks that destination is able to share
+ * @migration_wq: wait queue for migration operations
+ * @cache_protection: PEB cache protection window
+ * @lock: container's internals lock
+ * @src_peb: pointer on source PEB
+ * @dst_peb: pointer on destination PEB
+ * @dst_peb_refs: reference counter of destination PEB (sharing counter)
+ * @items: buffers for PEB objects
+ * @peb_kobj: /sys/fs/ssdfs/<device>/<segN>/<pebN> kernel object
+ * @peb_kobj_unregister: completion state for <pebN> kernel object
+ */
+struct ssdfs_peb_container {
+	/* Static data */
+	u8 peb_type;
+	u16 peb_index;
+	u32 log_blocks;
+
+	atomic_t peb_state;
+
+	/* PEB container's threads */
+	struct ssdfs_thread_state thread_state[SSDFS_PEB_THREAD_TYPE_MAX];
+	struct ssdfs_thread_info thread[SSDFS_PEB_THREAD_TYPE_MAX];
+
+	/* Read requests queue */
+	struct ssdfs_requests_queue read_rq;
+
+	/* Update requests queue */
+	struct ssdfs_requests_queue update_rq;
+
+	spinlock_t pending_lock;
+	u32 pending_updated_user_data_pages;
+
+	/* Shared new page requests queue */
+	spinlock_t crq_ptr_lock;
+	struct ssdfs_requests_queue *create_rq;
+
+	/* Online FSCK requests queue */
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	struct ssdfs_requests_queue fsck_rq;
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+
+	/* Parent segment */
+	struct ssdfs_segment_info *parent_si;
+
+	/* Migration info */
+	struct mutex migration_lock;
+	atomic_t migration_state;
+	atomic_t migration_phase;
+	atomic_t items_state;
+	atomic_t shared_free_dst_blks;
+	wait_queue_head_t migration_wq;
+
+	/* PEB cache protection window */
+	struct ssdfs_protection_window cache_protection;
+
+	/* PEB objects */
+	struct rw_semaphore lock;
+	struct ssdfs_peb_info *src_peb;
+	struct ssdfs_peb_info *dst_peb;
+	atomic_t dst_peb_refs;
+	struct ssdfs_peb_info items[SSDFS_SEG_PEB_ITEMS_MAX];
+
+	/* /sys/fs/ssdfs/<device>/<segN>/<pebN> */
+	struct kobject peb_kobj;
+	struct completion peb_kobj_unregister;
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_t writeback_folios;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+};
+
+#define PEBI_PTR(pebi) \
+	((struct ssdfs_peb_info *)(pebi))
+#define PEBC_PTR(pebc) \
+	((struct ssdfs_peb_container *)(pebc))
+#define READ_RQ_PTR(pebc) \
+	(&PEBC_PTR(pebc)->read_rq)
+
+#define SSDFS_GC_FINISH_MIGRATION	(4)
+
+/*
+ * Inline functions
+ */
+static inline
+bool is_peb_container_empty(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return atomic_read(&pebc->items_state) == SSDFS_PEB_CONTAINER_EMPTY;
+}
+
+/*
+ * is_create_requests_queue_empty() - check whether create queue is empty
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool is_create_requests_queue_empty(struct ssdfs_peb_container *pebc)
+{
+	bool is_create_rq_empty = true;
+
+	spin_lock(&pebc->crq_ptr_lock);
+	if (pebc->create_rq) {
+		is_create_rq_empty =
+			is_ssdfs_requests_queue_empty(pebc->create_rq);
+	}
+	spin_unlock(&pebc->crq_ptr_lock);
+
+	return is_create_rq_empty;
+}
+
+/*
+ * have_flush_requests() - check whether create or update queues have requests
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool have_flush_requests(struct ssdfs_peb_container *pebc)
+{
+	bool is_create_rq_empty = true;
+	bool is_update_rq_empty = true;
+
+	is_create_rq_empty = is_create_requests_queue_empty(pebc);
+	is_update_rq_empty = is_ssdfs_requests_queue_empty(&pebc->update_rq);
+
+	return !is_create_rq_empty || !is_update_rq_empty;
+}
+
+/*
+ * is_fsck_requests_queue_empty() - check whether FSCK queue is empty
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool is_fsck_requests_queue_empty(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_ONLINE_FSCK
+	return is_ssdfs_requests_queue_empty(&pebc->fsck_rq);
+#else
+	return true;
+#endif /* CONFIG_SSDFS_ONLINE_FSCK */
+}
+
+static inline
+bool is_ssdfs_peb_containing_user_data(struct ssdfs_peb_container *pebc)
+{
+	return pebc->peb_type == SSDFS_MAPTBL_DATA_PEB_TYPE;
+}
+
+static inline
+void SSDFS_THREAD_CALL_STACK_INIT(struct ssdfs_thread_call_stack *call_stack)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct ssdfs_thread_execution_point *point;
+	int i;
+
+	for (i = 0; i < SSDFS_CALL_STACK_CAPACITY; i++) {
+		point = &call_stack->points[i];
+		point->file = NULL;
+		point->function = NULL;
+		point->code_line = U32_MAX;
+	}
+
+	call_stack->count = 0;
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void SSDFS_THREAD_STATE_INIT(struct ssdfs_thread_state *thread_state)
+{
+	thread_state->state = SSDFS_THREAD_UNKNOWN_STATE;
+	thread_state->req = NULL;
+	thread_state->postponed_req = NULL;
+	thread_state->has_extent_been_invalidated = false;
+	thread_state->has_migration_check_requested = false;
+	thread_state->err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_THREAD_CALL_STACK_INIT(&thread_state->call_stack);
+	thread_state->unfinished_reqs = 0;
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_peb_container_lock() - lock PEB container
+ * @pebc: pointer on PEB container object
+ */
+static inline
+void ssdfs_peb_container_lock(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&pebc->lock);
+	mutex_lock(&pebc->migration_lock);
+}
+
+/*
+ * ssdfs_peb_container_unlock() - unlock PEB container
+ * @pebc: pointer on PEB container object
+ */
+static inline
+void ssdfs_peb_container_unlock(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebc);
+	WARN_ON(!mutex_is_locked(&pebc->migration_lock));
+	WARN_ON(!rwsem_is_locked(&pebc->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mutex_unlock(&pebc->migration_lock);
+	up_read(&pebc->lock);
+}
+
+/*
+ * is_ssdfs_peb_container_locked() - is PEB container locked?
+ * @pebc: pointer on PEB container object
+ */
+static inline
+bool is_ssdfs_peb_container_locked(struct ssdfs_peb_container *pebc)
+{
+	return rwsem_is_locked(&pebc->lock) &&
+		mutex_is_locked(&pebc->migration_lock);
+}
+
+/*
+ * ssdfs_peb_current_log_lock() - lock current log object
+ * @pebi: pointer on PEB object
+ */
+static inline
+void ssdfs_peb_current_log_lock(struct ssdfs_peb_info *pebi)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(!rwsem_is_locked(&pebi->pebc->lock));
+	BUG_ON(!mutex_is_locked(&pebi->pebc->migration_lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = mutex_lock_killable(&pebi->current_log.lock);
+	WARN_ON(err);
+}
+
+/*
+ * ssdfs_peb_current_log_unlock() - unlock current log object
+ * @pebi: pointer on PEB object
+ */
+static inline
+void ssdfs_peb_current_log_unlock(struct ssdfs_peb_info *pebi)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	BUG_ON(!pebi->pebc);
+	WARN_ON(!mutex_is_locked(&pebi->current_log.lock));
+	WARN_ON(!mutex_is_locked(&pebi->pebc->migration_lock));
+	WARN_ON(!rwsem_is_locked(&pebi->pebc->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	mutex_unlock(&pebi->current_log.lock);
+}
+
+static inline
+bool is_ssdfs_peb_current_log_locked(struct ssdfs_peb_info *pebi)
+{
+	bool is_locked;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi);
+	WARN_ON(!mutex_is_locked(&pebi->pebc->migration_lock));
+	WARN_ON(!rwsem_is_locked(&pebi->pebc->lock));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	is_locked = mutex_is_locked(&pebi->current_log.lock);
+
+	return is_locked;
+}
+
+/*
+ * can_peb_receive_new_blocks() - check that PEB can receive new blocks
+ * @fsi: shared file system information
+ * @pebc: PEB container object
+ * @peb_used_pages: number of used pages in PEB
+ */
+static inline
+bool can_peb_receive_new_blocks(struct ssdfs_fs_info *fsi,
+				struct ssdfs_peb_container *pebc,
+				int peb_used_pages)
+{
+	bool is_peb_under_migration = false;
+	bool is_peb_inflated = false;
+
+	switch (atomic_read(&pebc->migration_state)) {
+	case SSDFS_PEB_MIGRATION_PREPARATION:
+	case SSDFS_PEB_RELATION_PREPARATION:
+	case SSDFS_PEB_UNDER_MIGRATION:
+	case SSDFS_PEB_FINISHING_MIGRATION:
+		is_peb_under_migration = true;
+		break;
+
+	default:
+		is_peb_under_migration = false;
+		break;
+	}
+
+	is_peb_inflated = peb_used_pages > fsi->pages_per_peb;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("is_peb_under_migration %#x, "
+		  "is_peb_inflated %#x\n",
+		  is_peb_under_migration,
+		  is_peb_inflated);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_peb_under_migration && is_peb_inflated) {
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * is_peb_preparing_migration() - check whether PEB is preparing migration
+ * @pebc: pointer on PEB container
+ */
+static inline
+bool is_peb_preparing_migration(struct ssdfs_peb_container *pebc)
+{
+	bool is_preparing = false;
+
+	switch (atomic_read(&pebc->migration_state)) {
+	case SSDFS_PEB_MIGRATION_PREPARATION:
+	case SSDFS_PEB_RELATION_PREPARATION:
+	case SSDFS_PEB_FINISHING_MIGRATION:
+		is_preparing = true;
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return is_preparing;
+}
+
+/*
+ * ssdfs_thread_call_stack_remember() - remember execution point
+ * @stack: thread's call stack
+ * @file: file name pointer
+ * @function: function name pointer
+ * @code_line: code line in file
+ */
+static inline
+void ssdfs_thread_call_stack_remember(struct ssdfs_thread_call_stack *stack,
+					const char *file,
+					const char *function,
+					u32 code_line)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct ssdfs_thread_execution_point *point;
+
+	if (stack->count < SSDFS_CALL_STACK_CAPACITY) {
+		if (stack->count > 0) {
+			point = &stack->points[stack->count - 1];
+
+			if (point->function && function) {
+				if (strcmp(point->function, function) == 0)
+					point->code_line = code_line;
+				else
+					goto process_new_execution_point;
+			} else
+				goto process_new_execution_point;
+		} else {
+process_new_execution_point:
+			point = &stack->points[stack->count];
+			point->file = file;
+			point->function = function;
+			point->code_line = code_line;
+			stack->count++;
+		}
+	} else
+		stack->count++;
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * ssdfs_thread_call_stack_forget() - forget execution point
+ * @stack: thread's call stack
+ */
+static inline
+void ssdfs_thread_call_stack_forget(struct ssdfs_thread_call_stack *stack)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	struct ssdfs_thread_execution_point *point;
+
+	stack->count--;
+
+	if (stack->count < SSDFS_CALL_STACK_CAPACITY) {
+		point = &stack->points[stack->count];
+
+		point->file = NULL;
+		point->function = NULL;
+		point->code_line = U32_MAX;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * PEB container's API
+ */
+int ssdfs_peb_container_create(struct ssdfs_fs_info *fsi,
+				u64 seg, u32 peb_index,
+				u8 peb_type,
+				u32 log_pages,
+				struct ssdfs_segment_info *si);
+void ssdfs_peb_container_destroy(struct ssdfs_peb_container *pebc);
+
+int ssdfs_peb_container_invalidate_block(struct ssdfs_peb_container *pebc,
+				    struct ssdfs_phys_offset_descriptor *desc);
+int ssdfs_peb_get_free_pages(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_get_used_data_pages(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_get_invalid_pages(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_get_pages_capacity(struct ssdfs_peb_container *pebc);
+
+int ssdfs_peb_join_create_requests_queue(struct ssdfs_peb_container *pebc,
+					 struct ssdfs_requests_queue *create_rq);
+void ssdfs_peb_forget_create_requests_queue(struct ssdfs_peb_container *pebc);
+bool is_peb_joined_into_create_requests_queue(struct ssdfs_peb_container *pebc);
+
+struct ssdfs_peb_info *
+ssdfs_get_current_peb_locked(struct ssdfs_peb_container *pebc);
+void ssdfs_unlock_current_peb(struct ssdfs_peb_container *pebc);
+struct ssdfs_peb_info *
+ssdfs_get_peb_for_migration_id(struct ssdfs_peb_container *pebc,
+			       u8 migration_id);
+
+int ssdfs_peb_container_create_destination(struct ssdfs_peb_container *ptr);
+int ssdfs_peb_container_forget_source(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_container_forget_relation(struct ssdfs_peb_container *pebc);
+int ssdfs_peb_container_change_state(struct ssdfs_peb_container *pebc);
+
+/*
+ * PEB container's private API
+ */
+int ssdfs_peb_gc_thread_func(void *data);
+int ssdfs_peb_read_thread_func(void *data);
+int ssdfs_peb_flush_thread_func(void *data);
+int ssdfs_peb_fsck_thread_func(void *data);
+
+u16 ssdfs_peb_estimate_reserved_metapages(u32 page_size, u32 pages_per_peb,
+					  u16 log_pages, u32 pebs_per_seg,
+					  bool is_migrating);
+int ssdfs_peb_read_page(struct ssdfs_peb_container *pebc,
+			struct ssdfs_segment_request *req,
+			struct completion **end);
+int ssdfs_peb_readahead_pages(struct ssdfs_peb_container *pebc,
+			      struct ssdfs_segment_request *req,
+			      struct completion **end);
+void ssdfs_peb_mark_request_block_uptodate(struct ssdfs_peb_container *pebc,
+					   struct ssdfs_segment_request *req,
+					   int blk_index);
+int ssdfs_peb_copy_block(struct ssdfs_peb_container *pebc,
+			 u32 logical_blk,
+			 struct ssdfs_segment_request *req);
+int ssdfs_peb_copy_blocks_range(struct ssdfs_peb_container *pebc,
+				struct ssdfs_block_bmap_range *range,
+				struct ssdfs_segment_request *req);
+int ssdfs_peb_copy_pre_alloc_block(struct ssdfs_peb_container *pebc,
+				   u32 logical_blk,
+				   struct ssdfs_segment_request *req);
+int __ssdfs_peb_get_block_state_desc(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_metadata_descriptor *area_desc,
+				struct ssdfs_block_state_descriptor *desc,
+				u64 *cno, u64 *parent_snapshot);
+int ssdfs_blk_desc_buffer_init(struct ssdfs_peb_container *pebc,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_phys_offset_descriptor *desc_off,
+				struct ssdfs_offset_position *pos,
+				struct ssdfs_metadata_descriptor *array,
+				size_t array_size);
+int ssdfs_peb_read_block_state(struct ssdfs_peb_container *pebc,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_phys_offset_descriptor *desc_off,
+				struct ssdfs_offset_position *pos,
+				struct ssdfs_metadata_descriptor *array,
+				size_t array_size);
+bool ssdfs_peb_has_dirty_folios(struct ssdfs_peb_info *pebi);
+int ssdfs_collect_dirty_segments_now(struct ssdfs_fs_info *fsi);
+bool can_peb_process_create_requests(struct ssdfs_peb_container *pebc);
+
+#endif /* _SSDFS_PEB_CONTAINER_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 33/79] ssdfs: introduce segment object
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (12 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 27/79] ssdfs: introduce PEB container Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 38/79] ssdfs: introduce PEB mapping table Viacheslav Dubeyko
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

A segment is the basic unit for allocating, managing, and
freeing the file system volume's space. It is a fixed-size
portion of the volume that can contain one or several Logical
Erase Blocks (LEB). Initially, a segment is an empty container
in the clean state. The file system logic can find a clean
segment by means of a search operation in the segment bitmap.
LEBs of a clean segment need to be mapped into "Physical"
Erase Blocks (PEB) by using the PEB mapping table. Technically
speaking, not every LEB can be mapped into a PEB if the mapping
table has no clean PEBs. A segment can be imagined as a
container that includes an array of PEB containers. The segment
object implements the logic of logical block allocation and
prepares create and update requests. The current segment has a
create queue that is used, for example, to add new data into a
file. A PEB container has an update queue that is used for
adding update requests. The flush thread is woken up after
every operation of adding a request into a queue. Finally, the
flush thread executes create/update requests and commits logs
with compressed and compacted user data or metadata.

The segment object implements the API of adding logical blocks
to user files or metadata structures. It means that if a file
or metadata structure (for example, a b-tree) needs to grow,
then file system logic has to add/allocate a new block or extent.
The add/allocate logical block operation requires several steps:
(1) Reserve logical block(s) by checking and decrementing
    the counter of free logical blocks for the whole volume;
(2) Allocate logical block ID(s) in the offset translation table
    of the segment object;
(3) Add a create request into the flush thread's queue;
(4) The flush thread processes the create request by compressing
    user data or metadata and compacting several compressed
    logical blocks into one or several memory pages;
(5) The flush thread executes the commit operation by preparing
    the log (header + payload + footer) and storing into the offset
    translation table the association of a logical block ID with
    a particular offset into the log's payload.
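
The five steps above can be sketched as a toy model. This is an
illustrative sketch, not SSDFS code: all identifiers (toy_volume,
toy_reserve_block, and so on) are hypothetical, and the request queue
plus flush thread of the real driver are collapsed into direct
function calls for clarity.

```c
#define TOY_BLKS_PER_SEG 8
#define TOY_OFF_UNKNOWN  0xFFFFFFFFu

/* Toy volume state: all names here are hypothetical, not SSDFS's. */
struct toy_volume {
	unsigned long free_blks;                /* step 1: volume-wide free counter */
	unsigned int next_blk_id;               /* step 2: trivial ID allocator */
	unsigned int blk2off[TOY_BLKS_PER_SEG]; /* step 5: blk ID -> payload offset */
};

/* Step 1: reserve a logical block by checking/decrementing the counter. */
static int toy_reserve_block(struct toy_volume *vol)
{
	if (vol->free_blks == 0)
		return -1;	/* would be -ENOSPC in a real driver */
	vol->free_blks--;
	return 0;
}

/* Step 2: allocate a logical block ID for the offset translation table. */
static int toy_allocate_blk_id(struct toy_volume *vol)
{
	if (vol->next_blk_id >= TOY_BLKS_PER_SEG)
		return -1;
	return (int)vol->next_blk_id++;
}

/*
 * Steps 3-5 collapsed: the real driver queues a create request and the
 * flush thread compresses, compacts, and commits a log; here "commit"
 * just records the blk ID -> payload offset association.
 */
static void toy_commit_block(struct toy_volume *vol, int blk_id,
			     unsigned int payload_off)
{
	vol->blk2off[blk_id] = payload_off;
}
```

Reserving before allocating matters even in the toy model: the
volume-wide counter bounds total space, while the per-segment table
only hands out IDs within one segment.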

Any file or metadata structure can be updated, truncated, or
deleted. The segment object supports update and invalidate
operations on user data or metadata. SSDFS uses the logical extent
concept to track the location of any user data or metadata.
It means that every metadata structure is described by a sequence
of extents. The inode object keeps inline extents or the root node
of the extents b-tree that tracks the location of a file's content.
An extent identifies a segment ID, a logical block ID, and the
length of the extent. The segment ID is used to create or access
the segment object. The segment object has an offset translation
table that provides the mechanism to convert a logical block ID
into a "Physical" Erase Block (PEB) ID. Finally, it is possible to
add an update or invalidation request into the PEB's update queue.
The PEB's flush thread takes the update/invalidate requests from
the queue and executes them. Execution of a request means the
creation of a new log that contains the actual state of the updated
or invalidated data in the log's metadata (header, block bitmap,
offset translation table) and payload.
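
The extent-based lookup path can be sketched in the same toy style.
Again, every identifier here is hypothetical: the offset translation
table is reduced to a plain array mapping a logical block to a PEB ID,
standing in for the real segment metadata structures.

```c
#include <stdint.h>

#define TOY_BLKS_PER_SEG 8
#define TOY_PEB_INVALID  UINT64_MAX

/* An extent identifies a segment ID, a starting logical block, and a length. */
struct toy_extent {
	uint64_t seg_id;
	uint32_t logical_blk;
	uint32_t len;
};

/* Toy per-segment offset translation table: logical block -> PEB ID. */
struct toy_segment {
	uint64_t seg_id;
	uint64_t blk2peb[TOY_BLKS_PER_SEG];
};

/*
 * Convert one logical block of an extent into a PEB ID:
 * extent -> segment -> translation table -> PEB.
 */
static uint64_t toy_extent_blk_to_peb(const struct toy_segment *seg,
				      const struct toy_extent *ext,
				      uint32_t index)
{
	uint32_t blk;

	if (!seg || !ext || seg->seg_id != ext->seg_id || index >= ext->len)
		return TOY_PEB_INVALID;

	blk = ext->logical_blk + index;
	if (blk >= TOY_BLKS_PER_SEG)
		return TOY_PEB_INVALID;

	return seg->blk2peb[blk];
}
```

An update or invalidation request would then be queued against the
resolved PEB; the sketch stops at the translation step that the text
above describes.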

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/current_segment.h |  116 +++
 fs/ssdfs/segment.h         | 1367 ++++++++++++++++++++++++++++++++++++
 fs/ssdfs/segment_tree.h    |  107 +++
 3 files changed, 1590 insertions(+)
 create mode 100644 fs/ssdfs/current_segment.h
 create mode 100644 fs/ssdfs/segment.h
 create mode 100644 fs/ssdfs/segment_tree.h

diff --git a/fs/ssdfs/current_segment.h b/fs/ssdfs/current_segment.h
new file mode 100644
index 000000000000..28612fb6b555
--- /dev/null
+++ b/fs/ssdfs/current_segment.h
@@ -0,0 +1,116 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/current_segment.h - current segment abstraction declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_CURRENT_SEGMENT_H
+#define _SSDFS_CURRENT_SEGMENT_H
+
+/*
+ * struct ssdfs_current_segment - current segment container
+ * @lock: exclusive lock of current segment object
+ * @type: current segment type
+ * @seg_id: last known segment ID
+ * @real_seg: concrete current segment
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_current_segment {
+	struct mutex *lock;
+	int type;
+	u64 seg_id;
+	struct ssdfs_segment_info *real_seg;
+	struct ssdfs_fs_info *fsi;
+};
+
+enum {
+	SSDFS_CUR_DATA_SEG_LOCK,
+	SSDFS_CUR_LNODE_SEG_LOCK,
+	SSDFS_CUR_HNODE_SEG_LOCK,
+	SSDFS_CUR_IDXNODE_SEG_LOCK,
+	SSDFS_CUR_SEG_LOCK_COUNT
+};
+
+/*
+ * struct ssdfs_current_segs_array - array of current segments
+ * @lock: current segments array's lock
+ * @objects: array of pointers on current segment objects
+ * @buffer: buffer for all current segment objects
+ * @lock_buffer: array of current segments' locks
+ */
+struct ssdfs_current_segs_array {
+	struct rw_semaphore lock;
+	struct ssdfs_current_segment *objects[SSDFS_CUR_SEGS_COUNT];
+	u8 buffer[sizeof(struct ssdfs_current_segment) * SSDFS_CUR_SEGS_COUNT];
+	struct mutex lock_buffer[SSDFS_CUR_SEG_LOCK_COUNT];
+};
+
+/*
+ * Inline functions
+ */
+static inline
+bool is_ssdfs_current_segment_empty(struct ssdfs_current_segment *cur_seg)
+{
+	return cur_seg->real_seg == NULL;
+}
+
+static inline
+struct mutex *CUR_SEG2LOCK(struct ssdfs_fs_info *fsi, int seg_type)
+{
+	struct ssdfs_current_segs_array *array = fsi->cur_segs;
+
+	switch (seg_type) {
+	case SSDFS_CUR_DATA_SEG:
+	case SSDFS_CUR_DATA_UPDATE_SEG:
+		return &array->lock_buffer[SSDFS_CUR_DATA_SEG_LOCK];
+
+	case SSDFS_CUR_LNODE_SEG:
+		return &array->lock_buffer[SSDFS_CUR_LNODE_SEG_LOCK];
+
+	case SSDFS_CUR_HNODE_SEG:
+		return &array->lock_buffer[SSDFS_CUR_HNODE_SEG_LOCK];
+
+	case SSDFS_CUR_IDXNODE_SEG:
+		return &array->lock_buffer[SSDFS_CUR_IDXNODE_SEG_LOCK];
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return NULL;
+}
+
+/*
+ * Current segment container's API
+ */
+int ssdfs_current_segment_array_create(struct ssdfs_fs_info *fsi);
+void ssdfs_destroy_all_curent_segments(struct ssdfs_fs_info *fsi);
+void ssdfs_current_segment_array_destroy(struct ssdfs_fs_info *fsi);
+
+void ssdfs_current_segment_lock(struct ssdfs_current_segment *cur_seg);
+void ssdfs_current_segment_unlock(struct ssdfs_current_segment *cur_seg);
+bool is_ssdfs_current_segment_locked(struct ssdfs_current_segment *cur_seg);
+
+int ssdfs_current_segment_add(struct ssdfs_current_segment *cur_seg,
+			      struct ssdfs_segment_info *si,
+			      struct ssdfs_segment_search_state *state);
+void ssdfs_current_segment_remove(struct ssdfs_current_segment *cur_seg);
+
+#endif /* _SSDFS_CURRENT_SEGMENT_H */
diff --git a/fs/ssdfs/segment.h b/fs/ssdfs/segment.h
new file mode 100644
index 000000000000..9478683fdd6f
--- /dev/null
+++ b/fs/ssdfs/segment.h
@@ -0,0 +1,1367 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment.h - segment concept declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_H
+#define _SSDFS_SEGMENT_H
+
+#include "peb.h"
+#include "segment_block_bitmap.h"
+
+/* Available indexes for destination */
+enum {
+	SSDFS_LAST_DESTINATION,
+	SSDFS_CREATING_DESTINATION,
+	SSDFS_DESTINATION_MAX
+};
+
+/* Possible states of destination descriptor */
+enum {
+	SSDFS_EMPTY_DESTINATION,
+	SSDFS_DESTINATION_UNDER_CREATION,
+	SSDFS_VALID_DESTINATION,
+	SSDFS_OBSOLETE_DESTINATION,
+	SSDFS_DESTINATION_STATE_MAX
+};
+
+/*
+ * struct ssdfs_migration_destination - destination descriptor
+ * @state: descriptor's state
+ * @destination_pebs: count of destination PEBs for migration
+ * @shared_peb_index: shared index of destination PEB for migration
+ */
+struct ssdfs_migration_destination {
+	int state;
+	int destination_pebs;
+	int shared_peb_index;
+};
+
+/*
+ * struct ssdfs_segment_migration_info - migration info
+ * @migrating_pebs: count of migrating PEBs
+ * @lock: migration data lock
+ * @array: destination descriptors
+ */
+struct ssdfs_segment_migration_info {
+	atomic_t migrating_pebs;
+
+	spinlock_t lock;
+	struct ssdfs_migration_destination array[SSDFS_DESTINATION_MAX];
+};
+
+/*
+ * struct ssdfs_segment_info - segment object description
+ * @seg_id: segment identification number
+ * @log_pages: count of pages in full partial log
+ * @create_threads: number of flush PEB's threads for new page requests
+ * @seg_type: segment type
+ * @protection: segment's protection window
+ * @obj_state: segment object's state
+ * @activity_type: type of activity with segment object
+ * @modification_lock: lock protecting modification of segment's state
+ * @seg_state: current state of segment
+ * @peb_array: array of PEB's descriptors
+ * @pebs_count: count of items in PEBS array
+ * @migration: migration info
+ * @refs_count: counter of references on segment object
+ * @object_queue: wait queue for segment creation/destruction
+ * @create_rq: new page requests queue
+ * @pending_lock: lock of pending pages' counter
+ * @pending_new_user_data_pages: counter of pending new user data pages
+ * @invalidated_user_data_pages: counter of invalidated user data pages
+ * @wait_queue: array of PEBs' wait queues
+ * @blk_bmap: segment's block bitmap
+ * @blk2off_table: offset translation table
+ * @fsi: pointer on shared file system object
+ * @seg_kobj: /sys/fs/ssdfs/<device>/segments/<segN> kernel object
+ * @seg_kobj_unregister: completion state for <segN> kernel object
+ * @pebs_kobj: /sys/fs/ssdfs/<device>/segments/<segN>/pebs kernel object
+ * @pebs_kobj_unregister: completion state for pebs kernel object
+ */
+struct ssdfs_segment_info {
+	/* Static data */
+	u64 seg_id;
+	u16 log_pages;
+	u8 create_threads;
+	u16 seg_type;
+
+	/* Checkpoints set */
+	struct ssdfs_protection_window protection;
+
+	/* Mutable data */
+	atomic_t obj_state;
+	atomic_t activity_type;
+
+	struct rw_semaphore modification_lock;
+	atomic_t seg_state;
+
+	/* Segment's PEB's containers array */
+	struct ssdfs_peb_container *peb_array;
+	u16 pebs_count;
+
+	/* Migration info */
+	struct ssdfs_segment_migration_info migration;
+
+	/* Reference counter */
+	atomic_t refs_count;
+	wait_queue_head_t object_queue;
+
+	/*
+	 * New pages processing:
+	 * requests queue, wait queue
+	 */
+	struct ssdfs_requests_queue create_rq;
+
+	spinlock_t pending_lock;
+	u32 pending_new_user_data_pages;
+	u32 invalidated_user_data_pages;
+
+	/* Threads' wait queues */
+	wait_queue_head_t wait_queue[SSDFS_PEB_THREAD_TYPE_MAX];
+
+	struct ssdfs_segment_blk_bmap blk_bmap;
+	struct ssdfs_blk2off_table *blk2off_table;
+	struct ssdfs_fs_info *fsi;
+
+	/* /sys/fs/ssdfs/<device>/segments/<segN> */
+	struct kobject *seg_kobj;
+	struct kobject seg_kobj_buf;
+	struct completion seg_kobj_unregister;
+
+	/* /sys/fs/ssdfs/<device>/segments/<segN>/pebs */
+	struct kobject pebs_kobj;
+	struct completion pebs_kobj_unregister;
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_t writeback_folios;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+};
+
+/* Segment object states */
+enum {
+	SSDFS_SEG_OBJECT_UNKNOWN_STATE,
+	SSDFS_SEG_OBJECT_UNDER_CREATION,
+	SSDFS_SEG_OBJECT_CREATED,
+	SSDFS_CURRENT_SEG_OBJECT,
+	SSDFS_SEG_OBJECT_FAILURE,
+	SSDFS_SEG_OBJECT_PRE_DELETED,
+	SSDFS_SEG_OBJECT_STATE_MAX
+};
+
+/* Segment object's activity type */
+enum {
+	SSDFS_SEG_OBJECT_NO_ACTIVITY,
+	SSDFS_SEG_OBJECT_REGULAR_ACTIVITY,
+	SSDFS_SEG_UNDER_GC_ACTIVITY,
+	SSDFS_SEG_UNDER_INVALIDATION,
+	SSDFS_SEG_UNDER_FINISHING_MIGRATION,
+	SSDFS_SEG_UNDER_GC_FINISHING_MIGRATION,
+	SSDFS_SEG_OBJECT_ACTIVITY_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_segment_search_state - state of segment search
+ * @request.seg_type: type of segment
+ * @request.start_search_id: starting segment ID for search
+ * @request.need_find_new_segment: does it need to find a new segment?
+ * @request.search_clean_segment_only: does it need to find a clean segment only?
+ * @result.seg_state: segment state
+ * @result.seg_id: requested or found segment ID
+ * @result.free_pages: number of free pages in found segment
+ * @result.used_pages: number of used pages in found segment
+ * @result.invalid_pages: number of invalid pages in found segment
+ * @result.number_of_tries: number of tries to find segment
+ */
+struct ssdfs_segment_search_state {
+	struct {
+		int seg_type;
+		u64 start_search_id;
+		bool need_find_new_segment;
+		bool search_clean_segment_only;
+	} request;
+
+	struct {
+		int seg_state;
+		u64 seg_id;
+		int free_pages;
+		int used_pages;
+		int invalid_pages;
+		int number_of_tries;
+	} result;
+};
+
+/*
+ * Inline functions
+ */
+
+/*
+ * ssdfs_segment_search_state_init() - initialize segment search state
+ */
+static inline
+void ssdfs_segment_search_state_init(struct ssdfs_segment_search_state *state,
+				     int seg_type, u64 seg_id,
+				     u64 start_search_id)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!state);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	state->request.seg_type = seg_type;
+	state->request.start_search_id = start_search_id;
+	state->request.need_find_new_segment = false;
+	state->request.search_clean_segment_only = false;
+
+	state->result.seg_state = SSDFS_SEG_STATE_MAX;
+	state->result.seg_id = seg_id;
+	state->result.free_pages = -1;
+	state->result.used_pages = -1;
+	state->result.invalid_pages = -1;
+	state->result.number_of_tries = 0;
+}
+
+/*
+ * is_ssdfs_segment_created() - check that segment object is created
+ *
+ * This function returns TRUE both for successful creation
+ * of the segment object and for creation failure.
+ * It is the caller's responsibility to check that the
+ * segment object has been created successfully.
+ */
+static inline
+bool is_ssdfs_segment_created(struct ssdfs_segment_info *si)
+{
+	bool is_created = false;
+
+	switch (atomic_read(&si->obj_state)) {
+	case SSDFS_SEG_OBJECT_CREATED:
+	case SSDFS_CURRENT_SEG_OBJECT:
+	case SSDFS_SEG_OBJECT_FAILURE:
+	case SSDFS_SEG_OBJECT_PRE_DELETED:
+		is_created = true;
+		break;
+
+	default:
+		is_created = false;
+		break;
+	}
+
+	return is_created;
+}
+
+/*
+ * CUR_SEG_TYPE() - convert request class into current segment type
+ */
+static inline
+int CUR_SEG_TYPE(int req_class)
+{
+	int cur_seg_type = SSDFS_CUR_SEGS_COUNT;
+
+	switch (req_class) {
+	case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ:
+	case SSDFS_PEB_CREATE_DATA_REQ:
+		cur_seg_type = SSDFS_CUR_DATA_SEG;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ:
+	case SSDFS_PEB_CREATE_LNODE_REQ:
+		cur_seg_type = SSDFS_CUR_LNODE_SEG;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ:
+	case SSDFS_PEB_CREATE_HNODE_REQ:
+		cur_seg_type = SSDFS_CUR_HNODE_SEG;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ:
+	case SSDFS_PEB_CREATE_IDXNODE_REQ:
+		cur_seg_type = SSDFS_CUR_IDXNODE_SEG;
+		break;
+
+	case SSDFS_ZONE_USER_DATA_MIGRATE_REQ:
+	case SSDFS_PEB_USER_DATA_MOVE_REQ:
+		cur_seg_type = SSDFS_CUR_DATA_UPDATE_SEG;
+		break;
+
+	default:
+		BUG();
+	}
+
+	return cur_seg_type;
+}
+
+/*
+ * SEG_TYPE() - convert request class into segment type
+ */
+static inline
+int SEG_TYPE(int req_class)
+{
+	int seg_type = SSDFS_LAST_KNOWN_SEG_TYPE;
+
+	switch (req_class) {
+	case SSDFS_PEB_PRE_ALLOCATE_DATA_REQ:
+	case SSDFS_PEB_CREATE_DATA_REQ:
+	case SSDFS_ZONE_USER_DATA_MIGRATE_REQ:
+	case SSDFS_PEB_USER_DATA_MOVE_REQ:
+		seg_type = SSDFS_USER_DATA_SEG_TYPE;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_LNODE_REQ:
+	case SSDFS_PEB_CREATE_LNODE_REQ:
+		seg_type = SSDFS_LEAF_NODE_SEG_TYPE;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_HNODE_REQ:
+	case SSDFS_PEB_CREATE_HNODE_REQ:
+		seg_type = SSDFS_HYBRID_NODE_SEG_TYPE;
+		break;
+
+	case SSDFS_PEB_PRE_ALLOCATE_IDXNODE_REQ:
+	case SSDFS_PEB_CREATE_IDXNODE_REQ:
+		seg_type = SSDFS_INDEX_NODE_SEG_TYPE;
+		break;
+
+	default:
+		BUG();
+	}
+
+	return seg_type;
+}
+
+/*
+ * SEG_TYPE_TO_USING_STATE() - convert segment type to segment using state
+ * @seg_type: segment type
+ */
+static inline
+int SEG_TYPE_TO_USING_STATE(u16 seg_type)
+{
+	switch (seg_type) {
+	case SSDFS_USER_DATA_SEG_TYPE:
+		return SSDFS_SEG_DATA_USING;
+
+	case SSDFS_LEAF_NODE_SEG_TYPE:
+		return SSDFS_SEG_LEAF_NODE_USING;
+
+	case SSDFS_HYBRID_NODE_SEG_TYPE:
+		return SSDFS_SEG_HYBRID_NODE_USING;
+
+	case SSDFS_INDEX_NODE_SEG_TYPE:
+		return SSDFS_SEG_INDEX_NODE_USING;
+	}
+
+	return SSDFS_SEG_STATE_MAX;
+}
+
+/*
+ * SEG_TYPE2MASK() - convert segment type into search mask
+ */
+static inline
+int SEG_TYPE2MASK(int seg_type)
+{
+	int mask;
+
+	switch (seg_type) {
+	case SSDFS_USER_DATA_SEG_TYPE:
+		mask = SSDFS_SEG_DATA_USING_STATE_FLAG |
+			SSDFS_SEG_DATA_USING_INVALIDATED_STATE_FLAG;
+		break;
+
+	case SSDFS_LEAF_NODE_SEG_TYPE:
+		mask = SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_HYBRID_NODE_SEG_TYPE:
+		mask = SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_INDEX_NODE_SEG_TYPE:
+		mask = SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG;
+		break;
+
+	default:
+		BUG();
+	}
+
+	return mask;
+}
+
+/*
+ * SEG_TYPE_TO_CUR_SEG_TYPE() - convert segment type to current segment type
+ * @seg_type: segment type
+ */
+static inline
+int SEG_TYPE_TO_CUR_SEG_TYPE(u16 seg_type)
+{
+	int cur_seg_type = SSDFS_CUR_SEGS_COUNT;
+
+	switch (seg_type) {
+	case SSDFS_USER_DATA_SEG_TYPE:
+		return SSDFS_CUR_DATA_SEG;
+
+	case SSDFS_LEAF_NODE_SEG_TYPE:
+		return SSDFS_CUR_LNODE_SEG;
+
+	case SSDFS_HYBRID_NODE_SEG_TYPE:
+		return SSDFS_CUR_HNODE_SEG;
+
+	case SSDFS_INDEX_NODE_SEG_TYPE:
+		return SSDFS_CUR_IDXNODE_SEG;
+	}
+
+	return cur_seg_type;
+}
+
+static inline
+bool is_regular_fs_operations(struct ssdfs_segment_info *si)
+{
+	int state;
+
+	state = atomic_read(&si->fsi->global_fs_state);
+	return state == SSDFS_REGULAR_FS_OPERATIONS;
+}
+
+static inline
+bool is_metadata_under_flush(struct ssdfs_segment_info *si)
+{
+	switch (atomic_read(&si->fsi->global_fs_state)) {
+	case SSDFS_METADATA_UNDER_FLUSH:
+	case SSDFS_UNMOUNT_METADATA_UNDER_FLUSH:
+		return true;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+bool is_metadata_going_flushing(struct ssdfs_segment_info *si)
+{
+	switch (atomic_read(&si->fsi->global_fs_state)) {
+	case SSDFS_METADATA_GOING_FLUSHING:
+	case SSDFS_UNMOUNT_METADATA_GOING_FLUSHING:
+		return true;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+bool is_unmount_in_progress(struct ssdfs_segment_info *si)
+{
+	switch (atomic_read(&si->fsi->global_fs_state)) {
+	case SSDFS_UNMOUNT_METADATA_GOING_FLUSHING:
+	case SSDFS_UNMOUNT_METADATA_UNDER_FLUSH:
+	case SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH:
+	case SSDFS_UNMOUNT_COMMIT_SUPERBLOCK:
+	case SSDFS_UNMOUNT_DESTROY_METADATA:
+		return true;
+
+	default:
+		/* continue logic */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+void ssdfs_account_user_data_read_request(struct ssdfs_segment_info *si,
+					  struct ssdfs_segment_request *req)
+{
+	u64 read_requests = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		switch (atomic_read(&si->fsi->global_fs_state)) {
+		case SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH:
+		case SSDFS_UNMOUNT_COMMIT_SUPERBLOCK:
+		case SSDFS_UNMOUNT_DESTROY_METADATA:
+			/*
+			 * Unexpected state.
+			 */
+			BUG();
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+		spin_lock(&si->fsi->volume_state_lock);
+		si->fsi->read_user_data_requests++;
+		read_requests = si->fsi->read_user_data_requests;
+		spin_unlock(&si->fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_id %llu, read_requests %llu, "
+			  "req->private.class %#x, req->private.cmd %#x\n",
+			  si->seg_id, read_requests,
+			  req->private.class,
+			  req->private.cmd);
+		BUG_ON(!is_request_command_valid(req->private.class,
+						 req->private.cmd));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+}
+
+static inline
+void ssdfs_forget_user_data_read_request(struct ssdfs_segment_info *si,
+					 struct ssdfs_segment_request *req)
+{
+	u64 read_requests = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		spin_lock(&si->fsi->volume_state_lock);
+		read_requests = si->fsi->read_user_data_requests;
+		if (read_requests > 0) {
+			si->fsi->read_user_data_requests--;
+			read_requests = si->fsi->read_user_data_requests;
+		} else
+			err = -ERANGE;
+		spin_unlock(&si->fsi->volume_state_lock);
+
+		if (unlikely(err))
+			SSDFS_WARN("fail to decrement\n");
+
+#ifdef CONFIG_SSDFS_DEBUG
+		if (req == NULL) {
+			SSDFS_DBG("seg_id %llu, read_requests %llu\n",
+				  si->seg_id, read_requests);
+		} else {
+			SSDFS_DBG("seg_id %llu, read_requests %llu, "
+				  "req->private.class %#x, "
+				  "req->private.cmd %#x\n",
+				  si->seg_id, read_requests,
+				  req->private.class,
+				  req->private.cmd);
+			BUG_ON(!is_request_command_valid(req->private.class,
+							 req->private.cmd));
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (read_requests == 0)
+			wake_up_all(&si->fsi->finish_user_data_read_wq);
+	}
+}
+
+static inline
+void ssdfs_account_user_data_flush_request(struct ssdfs_segment_info *si,
+					   struct ssdfs_segment_request *req)
+{
+	u64 flush_requests = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si || !req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		switch (atomic_read(&si->fsi->global_fs_state)) {
+		case SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH:
+		case SSDFS_UNMOUNT_COMMIT_SUPERBLOCK:
+		case SSDFS_UNMOUNT_DESTROY_METADATA:
+			/*
+			 * Unexpected state.
+			 */
+			BUG();
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+		spin_lock(&si->fsi->volume_state_lock);
+		si->fsi->flushing_user_data_requests++;
+		flush_requests = si->fsi->flushing_user_data_requests;
+#ifdef CONFIG_SSDFS_DEBUG
+		spin_lock(&si->fsi->requests_lock);
+		list_add_tail(&req->user_data_requests_list,
+				&si->fsi->user_data_requests_list);
+		spin_unlock(&si->fsi->requests_lock);
+#endif /* CONFIG_SSDFS_DEBUG */
+		spin_unlock(&si->fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_id %llu, flush_requests %llu, "
+			  "req->private.class %#x, req->private.cmd %#x\n",
+			  si->seg_id, flush_requests,
+			  req->private.class,
+			  req->private.cmd);
+		BUG_ON(!is_request_command_valid(req->private.class,
+						 req->private.cmd));
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+}
+
+static inline
+void ssdfs_forget_user_data_flush_request(struct ssdfs_segment_info *si,
+					  struct ssdfs_segment_request *req)
+{
+	u64 flush_requests = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		spin_lock(&si->fsi->volume_state_lock);
+		flush_requests = si->fsi->flushing_user_data_requests;
+		if (flush_requests > 0) {
+			si->fsi->flushing_user_data_requests--;
+			flush_requests = si->fsi->flushing_user_data_requests;
+#ifdef CONFIG_SSDFS_DEBUG
+			if (req) {
+				spin_lock(&si->fsi->requests_lock);
+				list_del(&req->user_data_requests_list);
+				spin_unlock(&si->fsi->requests_lock);
+			}
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else
+			err = -ERANGE;
+		spin_unlock(&si->fsi->volume_state_lock);
+
+		if (unlikely(err))
+			SSDFS_WARN("fail to decrement\n");
+
+#ifdef CONFIG_SSDFS_DEBUG
+		if (req == NULL) {
+			SSDFS_DBG("seg_id %llu, flush_requests %llu\n",
+				  si->seg_id, flush_requests);
+		} else {
+			SSDFS_DBG("seg_id %llu, flush_requests %llu, "
+				  "req->private.class %#x, "
+				  "req->private.cmd %#x\n",
+				  si->seg_id, flush_requests,
+				  req->private.class,
+				  req->private.cmd);
+			BUG_ON(!is_request_command_valid(req->private.class,
+							 req->private.cmd));
+		}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (flush_requests == 0)
+			wake_up_all(&si->fsi->finish_user_data_flush_wq);
+	}
+}
+
+static inline
+bool is_user_data_pages_invalidated(struct ssdfs_segment_info *si)
+{
+	u64 invalidated = 0;
+
+	if (si->seg_type != SSDFS_USER_DATA_SEG_TYPE)
+		return false;
+
+	spin_lock(&si->pending_lock);
+	invalidated = si->invalidated_user_data_pages;
+	spin_unlock(&si->pending_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, invalidated %llu\n",
+		  si->seg_id, invalidated);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return invalidated > 0;
+}
+
+static inline
+void ssdfs_account_invalidated_user_data_pages(struct ssdfs_segment_info *si,
+						u32 count)
+{
+	u64 invalidated = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+
+	SSDFS_DBG("si %p, count %u\n",
+		  si, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		switch (atomic_read(&si->fsi->global_fs_state)) {
+		case SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH:
+		case SSDFS_UNMOUNT_COMMIT_SUPERBLOCK:
+		case SSDFS_UNMOUNT_DESTROY_METADATA:
+			/*
+			 * Unexpected state.
+			 */
+			BUG();
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+		spin_lock(&si->pending_lock);
+		si->invalidated_user_data_pages += count;
+		invalidated = si->invalidated_user_data_pages;
+		spin_unlock(&si->pending_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_id %llu, invalidated %llu\n",
+			  si->seg_id, invalidated);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+}
+
+static inline
+void ssdfs_forget_invalidated_user_data_pages(struct ssdfs_segment_info *si)
+{
+	u64 invalidated = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		spin_lock(&si->pending_lock);
+		invalidated = si->invalidated_user_data_pages;
+		si->invalidated_user_data_pages = 0;
+		spin_unlock(&si->pending_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("seg_id %llu, invalidated %llu\n",
+			  si->seg_id, invalidated);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+}
+
+static inline
+void ssdfs_account_commit_log_request(struct ssdfs_segment_info *si)
+{
+	u64 commit_log_requests = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (si->seg_type == SSDFS_USER_DATA_SEG_TYPE) {
+		switch (atomic_read(&si->fsi->global_fs_state)) {
+		case SSDFS_UNMOUNT_MAPTBL_UNDER_FLUSH:
+		case SSDFS_UNMOUNT_COMMIT_SUPERBLOCK:
+		case SSDFS_UNMOUNT_DESTROY_METADATA:
+			/*
+			 * Unexpected state.
+			 */
+			BUG();
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+	}
+
+	spin_lock(&si->fsi->volume_state_lock);
+	si->fsi->commit_log_requests++;
+	commit_log_requests = si->fsi->commit_log_requests;
+	spin_unlock(&si->fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, commit_log_requests %llu\n",
+		  si->seg_id, commit_log_requests);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_forget_commit_log_request(struct ssdfs_segment_info *si)
+{
+	u64 commit_log_requests = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!si);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&si->fsi->volume_state_lock);
+	commit_log_requests = si->fsi->commit_log_requests;
+	if (commit_log_requests > 0) {
+		si->fsi->commit_log_requests--;
+		commit_log_requests = si->fsi->commit_log_requests;
+	} else
+		err = -ERANGE;
+	spin_unlock(&si->fsi->volume_state_lock);
+
+	if (unlikely(err)) {
+		SSDFS_WARN("fail to decrement: "
+			   "seg_id %llu\n",
+			   si->seg_id);
+	}
+
+	if (commit_log_requests == 0)
+		wake_up_all(&si->fsi->finish_commit_log_flush_wq);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, commit_log_requests %llu\n",
+		  si->seg_id, commit_log_requests);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_protection_account_request(struct ssdfs_protection_window *ptr,
+				      u64 current_cno)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 create_cno;
+	u64 last_request_cno;
+	u32 reqs_count;
+	u64 protected_range;
+	u64 future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&ptr->cno_lock);
+
+	if (ptr->reqs_count == 0) {
+		ptr->reqs_count = 1;
+		ptr->last_request_cno = current_cno;
+	} else
+		ptr->reqs_count++;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	create_cno = ptr->create_cno;
+	last_request_cno = ptr->last_request_cno;
+	reqs_count = ptr->reqs_count;
+	protected_range = ptr->protected_range;
+	future_request_cno = ptr->future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_unlock(&ptr->cno_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("create_cno %llu, "
+		  "last_request_cno %llu, reqs_count %u, "
+		  "protected_range %llu, future_request_cno %llu\n",
+		  create_cno,
+		  last_request_cno, reqs_count,
+		  protected_range, future_request_cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_protection_forget_request(struct ssdfs_protection_window *ptr,
+				     u64 current_cno)
+{
+	u64 create_cno;
+	u64 last_request_cno;
+	u32 reqs_count;
+	u64 protected_range;
+	u64 future_request_cno;
+	int err = 0;
+
+	spin_lock(&ptr->cno_lock);
+
+	if (ptr->reqs_count == 0) {
+		err = -ERANGE;
+		goto finish_process_request;
+	} else if (ptr->reqs_count == 1) {
+		ptr->reqs_count--;
+
+		if (ptr->last_request_cno >= current_cno) {
+			err = -ERANGE;
+			goto finish_process_request;
+		} else {
+			u64 diff = current_cno - ptr->last_request_cno;
+			u64 last_range = ptr->protected_range;
+			ptr->protected_range = max_t(u64, last_range, diff);
+			ptr->last_request_cno = current_cno;
+			ptr->future_request_cno =
+				current_cno + ptr->protected_range;
+		}
+	} else
+		ptr->reqs_count--;
+
+finish_process_request:
+	create_cno = ptr->create_cno;
+	last_request_cno = ptr->last_request_cno;
+	reqs_count = ptr->reqs_count;
+	protected_range = ptr->protected_range;
+	future_request_cno = ptr->future_request_cno;
+
+	spin_unlock(&ptr->cno_lock);
+
+	if (unlikely(err)) {
+		SSDFS_WARN("create_cno %llu, "
+			   "last_request_cno %llu, reqs_count %u, "
+			   "protected_range %llu, future_request_cno %llu\n",
+			   create_cno,
+			   last_request_cno, reqs_count,
+			   protected_range, future_request_cno);
+		return;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("create_cno %llu, "
+		  "last_request_cno %llu, reqs_count %u, "
+		  "protected_range %llu, future_request_cno %llu\n",
+		  create_cno,
+		  last_request_cno, reqs_count,
+		  protected_range, future_request_cno);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+void ssdfs_segment_create_request_cno(struct ssdfs_segment_info *si)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu\n", si->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_protection_account_request(&si->protection,
+				ssdfs_current_cno(si->fsi->sb));
+}
+
+static inline
+void ssdfs_segment_finish_request_cno(struct ssdfs_segment_info *si)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu\n", si->seg_id);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_protection_forget_request(&si->protection,
+				ssdfs_current_cno(si->fsi->sb));
+}
+
+static inline
+bool should_gc_doesnt_touch_segment(struct ssdfs_segment_info *si)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 create_cno;
+	u64 last_request_cno;
+	u32 reqs_count;
+	u64 protected_range;
+	u64 future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+	u64 cur_cno;
+	bool dont_touch = false;
+
+	spin_lock(&si->protection.cno_lock);
+	if (si->protection.reqs_count > 0) {
+		/* segment is under processing */
+		dont_touch = true;
+	} else {
+		cur_cno = ssdfs_current_cno(si->fsi->sb);
+		if (cur_cno <= si->protection.future_request_cno) {
+			/* segment is under protection window yet */
+			dont_touch = true;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	create_cno = si->protection.create_cno;
+	last_request_cno = si->protection.last_request_cno;
+	reqs_count = si->protection.reqs_count;
+	protected_range = si->protection.protected_range;
+	future_request_cno = si->protection.future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_unlock(&si->protection.cno_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, create_cno %llu, "
+		  "last_request_cno %llu, reqs_count %u, "
+		  "protected_range %llu, future_request_cno %llu, "
+		  "dont_touch %#x\n",
+		  si->seg_id, create_cno,
+		  last_request_cno, reqs_count,
+		  protected_range, future_request_cno,
+		  dont_touch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return dont_touch;
+}
+
+static inline
+void ssdfs_peb_read_request_cno(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb_index %u\n",
+		  pebc->parent_si->seg_id,
+		  pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_protection_account_request(&pebc->cache_protection,
+			ssdfs_current_cno(pebc->parent_si->fsi->sb));
+}
+
+static inline
+void ssdfs_peb_finish_read_request_cno(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg %llu, peb_index %u\n",
+		  pebc->parent_si->seg_id,
+		  pebc->peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_protection_forget_request(&pebc->cache_protection,
+			ssdfs_current_cno(pebc->parent_si->fsi->sb));
+}
+
+static inline
+bool is_it_time_free_peb_cache_memory(struct ssdfs_peb_container *pebc)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 create_cno;
+	u64 last_request_cno;
+	u32 reqs_count;
+	u64 protected_range;
+	u64 future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+	u64 cur_cno;
+	bool dont_touch = false;
+
+	spin_lock(&pebc->cache_protection.cno_lock);
+	if (pebc->cache_protection.reqs_count > 0) {
+		/* PEB has read requests */
+		dont_touch = true;
+	} else {
+		cur_cno = ssdfs_current_cno(pebc->parent_si->fsi->sb);
+		if (cur_cno <= pebc->cache_protection.future_request_cno) {
+			/* PEB is still under protection window */
+			dont_touch = true;
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	create_cno = pebc->cache_protection.create_cno;
+	last_request_cno = pebc->cache_protection.last_request_cno;
+	reqs_count = pebc->cache_protection.reqs_count;
+	protected_range = pebc->cache_protection.protected_range;
+	future_request_cno = pebc->cache_protection.future_request_cno;
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_unlock(&pebc->cache_protection.cno_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, peb_index %u, create_cno %llu, "
+		  "last_request_cno %llu, reqs_count %u, "
+		  "protected_range %llu, future_request_cno %llu, "
+		  "dont_touch %#x\n",
+		  pebc->parent_si->seg_id,
+		  pebc->peb_index,
+		  create_cno,
+		  last_request_cno, reqs_count,
+		  protected_range, future_request_cno,
+		  dont_touch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return !dont_touch;
+}
+
+static inline
+bool is_ssdfs_segment_under_invalidation(struct ssdfs_segment_info *si)
+{
+	return atomic_read(&si->activity_type) == SSDFS_SEG_UNDER_INVALIDATION;
+}
+
+/*
+ * Segment object's API
+ */
+struct ssdfs_segment_info *ssdfs_segment_allocate_object(u64 seg_id);
+void ssdfs_segment_free_object(struct ssdfs_segment_info *si);
+int ssdfs_segment_create_object(struct ssdfs_fs_info *fsi,
+				u64 seg,
+				int seg_state,
+				u16 seg_type,
+				u16 log_pages,
+				u8 create_threads,
+				struct ssdfs_segment_info *si);
+int ssdfs_segment_destroy_object(struct ssdfs_segment_info *si);
+void ssdfs_segment_get_object(struct ssdfs_segment_info *si);
+void ssdfs_segment_put_object(struct ssdfs_segment_info *si);
+
+struct ssdfs_segment_info *
+ssdfs_grab_segment(struct ssdfs_fs_info *fsi,
+		   struct ssdfs_segment_search_state *state);
+bool is_ssdfs_segment_ready_for_requests(struct ssdfs_segment_info *si);
+int ssdfs_wait_segment_init_end(struct ssdfs_segment_info *si);
+
+int ssdfs_segment_read_block_sync(struct ssdfs_segment_info *si,
+				  struct ssdfs_segment_request *req);
+int ssdfs_segment_read_block_async(struct ssdfs_segment_info *si,
+				  int req_type,
+				  struct ssdfs_segment_request *req);
+
+int ssdfs_segment_pre_alloc_data_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_pre_alloc_data_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_pre_alloc_leaf_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_leaf_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_hybrid_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_hybrid_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_index_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_index_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+
+int ssdfs_segment_add_data_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_add_data_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_migrate_zone_block_sync(struct ssdfs_fs_info *fsi,
+					  int req_type,
+					  struct ssdfs_segment_request *req,
+					  u64 *seg_id,
+					  struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_migrate_zone_block_async(struct ssdfs_fs_info *fsi,
+					   int req_type,
+					   struct ssdfs_segment_request *req,
+					   u64 *seg_id,
+					   struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_leaf_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_leaf_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_hybrid_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_hybrid_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_index_node_block_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_index_node_block_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+
+int ssdfs_segment_pre_alloc_data_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_pre_alloc_data_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_pre_alloc_leaf_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_leaf_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_hybrid_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_index_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_pre_alloc_index_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+
+int ssdfs_segment_add_data_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_add_data_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_migrate_zone_extent_sync(struct ssdfs_fs_info *fsi,
+					   int req_type,
+					   struct ssdfs_segment_request *req,
+					   u64 *seg_id,
+					   struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_migrate_zone_extent_async(struct ssdfs_fs_info *fsi,
+					    int req_type,
+					    struct ssdfs_segment_request *req,
+					    u64 *seg_id,
+					    struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_move_peb_extent_sync(struct ssdfs_fs_info *fsi,
+					int req_type,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_move_peb_extent_async(struct ssdfs_fs_info *fsi,
+					int req_type,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_xattr_blob_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_xattr_blob_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_leaf_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_leaf_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_hybrid_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_hybrid_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_index_node_extent_sync(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+int ssdfs_segment_add_index_node_extent_async(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req,
+					u64 *seg_id,
+					struct ssdfs_blk2off_range *extent);
+
+int ssdfs_segment_update_data_block_sync(struct ssdfs_segment_info *si,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_update_data_block_async(struct ssdfs_segment_info *si,
+					int req_type,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_update_data_extent_sync(struct ssdfs_segment_info *si,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_update_data_extent_async(struct ssdfs_segment_info *si,
+					int req_type,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch);
+int ssdfs_segment_update_block_sync(struct ssdfs_segment_info *si,
+				    struct ssdfs_segment_request *req);
+int ssdfs_segment_update_block_async(struct ssdfs_segment_info *si,
+				     int req_type,
+				     struct ssdfs_segment_request *req);
+int ssdfs_segment_update_extent_sync(struct ssdfs_segment_info *si,
+				     struct ssdfs_segment_request *req);
+int ssdfs_segment_update_extent_async(struct ssdfs_segment_info *si,
+				      int req_type,
+				      struct ssdfs_segment_request *req);
+int ssdfs_segment_update_pre_alloc_block_sync(struct ssdfs_segment_info *si,
+					    struct ssdfs_segment_request *req);
+int ssdfs_segment_update_pre_alloc_block_async(struct ssdfs_segment_info *si,
+					    int req_type,
+					    struct ssdfs_segment_request *req);
+int ssdfs_segment_update_pre_alloc_extent_sync(struct ssdfs_segment_info *si,
+					    struct ssdfs_segment_request *req);
+int ssdfs_segment_update_pre_alloc_extent_async(struct ssdfs_segment_info *si,
+					    int req_type,
+					    struct ssdfs_segment_request *req);
+
+int ssdfs_segment_node_diff_on_write_sync(struct ssdfs_segment_info *si,
+					  struct ssdfs_segment_request *req);
+int ssdfs_segment_node_diff_on_write_async(struct ssdfs_segment_info *si,
+					   int req_type,
+					   struct ssdfs_segment_request *req);
+int ssdfs_segment_data_diff_on_write_sync(struct ssdfs_segment_info *si,
+					  struct ssdfs_segment_request *req);
+int ssdfs_segment_data_diff_on_write_async(struct ssdfs_segment_info *si,
+					   int req_type,
+					   struct ssdfs_segment_request *req);
+
+int ssdfs_segment_prepare_migration_sync(struct ssdfs_segment_info *si,
+					 struct ssdfs_segment_request *req);
+int ssdfs_segment_prepare_migration_async(struct ssdfs_segment_info *si,
+					  int req_type,
+					  struct ssdfs_segment_request *req);
+int ssdfs_segment_commit_log_sync(struct ssdfs_segment_info *si,
+				  struct ssdfs_segment_request *req);
+int ssdfs_segment_commit_log_async(struct ssdfs_segment_info *si,
+				   int req_type,
+				   struct ssdfs_segment_request *req);
+int ssdfs_segment_commit_log_sync2(struct ssdfs_segment_info *si,
+				   u16 peb_index,
+				   struct ssdfs_segment_request *req);
+int ssdfs_segment_commit_log_async2(struct ssdfs_segment_info *si,
+				    int req_type, u16 peb_index,
+				    struct ssdfs_segment_request *req);
+
+int ssdfs_segment_invalidate_logical_block(struct ssdfs_segment_info *si,
+					   u32 blk_offset);
+int ssdfs_segment_invalidate_logical_extent(struct ssdfs_segment_info *si,
+					    u32 start_off, u32 blks_count);
+
+int ssdfs_segment_migrate_range_async(struct ssdfs_segment_info *si,
+				      struct ssdfs_segment_request *req);
+int ssdfs_segment_migrate_pre_alloc_page_async(struct ssdfs_segment_info *si,
+					    struct ssdfs_segment_request *req);
+int ssdfs_segment_migrate_fragment_async(struct ssdfs_segment_info *si,
+					 struct ssdfs_segment_request *req);
+
+/*
+ * Internal segment object's API
+ */
+struct ssdfs_segment_info *
+__ssdfs_create_new_segment(struct ssdfs_fs_info *fsi,
+			   u64 seg_id, int seg_state,
+			   u16 seg_type, u16 log_pages,
+			   u8 create_threads);
+int ssdfs_segment_change_state(struct ssdfs_segment_info *si);
+int ssdfs_segment_detect_search_range(struct ssdfs_fs_info *fsi,
+				      u64 *start_seg, u64 *end_seg);
+
+#endif /* _SSDFS_SEGMENT_H */
diff --git a/fs/ssdfs/segment_tree.h b/fs/ssdfs/segment_tree.h
new file mode 100644
index 000000000000..c026f2e78424
--- /dev/null
+++ b/fs/ssdfs/segment_tree.h
@@ -0,0 +1,107 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment_tree.h - segment tree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_TREE_H
+#define _SSDFS_SEGMENT_TREE_H
+
+/*
+ * struct ssdfs_seg_object_info - segment object info
+ * @list: segment objects queue list
+ * @si: pointer on segment object
+ */
+struct ssdfs_seg_object_info {
+	struct list_head list;
+	struct ssdfs_segment_info *si;
+};
+
+/*
+ * struct ssdfs_segment_tree - tree of segment objects
+ * @lnodes_seg_log_pages: full log size in leaf nodes segment (pages count)
+ * @hnodes_seg_log_pages: full log size in hybrid nodes segment (pages count)
+ * @inodes_seg_log_pages: full log size in index nodes segment (pages count)
+ * @user_data_log_pages: full log size in user data segment (pages count)
+ * @default_log_pages: default full log size (pages count)
+ * @dentries_btree: dentries b-tree descriptor
+ * @extents_btree: extents b-tree descriptor
+ * @xattr_btree: xattrs b-tree descriptor
+ * @lock: folios array's lock
+ * @capacity: maximum possible capacity of folios in array
+ * @folios: folios of segment tree
+ */
+struct ssdfs_segment_tree {
+	u16 lnodes_seg_log_pages;
+	u16 hnodes_seg_log_pages;
+	u16 inodes_seg_log_pages;
+	u16 user_data_log_pages;
+	u16 default_log_pages;
+
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+	struct ssdfs_extents_btree_descriptor extents_btree;
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+
+	struct rw_semaphore lock;
+	u32 capacity;
+	struct ssdfs_folio_array folios;
+};
+
+#define SSDFS_SEG_OBJ_PTR_PER_PAGE \
+	(PAGE_SIZE / sizeof(struct ssdfs_segment_info *))
+
+/*
+ * Segment objects queue API
+ */
+void ssdfs_seg_objects_queue_init(struct ssdfs_seg_objects_queue *soq);
+bool is_ssdfs_seg_objects_queue_empty(struct ssdfs_seg_objects_queue *soq);
+void ssdfs_seg_objects_queue_add_tail(struct ssdfs_seg_objects_queue *soq,
+				      struct ssdfs_seg_object_info *soi);
+void ssdfs_seg_objects_queue_add_head(struct ssdfs_seg_objects_queue *soq,
+				      struct ssdfs_seg_object_info *soi);
+int ssdfs_seg_objects_queue_remove_first(struct ssdfs_seg_objects_queue *soq,
+					 struct ssdfs_seg_object_info **soi);
+void ssdfs_seg_objects_queue_remove_all(struct ssdfs_seg_objects_queue *soq);
+
+/*
+ * Segment object info's API
+ */
+void ssdfs_zero_seg_object_info_cache_ptr(void);
+int ssdfs_init_seg_object_info_cache(void);
+void ssdfs_shrink_seg_object_info_cache(void);
+void ssdfs_destroy_seg_object_info_cache(void);
+
+struct ssdfs_seg_object_info *ssdfs_seg_object_info_alloc(void);
+void ssdfs_seg_object_info_free(struct ssdfs_seg_object_info *soi);
+void ssdfs_seg_object_info_init(struct ssdfs_seg_object_info *soi,
+				struct ssdfs_segment_info *si);
+
+/*
+ * Segments' tree API
+ */
+int ssdfs_segment_tree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_segment_tree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_segment_tree_add(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_segment_info *si);
+int ssdfs_segment_tree_remove(struct ssdfs_fs_info *fsi,
+			      struct ssdfs_segment_info *si);
+struct ssdfs_segment_info *
+ssdfs_segment_tree_find(struct ssdfs_fs_info *fsi, u64 seg_id);
+
+#endif /* _SSDFS_SEGMENT_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 38/79] ssdfs: introduce PEB mapping table
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (13 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 33/79] ssdfs: introduce segment object Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 40/79] ssdfs: introduce PEB mapping table cache Viacheslav Dubeyko
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS file system is based on the concept of a logical segment
that aggregates Logical Erase Blocks (LEBs). Initially, a LEB has
no association with any particular "Physical" Erase Block (PEB).
This means that a segment may have associations for only some of
its LEBs or, in the case of a clean segment, no associations at all.
Generally speaking, SSDFS needs a special metadata structure (the
PEB mapping table) that is capable of associating any LEB with any
PEB. The PEB mapping table is a crucial metadata structure with
several goals: (1) mapping LEBs to PEBs, (2) implementing the
logical extent concept, (3) implementing the concept of PEB
migration, and (4) implementing the delayed erase operation by
means of a specialized thread.

The PEB mapping table describes the state of all PEBs on a
particular SSDFS volume. These descriptors are split into several
fragments that are distributed among the PEBs of specialized
segments. Every fragment of the PEB mapping table represents the
log's payload in a specialized segment. The payload's content is
split into: (1) a LEB table, and (2) a PEB table. The LEB table
starts with a header and contains an array of records ordered by
LEB ID, so the LEB ID plays the role of an index into the array.
As a result, the responsibility of the LEB table is to define an
index into the PEB table. Moreover, every LEB table record defines
two indexes. The first index (physical index) associates the LEB ID
with some PEB ID. The second index (relation index) can define a
PEB ID that plays the role of the destination PEB during migration
from an exhausted PEB into a new one. Likewise, the PEB table starts
with a header and contains an array of PEB state records ordered by
PEB ID. The most important fields of a PEB state record are:
(1) erase cycles, (2) PEB type, and (3) PEB state.
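
The two-level indexing described above can be sketched in userspace
C. The record layouts and names below are illustrative assumptions,
not the on-disk SSDFS format; the point is only that the LEB ID
indexes the LEB table directly, and the physical index selects a
record in the PEB table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define U16_INVALID 0xFFFFU

/* Hypothetical, simplified record layouts; the real on-disk
 * structures carry more fields (flags, checksums, etc.). */
struct leb_record {
	uint16_t physical_index;	/* index into PEB table */
	uint16_t relation_index;	/* destination PEB during migration */
};

struct peb_record {
	uint32_t erase_cycles;
	uint8_t type;
	uint8_t state;
};

/* Resolve a LEB ID to its PEB record: the LEB ID indexes the LEB
 * table directly, and the record's physical index points into the
 * PEB table. Returns NULL for an unmapped or out-of-range LEB. */
static const struct peb_record *
leb_to_peb(const struct leb_record *leb_tbl,
	   const struct peb_record *peb_tbl,
	   uint64_t leb_id, uint64_t lebs_count)
{
	if (leb_id >= lebs_count)
		return NULL;
	if (leb_tbl[leb_id].physical_index == U16_INVALID)
		return NULL;
	return &peb_tbl[leb_tbl[leb_id].physical_index];
}
```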

PEB type describes the possible types of data that a PEB can
contain: (1) user data, (2) leaf b-tree node, (3) hybrid b-tree
node, (4) index b-tree node, (5) snapshot, (6) superblock,
(7) segment bitmap, (8) PEB mapping table. PEB state describes the
possible states of a PEB during its lifecycle: (1) clean state means
that the PEB contains only free NAND flash pages ready for write
operations, (2) using state means that the PEB can contain valid,
invalid, and free pages, (3) used state means that the PEB contains
only valid pages, (4) pre-dirty state means that the PEB contains
both valid and invalid pages but no free ones, (5) dirty state means
that the PEB contains only invalid pages, (6) migrating state means
that the PEB is under migration, (7) pre-erase state means that the
PEB has been added to the queue of PEBs awaiting the erase
operation, (8) recovering state means that the PEB is left untouched
for some amount of time in order to recover its ability to fulfill
the erase operation, (9) bad state means that the PEB can no longer
be used for storing data. Generally speaking, the responsibility of
the PEB state is to track PEBs through the various phases of their
lifetime so that the pool of PEBs of the file system's volume can be
managed efficiently.
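
A minimal sketch of this lifecycle and two predicates one might
derive from it follows. The state names and rules are illustrative
assumptions; the driver defines its own constants and checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering mirroring the lifecycle described above. */
enum peb_state {
	PEB_STATE_CLEAN,
	PEB_STATE_USING,
	PEB_STATE_USED,
	PEB_STATE_PRE_DIRTY,
	PEB_STATE_DIRTY,
	PEB_STATE_MIGRATING,
	PEB_STATE_PRE_ERASE,
	PEB_STATE_RECOVERING,
	PEB_STATE_BAD,
};

/* Only a fully invalidated (dirty) PEB may enter the pre-erase
 * queue; a bad PEB is retired rather than erased. */
static bool peb_can_queue_for_erase(enum peb_state state)
{
	return state == PEB_STATE_DIRTY;
}

/* A PEB can accept new writes only while it still has free pages. */
static bool peb_accepts_writes(enum peb_state state)
{
	return state == PEB_STATE_CLEAN || state == PEB_STATE_USING;
}
```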

"Physical" Erase Block (PEB) mapping table is represented by a
sequence of fragments distributed among several segments. Every map
or unmap operation marks a fragment as dirty. A flush operation has
to check the dirty state of all fragments and flush the dirty ones
to the volume by creating log(s) in the PEB(s) dedicated to storing
the mapping table's content. The flush operation is executed in
several steps: (1) prepare migration, (2) flush dirty fragments,
(3) commit logs.

The prepare migration step is requested before the mapping table
flush in order to check whether any migration needs to be finished
or started, because starting or finishing a migration modifies the
mapping table, whereas the flush operation itself must complete
without any modification of the mapping table. The flush dirty
fragments step implies searching for dirty fragments and preparing
update requests for the PEB(s) flush thread. Finally, a commit log
must be requested because a metadata flush operation has to finish
by storing the new metadata state persistently.
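
The three-step flush can be modeled in a few lines of userspace C.
This is an assumption-laden sketch, not the driver's logic: counters
stand in for real segment requests and the dirty bitmap is a plain
byte array:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model: one dirty flag per fragment. The
 * real driver keeps a bitmap under a lock and issues requests to
 * PEB flush threads; here the steps are collapsed into counters. */
struct maptbl_model {
	uint32_t fragments_count;
	uint8_t *dirty;          /* one byte per fragment, 1 == dirty */
	uint32_t flushed;        /* fragments written in step 2 */
	uint32_t committed_logs; /* logs committed in step 3 */
	int migration_prepared;  /* step 1 done before any write-out */
};

static void maptbl_flush(struct maptbl_model *tbl)
{
	uint32_t i;

	/* Step 1: finish/start migrations now, so that the flush
	 * itself does not modify the mapping table. */
	tbl->migration_prepared = 1;

	/* Step 2: find dirty fragments and write them out. */
	for (i = 0; i < tbl->fragments_count; i++) {
		if (tbl->dirty[i]) {
			tbl->dirty[i] = 0;
			tbl->flushed++;
		}
	}

	/* Step 3: commit logs so the new state becomes persistent. */
	tbl->committed_logs++;
}
```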

The logical extent is a fundamental concept of the SSDFS file
system. Any piece of data or metadata on the volume is identified
by: (1) segment ID, (2) logical block ID, and (3) length. As a
result, any logical block is always located in the same segment,
because a segment is a logical portion of the volume that always
occupies the same position. However, a logical block's content has
to be stored in some erase block. The "Physical" Erase Block (PEB)
mapping table implements the mapping of a Logical Erase Block (LEB)
onto a PEB, because any segment is a container for one or several
LEBs. Moreover, the mapping table supports the implementation of
the migration scheme. The migration scheme guarantees that a
logical block stays in the same segment even in the case of update
requests.
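
A logical extent can be pictured as a small triple. The sketch
below (field names are illustrative, not the on-disk layout) makes
the key property explicit: the triple does not mention any PEB, so
remapping a LEB to a different PEB never changes an extent:

```c
#include <assert.h>
#include <stdint.h>

/* A logical extent pins data to a segment, not to an erase block,
 * so PEB migration never forces a metadata update. Simplified
 * sketch with illustrative field names. */
struct logical_extent {
	uint64_t seg_id;      /* segment ID */
	uint32_t logical_blk; /* logical block ID inside the segment */
	uint32_t len;         /* number of logical blocks */
};

/* Extents referencing the same segment-relative range stay equal
 * even after the LEB->PEB association underneath them changes. */
static int extents_equal(const struct logical_extent *a,
			 const struct logical_extent *b)
{
	return a->seg_id == b->seg_id &&
		a->logical_blk == b->logical_blk &&
		a->len == b->len;
}
```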

The PEB mapping table implements two fundamental methods:
(1) convert LEB to PEB; (2) map LEB to PEB. The conversion
operation is required when we need to identify which particular PEB
contains the data of a LEB of a particular segment. The mapping
operation is required when a clean segment has been allocated,
because the LEB(s) of a clean segment need to be associated with
PEB(s) that can store logs with user data or metadata.

The migration scheme is the fundamental technique of GC overhead
management in the SSDFS file system. Its key responsibility is to
guarantee the presence of data in the same segment across any
update operations. Generally speaking, the migration scheme's model
is implemented on the basis of associating an exhausted PEB with a
clean one. The goal of such an association of two PEBs is to
migrate data gradually by means of the update operations hitting
the initial (exhausted) PEB. As a result, the old, exhausted PEB
becomes fully invalidated after the data migration completes, and
an erase operation can then convert it back into the clean state.
Moreover, the destination PEB of the association replaces the
initial PEB at some index in the segment and finally becomes the
only PEB at that position. This technique implements the concept of
the logical extent with the goal of decreasing write amplification
and managing GC overhead, because the logical extent concept
excludes the necessity to update the metadata that tracks the
position of user data on the file system's volume. Generally
speaking, the migration scheme is capable of decreasing GC activity
significantly by excluding metadata updates and by means of the
self-migration of data between PEBs that is triggered by regular
update operations.
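
The update-driven migration can be modeled as a pair of counters.
This is a simplified sketch under stated assumptions (the real
driver tracks validity via block bitmaps, not counters): each
update moves one block from the exhausted source PEB into the
destination PEB, and the source becomes erasable once fully
invalidated:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an exhausted-PEB/clean-PEB association. */
struct migration_pair {
	uint32_t src_valid_blks; /* valid blocks left in exhausted PEB */
	uint32_t dst_used_blks;  /* blocks written into destination PEB */
	int src_erasable;        /* source fully invalidated */
};

/* One update request: write the new copy into the destination PEB
 * and invalidate the old copy in the source PEB. */
static void migrate_on_update(struct migration_pair *pair)
{
	if (pair->src_valid_blks == 0)
		return;
	pair->src_valid_blks--;
	pair->dst_used_blks++;
	if (pair->src_valid_blks == 0)
		pair->src_erasable = 1;
}
```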

The mapping table supports two principal operations: (1) add
migration PEB, (2) exclude migration PEB. The add migration PEB
operation is required when a migration starts; the exclude
migration PEB operation is executed when a migration finishes.
Adding a migration PEB implies associating an exhausted PEB with a
clean one. Excluding a migration PEB implies removing the
completely invalidated PEB from the association and requesting a
TRIM/erase operation for this PEB.

"Physical" Erase Block (PEB) mapping table has a dedicated thread.
The goal of this thread is to track the presence of dirty PEB(s) in
the mapping table and to execute TRIM/erase operations for dirty
PEBs in the background. However, if the number of dirty PEBs grows
big enough, then erase operation(s) can be executed in the context
of the thread that marks a PEB as dirty.
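
The inline-versus-background decision reduces to a threshold check.
In the sketch below the threshold constant is an invented
placeholder; the driver bounds erase work per iteration with its
own state (e.g. the max_erase_ops field of the mapping table
object):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold; not a value taken from the driver. */
#define DIRTY_PEBS_INLINE_THRESHOLD 128

/* Normally the dedicated maptbl thread TRIMs/erases dirty PEBs in
 * the background; when too many pile up, erases are issued inline
 * in the caller that dirtied the PEB. */
static bool should_erase_inline(uint32_t total_pre_erase_pebs)
{
	return total_pre_erase_pebs > DIRTY_PEBS_INLINE_THRESHOLD;
}
```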

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/peb_mapping_table.h | 784 +++++++++++++++++++++++++++++++++++
 1 file changed, 784 insertions(+)
 create mode 100644 fs/ssdfs/peb_mapping_table.h

diff --git a/fs/ssdfs/peb_mapping_table.h b/fs/ssdfs/peb_mapping_table.h
new file mode 100644
index 000000000000..aa2332ec8341
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table.h
@@ -0,0 +1,784 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table.h - PEB mapping table declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_MAPPING_TABLE_H
+#define _SSDFS_PEB_MAPPING_TABLE_H
+
+#include "request_queue.h"
+
+#define SSDFS_MAPTBL_FIRST_PROTECTED_INDEX	0
+#define SSDFS_MAPTBL_PROTECTION_STEP		50
+#define SSDFS_MAPTBL_PROTECTION_RANGE		3
+
+#define SSDFS_PRE_ERASE_PEB_THRESHOLD_PCT	(3)
+#define SSDFS_UNUSED_LEB_THRESHOLD_PCT		(1)
+
+/*
+ * struct ssdfs_maptbl_flush_pair - segment/request pair
+ * @si: pointer on segment object
+ * @req: request object
+ */
+struct ssdfs_maptbl_flush_pair {
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_request req;
+};
+
+/*
+ * struct ssdfs_maptbl_fragment_desc - fragment descriptor
+ * @lock: fragment lock
+ * @state: fragment state
+ * @fragment_id: fragment's ID in the whole sequence
+ * @fragment_folios: count of memory folios in fragment
+ * @start_leb: start LEB of fragment
+ * @lebs_count: count of LEB descriptors in the whole fragment
+ * @lebs_per_page: count of LEB descriptors in memory folio
+ * @lebtbl_pages: count of memory folios are used for LEBs description
+ * @pebs_per_page: count of PEB descriptors in memory folio
+ * @stripe_pages: count of memory folios in one stripe
+ * @mapped_lebs: mapped LEBs count in the fragment
+ * @migrating_lebs: migrating LEBs count in the fragment
+ * @reserved_pebs: count of reserved PEBs in fragment
+ * @pre_erase_pebs: count of PEBs in pre-erase state per fragment
+ * @recovering_pebs: count of recovering PEBs per fragment
+ * @array: fragment's memory folios
+ * @init_end: wait of init ending
+ * @flush_req1: main flush requests array
+ * @flush_req2: backup flush requests array
+ * @flush_req_count: number of flush requests in the array
+ * @flush_seq_size: flush requests' array capacity
+ */
+struct ssdfs_maptbl_fragment_desc {
+	struct rw_semaphore lock;
+	atomic_t state;
+
+	u32 fragment_id;
+	u32 fragment_folios;
+
+	u64 start_leb;
+	u32 lebs_count;
+
+	u16 lebs_per_page;
+	u16 lebtbl_pages;
+
+	u16 pebs_per_page;
+	u16 stripe_pages;
+
+	u32 mapped_lebs;
+	u32 migrating_lebs;
+	u32 reserved_pebs;
+	u32 pre_erase_pebs;
+	u32 recovering_pebs;
+
+	struct ssdfs_folio_array array;
+	struct completion init_end;
+
+	struct ssdfs_maptbl_flush_pair *flush_pair1;
+	struct ssdfs_maptbl_flush_pair *flush_pair2;
+	u32 flush_req_count;
+	u32 flush_seq_size;
+
+	/* /sys/fs/<ssdfs>/<device>/maptbl/fragments/fragment<N> */
+	struct kobject frag_kobj;
+	struct completion frag_kobj_unregister;
+};
+
+/* Fragment's state */
+enum {
+	SSDFS_MAPTBL_FRAG_CREATED	= 0,
+	SSDFS_MAPTBL_FRAG_INIT_FAILED	= 1,
+	SSDFS_MAPTBL_FRAG_INITIALIZED	= 2,
+	SSDFS_MAPTBL_FRAG_DIRTY		= 3,
+	SSDFS_MAPTBL_FRAG_TOWRITE	= 4,
+	SSDFS_MAPTBL_FRAG_STATE_MAX	= 5,
+};
+
+/*
+ * struct ssdfs_maptbl_area - mapping table area
+ * @portion_id: sequential ID of mapping table fragment
+ * @folios: array of memory folio pointers
+ * @folios_capacity: capacity of array
+ * @folios_count: count of folios in array
+ */
+struct ssdfs_maptbl_area {
+	u16 portion_id;
+	struct folio **folios;
+	size_t folios_capacity;
+	size_t folios_count;
+};
+
+/*
+ * struct ssdfs_peb_mapping_table - mapping table object
+ * @tbl_lock: mapping table lock
+ * @fragments_count: count of fragments
+ * @fragments_per_seg: count of fragments in segment
+ * @fragments_per_peb: count of fragments in PEB
+ * @fragment_bytes: count of bytes in one fragment
+ * @fragment_folios: count of memory folios in one fragment
+ * @flags: mapping table flags
+ * @lebs_count: count of LEBs are described by mapping table
+ * @pebs_count: count of PEBs are described by mapping table
+ * @lebs_per_fragment: count of LEB descriptors in fragment
+ * @pebs_per_fragment: count of PEB descriptors in fragment
+ * @pebs_per_stripe: count of PEB descriptors in stripe
+ * @stripes_per_fragment: count of stripes in fragment
+ * @extents: metadata extents that describe mapping table location
+ * @segs: array of pointers on segment objects
+ * @segs_count: count of segment objects used for mapping table
+ * @state: mapping table's state
+ * @erase_op_state: state of erase operation
+ * @min_pre_erase_pebs: minimum number of PEBs in pre-erase state
+ * @total_pre_erase_pebs: total number of PEBs in pre-erase state
+ * @max_erase_ops: upper bound of erase operations for one iteration
+ * @erase_ops_end_wq: wait queue of threads waiting for the end of erase operation
+ * @last_peb_recover_cno: checkpoint number of the last PEB recovering operation
+ * @bmap_lock: dirty bitmap's lock
+ * @dirty_bmap: bitmap of dirty fragments
+ * @desc_array: array of fragment descriptors
+ * @wait_queue: wait queue of mapping table's thread
+ * @flush_end: completion of flush operation
+ * @thread: descriptor of mapping table's thread
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_peb_mapping_table {
+	struct rw_semaphore tbl_lock;
+	u32 fragments_count;
+	u16 fragments_per_seg;
+	u16 fragments_per_peb;
+	u32 fragment_bytes;
+	u32 fragment_folios;
+	atomic_t flags;
+	u64 lebs_count;
+	u64 pebs_count;
+	u16 lebs_per_fragment;
+	u16 pebs_per_fragment;
+	u16 pebs_per_stripe;
+	u16 stripes_per_fragment;
+	struct ssdfs_meta_area_extent extents[MAPTBL_LIMIT1][MAPTBL_LIMIT2];
+	struct ssdfs_segment_info **segs[SSDFS_MAPTBL_SEG_COPY_MAX];
+	u16 segs_count;
+
+	atomic_t state;
+
+	atomic_t erase_op_state;
+	atomic_t min_pre_erase_pebs;
+	atomic_t total_pre_erase_pebs;
+	atomic_t max_erase_ops;
+	wait_queue_head_t erase_ops_end_wq;
+
+	atomic64_t last_peb_recover_cno;
+
+	struct mutex bmap_lock;
+	unsigned long *dirty_bmap;
+	struct ssdfs_maptbl_fragment_desc *desc_array;
+
+	wait_queue_head_t wait_queue;
+	struct completion flush_end;
+	struct ssdfs_thread_info thread;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* PEB mapping table's state */
+enum {
+	SSDFS_MAPTBL_CREATED			= 0,
+	SSDFS_MAPTBL_GOING_TO_BE_DESTROY	= 1,
+	SSDFS_MAPTBL_STATE_MAX			= 2,
+};
+
+/*
+ * struct ssdfs_maptbl_peb_descriptor - PEB descriptor
+ * @peb_id: PEB identification number
+ * @shared_peb_index: index of external shared destination PEB
+ * @erase_cycles: P/E cycles
+ * @type: PEB type
+ * @state: PEB state
+ * @flags: PEB flags
+ * @consistency: PEB state consistency type
+ */
+struct ssdfs_maptbl_peb_descriptor {
+	u64 peb_id;
+	u8 shared_peb_index;
+	u32 erase_cycles;
+	u8 type;
+	u8 state;
+	u8 flags;
+	u8 consistency;
+};
+
+/*
+ * struct ssdfs_maptbl_peb_relation - PEBs association
+ * @pebs: array of PEB descriptors
+ */
+struct ssdfs_maptbl_peb_relation {
+	struct ssdfs_maptbl_peb_descriptor pebs[SSDFS_MAPTBL_RELATION_MAX];
+};
+
+/*
+ * Erase operation state
+ */
+enum {
+	SSDFS_MAPTBL_NO_ERASE,
+	SSDFS_MAPTBL_ERASE_IN_PROGRESS
+};
+
+/* Stage of recovering try */
+enum {
+	SSDFS_CHECK_RECOVERABILITY,
+	SSDFS_MAKE_RECOVERING,
+	SSDFS_RECOVER_STAGE_MAX
+};
+
+/* Possible states of erase operation */
+enum {
+	SSDFS_ERASE_RESULT_UNKNOWN,
+	SSDFS_ERASE_DONE,
+	SSDFS_ERASE_SB_PEB_DONE,
+	SSDFS_IGNORE_ERASE,
+	SSDFS_ERASE_FAILURE,
+	SSDFS_BAD_BLOCK_DETECTED,
+	SSDFS_ERASE_RESULT_MAX
+};
+
+/*
+ * struct ssdfs_erase_result - PEB's erase operation result
+ * @fragment_index: index of mapping table's fragment
+ * @peb_index: PEB's index in fragment
+ * @peb_id: PEB ID number
+ * @state: state of erase operation
+ */
+struct ssdfs_erase_result {
+	u32 fragment_index;
+	u16 peb_index;
+	u64 peb_id;
+	int state;
+};
+
+/*
+ * struct ssdfs_erase_result_array - array of erase operation results
+ * @ptr: pointer on memory buffer
+ * @capacity: maximal number of erase operation results in array
+ * @size: count of erase operation results in array
+ */
+struct ssdfs_erase_result_array {
+	struct ssdfs_erase_result *ptr;
+	u32 capacity;
+	u32 size;
+};
+
+#define SSDFS_ERASE_RESULTS_PER_FRAGMENT	(10)
+
+/*
+ * Inline functions
+ */
+
+/*
+ * SSDFS_ERASE_RESULT_INIT() - init erase result
+ * @fragment_index: index of mapping table's fragment
+ * @peb_index: PEB's index in fragment
+ * @peb_id: PEB ID number
+ * @state: state of erase operation
+ * @result: erase operation result [out]
+ *
+ * This method initializes the erase operation result.
+ */
+static inline
+void SSDFS_ERASE_RESULT_INIT(u32 fragment_index, u16 peb_index,
+			     u64 peb_id, int state,
+			     struct ssdfs_erase_result *result)
+{
+	result->fragment_index = fragment_index;
+	result->peb_index = peb_index;
+	result->peb_id = peb_id;
+	result->state = state;
+}
+
+/*
+ * DEFINE_PEB_INDEX_IN_FRAGMENT() - define PEB index in the whole fragment
+ * @fdesc: fragment descriptor
+ * @folio_index: folio index in the fragment
+ * @item_index: item index in the memory folio
+ */
+static inline
+u16 DEFINE_PEB_INDEX_IN_FRAGMENT(struct ssdfs_maptbl_fragment_desc *fdesc,
+				 pgoff_t folio_index,
+				 u16 item_index)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+	BUG_ON(folio_index < fdesc->lebtbl_pages);
+
+	SSDFS_DBG("fdesc %p, folio_index %lu, item_index %u\n",
+		  fdesc, folio_index, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index -= fdesc->lebtbl_pages;
+	folio_index *= fdesc->pebs_per_page;
+	folio_index += item_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(folio_index >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)folio_index;
+}
+
+/*
+ * GET_PEB_ID() - define PEB ID for the index
+ * @kaddr: pointer on memory folio's content
+ * @item_index: item index inside of the folio
+ *
+ * This method tries to convert @item_index into
+ * PEB ID value.
+ *
+ * RETURN:
+ * [success] - PEB ID
+ * [failure] - U64_MAX
+ */
+static inline
+u64 GET_PEB_ID(void *kaddr, u16 item_index)
+{
+	struct ssdfs_peb_table_fragment_header *hdr;
+	u64 start_peb;
+	u16 pebs_count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+
+	SSDFS_DBG("kaddr %p, item_index %u\n",
+		  kaddr, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) {
+		SSDFS_ERR("corrupted folio\n");
+		return U64_MAX;
+	}
+
+	start_peb = le64_to_cpu(hdr->start_peb);
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+
+	if (item_index >= pebs_count) {
+		SSDFS_ERR("item_index %u >= pebs_count %u\n",
+			  item_index, pebs_count);
+		return U64_MAX;
+	}
+
+	return start_peb + item_index;
+}
+
+/*
+ * PEBTBL_FOLIO_INDEX() - define PEB table folio index
+ * @fdesc: fragment descriptor
+ * @peb_index: index of PEB in the fragment
+ */
+static inline
+pgoff_t PEBTBL_FOLIO_INDEX(struct ssdfs_maptbl_fragment_desc *fdesc,
+			   u16 peb_index)
+{
+	pgoff_t folio_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fdesc);
+
+	SSDFS_DBG("fdesc %p, peb_index %u\n",
+		  fdesc, peb_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_index = fdesc->lebtbl_pages;
+	folio_index += peb_index / fdesc->pebs_per_page;
+	return folio_index;
+}
+
+/*
+ * GET_PEB_DESCRIPTOR() - retrieve PEB descriptor
+ * @kaddr: pointer on memory folio's content
+ * @item_index: item index inside of the folio
+ *
+ * This method tries to return the pointer on
+ * PEB descriptor for @item_index.
+ *
+ * RETURN:
+ * [success] - pointer on PEB descriptor
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+static inline
+struct ssdfs_peb_descriptor *GET_PEB_DESCRIPTOR(void *kaddr, u16 item_index)
+{
+	struct ssdfs_peb_table_fragment_header *hdr;
+	u16 pebs_count;
+	u32 peb_desc_off;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+
+	SSDFS_DBG("kaddr %p, item_index %u\n",
+		  kaddr, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hdr = (struct ssdfs_peb_table_fragment_header *)kaddr;
+
+	if (le16_to_cpu(hdr->magic) != SSDFS_PEB_TABLE_MAGIC) {
+		SSDFS_ERR("corrupted folio\n");
+		return ERR_PTR(-ERANGE);
+	}
+
+	pebs_count = le16_to_cpu(hdr->pebs_count);
+
+	if (item_index >= pebs_count) {
+		SSDFS_ERR("item_index %u >= pebs_count %u\n",
+			  item_index, pebs_count);
+		return ERR_PTR(-ERANGE);
+	}
+
+	peb_desc_off = SSDFS_PEBTBL_FRAGMENT_HDR_SIZE;
+	peb_desc_off += item_index * sizeof(struct ssdfs_peb_descriptor);
+
+	if (peb_desc_off >= PAGE_SIZE) {
+		SSDFS_ERR("invalid offset %u\n", peb_desc_off);
+		return ERR_PTR(-ERANGE);
+	}
+
+	return (struct ssdfs_peb_descriptor *)((u8 *)kaddr + peb_desc_off);
+}
+
+/*
+ * SEG2PEB_TYPE() - convert segment into PEB type
+ */
+static inline
+int SEG2PEB_TYPE(int seg_type)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_type %d\n", seg_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (seg_type) {
+	case SSDFS_USER_DATA_SEG_TYPE:
+		return SSDFS_MAPTBL_DATA_PEB_TYPE;
+
+	case SSDFS_LEAF_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_LNODE_PEB_TYPE;
+
+	case SSDFS_HYBRID_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_HNODE_PEB_TYPE;
+
+	case SSDFS_INDEX_NODE_SEG_TYPE:
+		return SSDFS_MAPTBL_IDXNODE_PEB_TYPE;
+
+	case SSDFS_INITIAL_SNAPSHOT_SEG_TYPE:
+		return SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE;
+
+	case SSDFS_SB_SEG_TYPE:
+		return SSDFS_MAPTBL_SBSEG_PEB_TYPE;
+
+	case SSDFS_SEGBMAP_SEG_TYPE:
+		return SSDFS_MAPTBL_SEGBMAP_PEB_TYPE;
+
+	case SSDFS_MAPTBL_SEG_TYPE:
+		return SSDFS_MAPTBL_MAPTBL_PEB_TYPE;
+	}
+
+	return SSDFS_MAPTBL_PEB_TYPE_MAX;
+}
+
+/*
+ * PEB2SEG_TYPE() - convert PEB into segment type
+ */
+static inline
+int PEB2SEG_TYPE(int peb_type)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("peb_type %d\n", peb_type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (peb_type) {
+	case SSDFS_MAPTBL_DATA_PEB_TYPE:
+		return SSDFS_USER_DATA_SEG_TYPE;
+
+	case SSDFS_MAPTBL_LNODE_PEB_TYPE:
+		return SSDFS_LEAF_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_HNODE_PEB_TYPE:
+		return SSDFS_HYBRID_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_IDXNODE_PEB_TYPE:
+		return SSDFS_INDEX_NODE_SEG_TYPE;
+
+	case SSDFS_MAPTBL_INIT_SNAP_PEB_TYPE:
+		return SSDFS_INITIAL_SNAPSHOT_SEG_TYPE;
+
+	case SSDFS_MAPTBL_SBSEG_PEB_TYPE:
+		return SSDFS_SB_SEG_TYPE;
+
+	case SSDFS_MAPTBL_SEGBMAP_PEB_TYPE:
+		return SSDFS_SEGBMAP_SEG_TYPE;
+
+	case SSDFS_MAPTBL_MAPTBL_PEB_TYPE:
+		return SSDFS_MAPTBL_SEG_TYPE;
+	}
+
+	return SSDFS_UNKNOWN_SEG_TYPE;
+}
+
+static inline
+bool is_ssdfs_maptbl_under_flush(struct ssdfs_fs_info *fsi)
+{
+	return atomic_read(&fsi->maptbl->flags) & SSDFS_MAPTBL_UNDER_FLUSH;
+}
+
+static inline
+bool is_ssdfs_maptbl_start_migration(struct ssdfs_fs_info *fsi)
+{
+	return atomic_read(&fsi->maptbl->flags) & SSDFS_MAPTBL_START_MIGRATION;
+}
+
+/*
+ * is_peb_protected() - check that PEB is protected
+ * @found_item: PEB index in the fragment
+ */
+static inline
+bool is_peb_protected(unsigned long found_item)
+{
+	unsigned long remainder;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("found_item %lu\n", found_item);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	remainder = found_item % SSDFS_MAPTBL_PROTECTION_STEP;
+	return remainder == 0;
+}
+
+static inline
+bool is_ssdfs_maptbl_going_to_be_destroyed(struct ssdfs_peb_mapping_table *tbl)
+{
+	return atomic_read(&tbl->state) == SSDFS_MAPTBL_GOING_TO_BE_DESTROY;
+}
+
+static inline
+void set_maptbl_going_to_be_destroyed(struct ssdfs_fs_info *fsi)
+{
+	atomic_set(&fsi->maptbl->state, SSDFS_MAPTBL_GOING_TO_BE_DESTROY);
+}
+
+static inline
+void ssdfs_account_updated_user_data_pages(struct ssdfs_fs_info *fsi,
+					   u32 count)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 updated = 0;
+
+	BUG_ON(!fsi);
+
+	SSDFS_DBG("fsi %p, count %u\n",
+		  fsi, count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	spin_lock(&fsi->volume_state_lock);
+	fsi->updated_user_data_pages += count;
+#ifdef CONFIG_SSDFS_DEBUG
+	updated = fsi->updated_user_data_pages;
+#endif /* CONFIG_SSDFS_DEBUG */
+	spin_unlock(&fsi->volume_state_lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("updated %llu\n", updated);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi,
+				  u64 leb_id, u8 peb_type,
+				  int peb_state,
+				  struct completion **end);
+
+/*
+ * ssdfs_maptbl_wait_and_change_peb_state() - wait and change PEB state
+ * @fsi: file system info object
+ * @leb_id: LEB ID number
+ * @peb_type: type of the PEB
+ * @peb_state: new state of the PEB
+ */
+static inline
+int ssdfs_maptbl_wait_and_change_peb_state(struct ssdfs_fs_info *fsi,
+					   u64 leb_id, u8 peb_type,
+					   int peb_state)
+{
+	struct completion *end;
+	int number_of_tries = 0;
+	int err = 0;
+
+	err = ssdfs_maptbl_change_peb_state(fsi, leb_id,
+					    peb_type, peb_state,
+					    &end);
+	if (err == -EAGAIN) {
+wait_completion_end:
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("waiting failed: "
+				  "err %d\n", err);
+			return err;
+		}
+
+		err = ssdfs_maptbl_change_peb_state(fsi,
+						    leb_id,
+						    peb_type,
+						    peb_state,
+						    &end);
+		if (err == -EAGAIN && is_ssdfs_maptbl_under_flush(fsi)) {
+			if (number_of_tries < SSDFS_MAX_NUMBER_OF_TRIES) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("mapping table is flushing: "
+					  "leb_id %llu, peb_type %#x, "
+					  "new_state %#x, number_of_tries %d\n",
+					  leb_id, peb_type, peb_state,
+					  number_of_tries);
+#endif /* CONFIG_SSDFS_DEBUG */
+				number_of_tries++;
+				goto wait_completion_end;
+			}
+		}
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change the PEB state: "
+			  "leb_id %llu, peb_type %#x, "
+			  "new_state %#x, err %d\n",
+			  leb_id, peb_type,
+			  peb_state, err);
+	}
+
+	return err;
+}
+
+/*
+ * PEB mapping table's API
+ */
+int ssdfs_maptbl_create(struct ssdfs_fs_info *fsi);
+void ssdfs_maptbl_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_maptbl_fragment_init(struct ssdfs_peb_container *pebc,
+				struct ssdfs_maptbl_area *area);
+int ssdfs_maptbl_flush(struct ssdfs_peb_mapping_table *tbl);
+int ssdfs_maptbl_resize(struct ssdfs_peb_mapping_table *tbl,
+			u64 new_pebs_count);
+
+int ssdfs_maptbl_convert_leb2peb(struct ssdfs_fs_info *fsi,
+				 u64 leb_id, u8 peb_type,
+				 struct ssdfs_maptbl_peb_relation *pebr,
+				 struct completion **end);
+int ssdfs_maptbl_map_leb2peb(struct ssdfs_fs_info *fsi,
+			     u64 leb_id, u8 peb_type,
+			     struct ssdfs_maptbl_peb_relation *pebr,
+			     struct completion **end);
+int ssdfs_maptbl_recommend_search_range(struct ssdfs_fs_info *fsi,
+					u64 *start_leb,
+					u64 *end_leb,
+					struct completion **end);
+int ssdfs_maptbl_change_peb_state(struct ssdfs_fs_info *fsi,
+				  u64 leb_id, u8 peb_type,
+				  int peb_state,
+				  struct completion **end);
+int ssdfs_maptbl_prepare_pre_erase_state(struct ssdfs_fs_info *fsi,
+					 u64 leb_id, u8 peb_type,
+					 struct completion **end);
+int ssdfs_maptbl_set_pre_erased_snapshot_peb(struct ssdfs_fs_info *fsi,
+					     u64 peb_id,
+					     struct completion **end);
+int ssdfs_maptbl_add_migration_peb(struct ssdfs_fs_info *fsi,
+				   u64 leb_id, u8 peb_type,
+				   struct ssdfs_maptbl_peb_relation *pebr,
+				   struct completion **end);
+int ssdfs_maptbl_exclude_migration_peb(struct ssdfs_fs_info *fsi,
+					u64 leb_id, u8 peb_type,
+					u64 peb_create_time,
+					u64 last_log_time,
+					struct completion **end);
+int ssdfs_maptbl_set_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					u64 leb_id, u8 peb_type,
+					u64 dst_leb_id, u16 dst_peb_index,
+					struct completion **end);
+int ssdfs_maptbl_break_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					 u64 leb_id, u8 peb_type,
+					 u64 dst_leb_id, int dst_peb_refs,
+					 struct completion **end);
+int ssdfs_maptbl_set_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					   u64 leb_id, u8 peb_type,
+					   struct completion **end);
+int ssdfs_maptbl_break_zns_indirect_relation(struct ssdfs_peb_mapping_table *tbl,
+					     u64 leb_id, u8 peb_type,
+					     struct completion **end);
+
+int ssdfs_reserve_free_pages(struct ssdfs_fs_info *fsi,
+			     u32 count, int type);
+
+/*
+ * It makes sense to have special thread for the whole mapping table.
+ * The goal of the thread will be clearing of dirty PEBs,
+ * tracking P/E cycles, excluding bad PEBs and recovering PEBs
+ * in the background. Knowledge about PEBs will be hidden by
+ * mapping table. All other subsystems will operate by LEBs.
+ */
+
+/*
+ * PEB mapping table's internal API
+ */
+int ssdfs_maptbl_start_thread(struct ssdfs_peb_mapping_table *tbl);
+int ssdfs_maptbl_stop_thread(struct ssdfs_peb_mapping_table *tbl);
+
+int ssdfs_maptbl_define_fragment_info(struct ssdfs_fs_info *fsi,
+				      u64 leb_id,
+				      u16 *pebs_per_fragment,
+				      u16 *pebs_per_stripe,
+				      u16 *stripes_per_fragment);
+struct ssdfs_maptbl_fragment_desc *
+ssdfs_maptbl_get_fragment_descriptor(struct ssdfs_peb_mapping_table *tbl,
+				     u64 leb_id);
+void ssdfs_maptbl_set_fragment_dirty(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id, u8 peb_type);
+int ssdfs_maptbl_solve_inconsistency(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id,
+				     struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_solve_pre_deleted_state(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     u64 leb_id,
+				     struct ssdfs_maptbl_peb_relation *pebr);
+void ssdfs_maptbl_move_fragment_folios(struct ssdfs_segment_request *req,
+					struct ssdfs_maptbl_area *area,
+					u16 folios_count);
+int ssdfs_maptbl_erase_peb(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_erase_result *result);
+int ssdfs_maptbl_correct_dirty_peb(struct ssdfs_peb_mapping_table *tbl,
+				   struct ssdfs_maptbl_fragment_desc *fdesc,
+				   struct ssdfs_erase_result *result);
+int __ssdfs_maptbl_correct_peb_state(struct ssdfs_peb_mapping_table *tbl,
+				     struct ssdfs_maptbl_fragment_desc *fdesc,
+				     struct ssdfs_peb_table_fragment_header *hdr,
+				     struct ssdfs_erase_result *res);
+int ssdfs_maptbl_erase_reserved_peb_now(struct ssdfs_fs_info *fsi,
+					u64 leb_id, u8 peb_type,
+					struct completion **end);
+int ssdfs_maptbl_erase_dirty_pebs_now(struct ssdfs_peb_mapping_table *tbl);
+
+void ssdfs_debug_maptbl_object(struct ssdfs_peb_mapping_table *tbl);
+
+#endif /* _SSDFS_PEB_MAPPING_TABLE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 40/79] ssdfs: introduce PEB mapping table cache
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (14 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 38/79] ssdfs: introduce PEB mapping table Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 42/79] ssdfs: introduce segment bitmap Viacheslav Dubeyko
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

"Physical" Erase Block (PEB) mapping table is enhanced by special
cache that is stored in the payload of the superblock segment's log.
Generally speaking, the cache stores a copy of the records of
the PEBs' state. The goal of the PEB mapping table's cache is to
resolve the case when a PEB's descriptor is associated with a LEB of
the PEB mapping table itself, for example. If an unmount operation
triggers the flush of the PEB mapping table, then there are cases
when the PEB mapping table could be modified during the flush
operation. As a result, the actual PEB's state is stored only in the
PEB mapping table's cache. Such a record is marked as inconsistent,
and the inconsistency has to be resolved during the next mount
operation by storing the actual PEB's state in the PEB mapping table
by a specialized thread. Moreover, the cache plays another very
important role. Namely, the PEB mapping table's cache is used for
converting a LEB ID into a PEB ID for the basic metadata structures
(the PEB mapping table and segment bitmap, for example) before the
PEB mapping table initialization finishes during the mount operation.

The PEB mapping table's cache starts with a header that is followed
by: (1) LEB ID / PEB ID pairs, (2) PEB state records. The pairs' area
associates LEB IDs with PEB IDs. Additionally, the PEB state records'
area contains information about the last actual state of the PEB for
every record in the pairs' area. It makes sense to point out that the
most important fields in the PEB state area are: (1) consistency,
(2) PEB state, and (3) PEB flags. Generally speaking, the consistency
field simply shows whether a record in the cache and in the mapping
table is identical or not. If some record in the cache has been marked
as inconsistent, then the PEB mapping table has to be modified with
the goal of keeping the actual value from the cache. As a result, the
value in the table and in the cache will finally be consistent.

"Physical" Erase Block (PEB) mapping table cache supports
the following operations:
(1) convert LEB to PEB - convert LEB to PEB if mapping table
                         is not initialized yet
(2) map LEB to PEB - cache information about LEB to PEB mapping
(3) forget LEB to PEB - exclude information about LEB to PEB mapping
                        from the cache
(4) change PEB state - update cached information about LEB to PEB mapping
(5) add migration PEB - cache information about migration destination
(6) exclude migration PEB - exclude information about migration destination

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/peb_mapping_table_cache.h | 120 +++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 fs/ssdfs/peb_mapping_table_cache.h

diff --git a/fs/ssdfs/peb_mapping_table_cache.h b/fs/ssdfs/peb_mapping_table_cache.h
new file mode 100644
index 000000000000..0e738660eac2
--- /dev/null
+++ b/fs/ssdfs/peb_mapping_table_cache.h
@@ -0,0 +1,120 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_mapping_table_cache.h - PEB mapping table cache declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_PEB_MAPPING_TABLE_CACHE_H
+#define _SSDFS_PEB_MAPPING_TABLE_CACHE_H
+
+#include <linux/ssdfs_fs.h>
+
+/*
+ * struct ssdfs_maptbl_cache - maptbl cache
+ * @lock: lock of maptbl cache
+ * @batch: memory folios of maptbl cache
+ * @bytes_count: count of bytes in maptbl cache
+ * @pm_queue: PEB mappings queue
+ */
+struct ssdfs_maptbl_cache {
+	struct rw_semaphore lock;
+	struct folio_batch batch;
+	atomic_t bytes_count;
+
+	struct ssdfs_peb_mapping_queue pm_queue;
+};
+
+/*
+ * struct ssdfs_maptbl_cache_item - cache item descriptor
+ * @state: search result state
+ * @folio_index: index of the found memory folio
+ * @item_index: index of the found item
+ * @found: found LEB2PEB pair
+ */
+struct ssdfs_maptbl_cache_item {
+#define SSDFS_MAPTBL_CACHE_ITEM_UNKNOWN		(0)
+#define SSDFS_MAPTBL_CACHE_ITEM_FOUND		(1)
+#define SSDFS_MAPTBL_CACHE_ITEM_ABSENT		(2)
+#define SSDFS_MAPTBL_CACHE_SEARCH_ERROR		(3)
+#define SSDFS_MAPTBL_CACHE_SEARCH_MAX		(4)
+	int state;
+	unsigned folio_index;
+	u16 item_index;
+	struct ssdfs_leb2peb_pair found;
+};
+
+#define SSDFS_MAPTBL_MAIN_INDEX		(0)
+#define SSDFS_MAPTBL_RELATION_INDEX	(1)
+#define SSDFS_MAPTBL_RELATION_MAX	(2)
+
+/*
+ * struct ssdfs_maptbl_cache_search_result - cache search result
+ * @pebs: array of cache item descriptors
+ */
+struct ssdfs_maptbl_cache_search_result {
+	struct ssdfs_maptbl_cache_item pebs[SSDFS_MAPTBL_RELATION_MAX];
+};
+
+struct ssdfs_maptbl_peb_relation;
+
+/*
+ * PEB mapping table cache's API
+ */
+void ssdfs_maptbl_cache_init(struct ssdfs_maptbl_cache *cache);
+void ssdfs_maptbl_cache_destroy(struct ssdfs_maptbl_cache *cache);
+
+int ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache,
+					u64 leb_id,
+					struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_cache_map_leb2peb(struct ssdfs_maptbl_cache *cache,
+				   u64 leb_id,
+				   struct ssdfs_maptbl_peb_relation *pebr,
+				   int consistency);
+int ssdfs_maptbl_cache_forget_leb2peb(struct ssdfs_maptbl_cache *cache,
+				      u64 leb_id,
+				      int consistency);
+int ssdfs_maptbl_cache_change_peb_state(struct ssdfs_maptbl_cache *cache,
+					u64 leb_id, int peb_state,
+					int consistency);
+int ssdfs_maptbl_cache_add_migration_peb(struct ssdfs_maptbl_cache *cache,
+					u64 leb_id,
+					struct ssdfs_maptbl_peb_relation *pebr,
+					int consistency);
+int ssdfs_maptbl_cache_exclude_migration_peb(struct ssdfs_maptbl_cache *cache,
+					     u64 leb_id,
+					     int consistency);
+
+/*
+ * PEB mapping table cache's internal API
+ */
+struct folio *
+ssdfs_maptbl_cache_add_batch_folio(struct ssdfs_maptbl_cache *cache);
+int ssdfs_maptbl_cache_convert_leb2peb_nolock(struct ssdfs_maptbl_cache *cache,
+					 u64 leb_id,
+					 struct ssdfs_maptbl_peb_relation *pebr);
+int __ssdfs_maptbl_cache_convert_leb2peb(struct ssdfs_maptbl_cache *cache,
+					 u64 leb_id,
+					 struct ssdfs_maptbl_peb_relation *pebr);
+int ssdfs_maptbl_cache_change_peb_state_nolock(struct ssdfs_maptbl_cache *cache,
+						u64 leb_id, int peb_state,
+						int consistency);
+int ssdfs_maptbl_cache_forget_leb2peb_nolock(struct ssdfs_maptbl_cache *cache,
+					     u64 leb_id,
+					     int consistency);
+
+#endif /* _SSDFS_PEB_MAPPING_TABLE_CACHE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 42/79] ssdfs: introduce segment bitmap
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (15 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 40/79] ssdfs: introduce PEB mapping table cache Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 44/79] ssdfs: introduce b-tree object Viacheslav Dubeyko
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The segment bitmap is a critical metadata structure of the SSDFS file
system that serves several goals: (1) searching for a candidate for
a current segment capable of storing new data, (2) searching by the GC
subsystem for the most suitable segment (in the dirty state, for
example) with the goal of preparing the segment in the background for
storing new data (converting it into the clean state). The segment
bitmap is able to represent a set of states: (1) the clean state means
that a segment contains free logical blocks only, (2) the using state
means that a segment could contain valid, invalid, and free logical
blocks, (3) the used state means that a segment contains valid logical
blocks only, (4) the pre-dirty state means that a segment contains
valid and invalid logical blocks, (5) the dirty state means that
a segment contains only invalid blocks, (6) the reserved state is used
for reserving segment numbers for some metadata structures (for
example, for the superblock segment).

The PEB migration scheme implies that segments are able to migrate
from one state into another without explicit involvement of the GC
subsystem. For example, if some segment receives enough truncate
operations (data invalidation), then the segment could change from the
used state into the pre-dirty state. Additionally, the segment is able
to migrate from the pre-dirty into the using state by means of PEB
migration in the case of receiving enough data update requests. As
a result, a segment in the using state could be selected as the
current segment without any GC-related activity. However, a segment is
able to get stuck in the pre-dirty state in the absence of update
requests. Finally, such a situation can be resolved by the GC
subsystem by means of valid block migration in the background, and the
pre-dirty segment can be transformed into the using state.

The segment bitmap is implemented as a bitmap metadata structure that
is split into several fragments. Every fragment is stored into a log
of a specialized PEB. As a result, the full size of the segment bitmap
and the PEB's capacity define the number of fragments. The mkfs
utility reserves the necessary number of segments for storing the
segment bitmap's fragments during SSDFS volume creation. Finally, the
numbers of the reserved segments are stored into the superblock
metadata structure. The segment bitmap "lives" in the same set of
reserved segments during the whole lifetime of the volume. However,
update operations on the segment bitmap could trigger PEB migration in
the case of exhaustion of any PEB used for keeping the segment
bitmap's content.

The segment bitmap implements the following API:
(1) create - create empty segment bitmap object
(2) destroy - destroy segment bitmap object
(3) fragment_init - init fragment of segment bitmap
(4) flush - flush dirty segment bitmap
(5) check_state - check that segment has particular state
(6) get_state - get current state of particular segment
(7) change_state - change state of segment
(8) find - find segment for requested state or state mask
(9) find_and_set - find segment for requested state and change state

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/segment_bitmap.h | 482 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 482 insertions(+)
 create mode 100644 fs/ssdfs/segment_bitmap.h

diff --git a/fs/ssdfs/segment_bitmap.h b/fs/ssdfs/segment_bitmap.h
new file mode 100644
index 000000000000..286e4d8fedf5
--- /dev/null
+++ b/fs/ssdfs/segment_bitmap.h
@@ -0,0 +1,482 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/segment_bitmap.h - segment bitmap declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SEGMENT_BITMAP_H
+#define _SSDFS_SEGMENT_BITMAP_H
+
+#include "common_bitmap.h"
+#include "request_queue.h"
+#include "folio_array.h"
+
+/* Segment states */
+enum {
+	SSDFS_SEG_CLEAN				= 0x0,
+	SSDFS_SEG_DATA_USING			= 0x1,
+	SSDFS_SEG_LEAF_NODE_USING		= 0x2,
+	SSDFS_SEG_HYBRID_NODE_USING		= 0x5,
+	SSDFS_SEG_INDEX_NODE_USING		= 0x3,
+	SSDFS_SEG_USED				= 0x7,
+	SSDFS_SEG_PRE_DIRTY			= 0x6,
+	SSDFS_SEG_DIRTY				= 0x4,
+	SSDFS_SEG_BAD				= 0x8,
+	SSDFS_SEG_RESERVED			= 0x9,
+	SSDFS_SEG_DATA_USING_INVALIDATED	= 0xA,
+	SSDFS_SEG_STATE_MAX			= 0xB
+};
+
+/* Segment state flags */
+#define SSDFS_SEG_CLEAN_STATE_FLAG			(1 << 0)
+#define SSDFS_SEG_DATA_USING_STATE_FLAG			(1 << 1)
+#define SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG		(1 << 2)
+#define SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG		(1 << 3)
+#define SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG		(1 << 4)
+#define SSDFS_SEG_USED_STATE_FLAG			(1 << 5)
+#define SSDFS_SEG_PRE_DIRTY_STATE_FLAG			(1 << 6)
+#define SSDFS_SEG_DIRTY_STATE_FLAG			(1 << 7)
+#define SSDFS_SEG_BAD_STATE_FLAG			(1 << 8)
+#define SSDFS_SEG_RESERVED_STATE_FLAG			(1 << 9)
+#define SSDFS_SEG_DATA_USING_INVALIDATED_STATE_FLAG	(1 << 10)
+
+/* Segment state masks */
+#define SSDFS_SEG_CLEAN_USING_MASK \
+	(SSDFS_SEG_CLEAN_STATE_FLAG | \
+	 SSDFS_SEG_DATA_USING_STATE_FLAG | \
+	 SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG | \
+	 SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG | \
+	 SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG | \
+	 SSDFS_SEG_DATA_USING_INVALIDATED_STATE_FLAG)
+#define SSDFS_SEG_USED_DIRTY_MASK \
+	(SSDFS_SEG_USED_STATE_FLAG | \
+	 SSDFS_SEG_PRE_DIRTY_STATE_FLAG | \
+	 SSDFS_SEG_DIRTY_STATE_FLAG)
+#define SSDFS_SEG_BAD_STATE_MASK \
+	(SSDFS_SEG_BAD_STATE_FLAG)
+
+#define SSDFS_SEG_STATE_BITS	4
+#define SSDFS_SEG_STATE_MASK	0xF
+
+struct ssdfs_segment_bmap;
+
+/*
+ * struct ssdfs_segbmap_fragment_desc - fragment descriptor
+ * @state: fragment's state
+ * @fragment_id: fragment's ID in the whole sequence
+ * @total_segs: total count of segments in fragment
+ * @clean_or_using_segs: count of clean or using segments in fragment
+ * @used_or_dirty_segs: count of used, pre-dirty, dirty or reserved segments
+ * @bad_segs: count of bad segments in fragment
+ * @init_end: completion for waiting until fragment init has finished
+ * @flush_pairs: array of flush requests
+ * @segbmap: pointer on segment bitmap object
+ * @frag_kobj: fragment kobject for sysfs
+ * @frag_kobj_unregister: completion for fragment kobject cleanup
+ */
+struct ssdfs_segbmap_fragment_desc {
+	int state;
+	u16 fragment_id;
+	u16 total_segs;
+	u16 clean_or_using_segs;
+	u16 used_or_dirty_segs;
+	u16 bad_segs;
+	struct completion init_end;
+
+#define SSDFS_SEGBMAP_FLUSH_REQS_MAX		(2)
+	struct {
+		struct ssdfs_segment_info *si;
+		struct ssdfs_segment_request req;
+	} flush_pairs[SSDFS_SEGBMAP_FLUSH_REQS_MAX];
+
+	struct ssdfs_segment_bmap *segbmap;
+
+	/* /sys/fs/<ssdfs>/<device>/segbmap/fragments/fragment<N> */
+	struct kobject frag_kobj;
+	struct completion frag_kobj_unregister;
+};
+
+/* Fragment's state */
+enum {
+	SSDFS_SEGBMAP_FRAG_CREATED	= 0,
+	SSDFS_SEGBMAP_FRAG_INIT_FAILED	= 1,
+	SSDFS_SEGBMAP_FRAG_INITIALIZED	= 2,
+	SSDFS_SEGBMAP_FRAG_DIRTY	= 3,
+	SSDFS_SEGBMAP_FRAG_TOWRITE	= 4,
+	SSDFS_SEGBMAP_FRAG_STATE_MAX	= 5,
+};
+
+/* Fragments bitmap types */
+enum {
+	SSDFS_SEGBMAP_CLEAN_USING_FBMAP,
+	SSDFS_SEGBMAP_USED_DIRTY_FBMAP,
+	SSDFS_SEGBMAP_BAD_FBMAP,
+	SSDFS_SEGBMAP_MODIFICATION_FBMAP,
+	SSDFS_SEGBMAP_FBMAP_TYPE_MAX,
+};
+
+/*
+ * struct ssdfs_segment_bmap - segments bitmap
+ * @resize_lock: lock for possible resize operation
+ * @flags: bitmap flags
+ * @bytes_count: count of bytes in the whole segment bitmap
+ * @items_count: count of volume's segments
+ * @fragments_count: count of fragments in the whole segment bitmap
+ * @fragments_per_seg: segbmap's fragments per segment
+ * @fragments_per_peb: segbmap's fragments per PEB
+ * @fragment_size: size of fragment in bytes
+ * @seg_numbers: array of segment bitmap's segment numbers
+ * @segs_count: count of segment objects used for the segment bitmap
+ * @segs: array of pointers on segment objects
+ * @search_lock: lock for search and change state operations
+ * @fbmap: array of fragment bitmaps
+ * @desc_array: array of fragments' descriptors
+ * @folios: memory folios of the whole segment bitmap
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_segment_bmap {
+	struct rw_semaphore resize_lock;
+	u16 flags;
+	u32 bytes_count;
+	u64 items_count;
+	u16 fragments_count;
+	u16 fragments_per_seg;
+	u16 fragments_per_peb;
+	u16 fragment_size;
+#define SEGS_LIMIT1	SSDFS_SEGBMAP_SEGS
+#define SEGS_LIMIT2	SSDFS_SEGBMAP_SEG_COPY_MAX
+	u64 seg_numbers[SEGS_LIMIT1][SEGS_LIMIT2];
+	u16 segs_count;
+	struct ssdfs_segment_info *segs[SEGS_LIMIT1][SEGS_LIMIT2];
+
+	struct rw_semaphore search_lock;
+	unsigned long *fbmap[SSDFS_SEGBMAP_FBMAP_TYPE_MAX];
+	struct ssdfs_segbmap_fragment_desc *desc_array;
+	struct ssdfs_folio_array folios;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/*
+ * Inline functions
+ */
+static inline
+u32 SEG_BMAP_BYTES(u64 items_count)
+{
+	u64 bytes;
+
+	bytes = items_count + SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS) - 1;
+	bytes /= SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS);
+
+	BUG_ON(bytes >= U32_MAX);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %llu, bytes %llu\n",
+		  items_count, bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u32)bytes;
+}
+
+static inline
+u16 SEG_BMAP_FRAGMENTS(u64 items_count)
+{
+	u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header);
+	u32 bytes = SEG_BMAP_BYTES(items_count);
+	u32 pages, fragments;
+
+	pages = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	bytes += pages * hdr_size;
+
+	fragments = (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	BUG_ON(fragments >= U16_MAX);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("items_count %llu, pages %u, "
+		  "bytes %u, fragments %u\n",
+		  items_count, pages,
+		  bytes, fragments);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)fragments;
+}
+
+static inline
+u16 ssdfs_segbmap_seg_2_fragment_index(u64 seg)
+{
+	u16 fragments_count = SEG_BMAP_FRAGMENTS(seg + 1);
+
+	BUG_ON(fragments_count == 0);
+	return fragments_count - 1;
+}
+
+static inline
+u32 ssdfs_segbmap_items_per_fragment(size_t fragment_size)
+{
+	u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header);
+	u32 payload_bytes;
+	u64 items;
+
+	BUG_ON(hdr_size >= fragment_size);
+
+	payload_bytes = fragment_size - hdr_size;
+	items = payload_bytes * SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS);
+
+	BUG_ON(items >= U32_MAX);
+
+	return (u32)items;
+}
+
+static inline
+u64 ssdfs_segbmap_define_first_fragment_item(pgoff_t fragment_index,
+					     size_t fragment_size)
+{
+	return fragment_index * ssdfs_segbmap_items_per_fragment(fragment_size);
+}
+
+static inline
+u32 ssdfs_segbmap_get_item_byte_offset(u32 fragment_item)
+{
+	u32 hdr_size = sizeof(struct ssdfs_segbmap_fragment_header);
+	u32 items_per_byte = SSDFS_ITEMS_PER_BYTE(SSDFS_SEG_STATE_BITS);
+	return hdr_size + (fragment_item / items_per_byte);
+}
+
+static inline
+int ssdfs_segbmap_seg_id_2_seg_index(struct ssdfs_segment_bmap *segbmap,
+				     u64 seg_id)
+{
+	int i;
+
+	if (seg_id == U64_MAX)
+		return -ENODATA;
+
+	for (i = 0; i < segbmap->segs_count; i++) {
+		if (seg_id == segbmap->seg_numbers[i][SSDFS_MAIN_SEGBMAP_SEG])
+			return i;
+		if (seg_id == segbmap->seg_numbers[i][SSDFS_COPY_SEGBMAP_SEG])
+			return i;
+	}
+
+	return -ENODATA;
+}
+
+static inline
+bool ssdfs_segbmap_fragment_has_content(struct folio *folio)
+{
+	bool has_content = false;
+	void *kaddr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+
+	SSDFS_DBG("folio %p\n", folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	kaddr = kmap_local_folio(folio, 0);
+	if (memchr_inv(kaddr, 0xff, PAGE_SIZE) != NULL)
+		has_content = true;
+	kunmap_local(kaddr);
+
+	return has_content;
+}
+
+static inline
+bool IS_STATE_GOOD_FOR_MASK(int mask, int state)
+{
+	bool is_good = false;
+
+	switch (state) {
+	case SSDFS_SEG_CLEAN:
+		is_good = mask & SSDFS_SEG_CLEAN_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_DATA_USING:
+		is_good = mask & SSDFS_SEG_DATA_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_DATA_USING_INVALIDATED:
+		is_good = mask & SSDFS_SEG_DATA_USING_INVALIDATED_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_LEAF_NODE_USING:
+		is_good = mask & SSDFS_SEG_LEAF_NODE_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_HYBRID_NODE_USING:
+		is_good = mask & SSDFS_SEG_HYBRID_NODE_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_INDEX_NODE_USING:
+		is_good = mask & SSDFS_SEG_INDEX_NODE_USING_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_USED:
+		is_good = mask & SSDFS_SEG_USED_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_PRE_DIRTY:
+		is_good = mask & SSDFS_SEG_PRE_DIRTY_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_DIRTY:
+		is_good = mask & SSDFS_SEG_DIRTY_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_BAD:
+		is_good = mask & SSDFS_SEG_BAD_STATE_FLAG;
+		break;
+
+	case SSDFS_SEG_RESERVED:
+		is_good = mask & SSDFS_SEG_RESERVED_STATE_FLAG;
+		break;
+
+	default:
+		BUG();
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("mask %#x, state %#x, is_good %#x\n",
+		  mask, state, is_good);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return is_good;
+}
+
+static inline
+void ssdfs_debug_segbmap_object(struct ssdfs_segment_bmap *bmap)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	int i, j;
+	size_t bytes;
+
+	BUG_ON(!bmap);
+
+	SSDFS_DBG("flags %#x, bytes_count %u, items_count %llu, "
+		  "fragments_count %u, fragments_per_seg %u, "
+		  "fragments_per_peb %u, fragment_size %u\n",
+		  bmap->flags, bmap->bytes_count, bmap->items_count,
+		  bmap->fragments_count, bmap->fragments_per_seg,
+		  bmap->fragments_per_peb, bmap->fragment_size);
+
+	for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) {
+		for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) {
+			SSDFS_DBG("seg_numbers[%d][%d] = %llu\n",
+				  i, j, bmap->seg_numbers[i][j]);
+		}
+	}
+
+	SSDFS_DBG("segs_count %u\n", bmap->segs_count);
+
+	for (i = 0; i < SSDFS_SEGBMAP_SEGS; i++) {
+		for (j = 0; j < SSDFS_SEGBMAP_SEG_COPY_MAX; j++) {
+			SSDFS_DBG("segs[%d][%d] = %p\n",
+				  i, j, bmap->segs[i][j]);
+		}
+	}
+
+	bytes = BITS_TO_LONGS(bmap->fragments_count);
+	bytes *= sizeof(unsigned long);
+
+	for (i = 0; i < SSDFS_SEGBMAP_FBMAP_TYPE_MAX; i++) {
+		SSDFS_DBG("fbmap[%d]\n", i);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					bmap->fbmap[i], bytes);
+	}
+
+	for (i = 0; i < bmap->fragments_count; i++) {
+		struct ssdfs_segbmap_fragment_desc *desc;
+
+		desc = &bmap->desc_array[i];
+
+		SSDFS_DBG("state %#x, total_segs %u, "
+			  "clean_or_using_segs %u, used_or_dirty_segs %u, "
+			  "bad_segs %u\n",
+			  desc->state, desc->total_segs,
+			  desc->clean_or_using_segs,
+			  desc->used_or_dirty_segs,
+			  desc->bad_segs);
+	}
+
+	for (i = 0; i < bmap->fragments_count; i++) {
+		struct folio *folio;
+		void *kaddr;
+
+		folio = ssdfs_folio_array_get_folio_locked(&bmap->folios, i);
+
+		SSDFS_DBG("folio[%d] %p\n", i, folio);
+		if (!folio)
+			continue;
+
+		SSDFS_DBG("folio_index %llu, flags %#lx\n",
+			  (u64)folio->index, folio->flags.f);
+
+		kaddr = kmap_local_folio(folio, 0);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					kaddr, PAGE_SIZE);
+		kunmap_local(kaddr);
+
+		ssdfs_folio_unlock(folio);
+		ssdfs_folio_put(folio);
+
+		SSDFS_DBG("folio %p, count %d\n",
+			  folio, folio_ref_count(folio));
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+/*
+ * Segment bitmap's API
+ */
+int ssdfs_segbmap_create(struct ssdfs_fs_info *fsi);
+void ssdfs_segbmap_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_segbmap_check_fragment_header(struct ssdfs_peb_container *pebc,
+					u16 seg_index,
+					u16 sequence_id,
+					struct folio *folio);
+int ssdfs_segbmap_fragment_init(struct ssdfs_peb_container *pebc,
+				u16 sequence_id,
+				struct folio *folio,
+				int state);
+int ssdfs_segbmap_flush(struct ssdfs_segment_bmap *segbmap);
+int ssdfs_segbmap_resize(struct ssdfs_segment_bmap *segbmap,
+			 u64 new_items_count);
+
+int ssdfs_segbmap_check_state(struct ssdfs_segment_bmap *segbmap,
+				u64 seg, int state,
+				struct completion **end);
+int ssdfs_segbmap_get_state(struct ssdfs_segment_bmap *segbmap,
+			    u64 seg, struct completion **end);
+int ssdfs_segbmap_change_state(struct ssdfs_segment_bmap *segbmap,
+				u64 seg, int new_state,
+				struct completion **end);
+int ssdfs_segbmap_find(struct ssdfs_segment_bmap *segbmap,
+			u64 start, u64 max,
+			int state, int mask,
+			u64 *seg, struct completion **end);
+int ssdfs_segbmap_find_and_set(struct ssdfs_segment_bmap *segbmap,
+				u64 start, u64 max,
+				int state, int mask,
+				int new_state,
+				u64 *seg, struct completion **end);
+int ssdfs_segbmap_reserve_clean_segment(struct ssdfs_segment_bmap *segbmap,
+					u64 start, u64 max,
+					u64 *seg, struct completion **end);
+
+#endif /* _SSDFS_SEGMENT_BITMAP_H */
-- 
2.34.1



* [PATCH v2 44/79] ssdfs: introduce b-tree object
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS file system uses the logical segment and logical extent
concepts together with a "Physical" Erase Block (PEB) migration scheme.
These techniques make it possible to exclude the wandering tree issue
completely and to decrease write amplification significantly. SSDFS
stores data on the basis of a logical extent that describes the data's
position by means of a segment ID and a logical block ID. The PEB
migration technique guarantees that data is described by the same
logical extent until the segment ID or logical block ID is changed
explicitly; in other words, the logical extent stays the same as long
as the data resides in the same logical segment. On data updates, the
migration technique continuously moves data between PEBs inside the
logical segment. As a result, SSDFS's internal techniques guarantee
that the COW policy itself never modifies b-tree content; b-tree
content changes only through regular end-user operations on the file
system.
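The invariant above can be illustrated with a rough user-space sketch
(all struct and field names here are hypothetical, not the SSDFS
on-disk layout): migration changes which PEB backs the data, while the
logical extent held by callers is untouched.

```c
#include <stdint.h>

/* Hypothetical shapes for illustration only; not the on-disk layout. */
struct logical_extent {
	uint64_t seg_id;	/* logical segment ID */
	uint32_t logical_blk;	/* logical block ID inside the segment */
	uint32_t len;		/* extent length in blocks */
};

struct peb_state {
	uint64_t current_peb;	/* PEB currently backing the segment */
};

/*
 * Migration moves data between PEBs inside the same logical segment,
 * so the logical extent that references the data does not change.
 */
static inline struct logical_extent
migrate_inside_segment(struct logical_extent e, struct peb_state *peb,
		       uint64_t dst_peb)
{
	peb->current_peb = dst_peb;	/* data physically moves... */
	return e;			/* ...the extent stays the same */
}
```

Because the extent never changes while data stays in its logical
segment, no metadata pointing at the data has to be rewritten, which
is exactly how the wandering tree effect is avoided.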

SSDFS uses a b-tree architecture for metadata representation (for
example, the inodes, extents, dentries, and xattr trees) because it
provides a compact way of reserving metadata space without the
excessive overprovisioning required by a plain table or array.

A b-tree provides an efficient item lookup technique, especially for
an aged or sparse tree that contains a mixture of used and deleted (or
freed) items; this property is very useful for extent invalidation,
for example. SSDFS also embeds the b-tree's root node in the
superblock (inodes tree case) or in the inode (extents tree case). As
a result, an empty b-tree consists of the root node alone, without
reserving any b-tree node on the file system's volume. Moreover, if a
b-tree needs to hold only a few items (two items, for example), the
root node's space can store these items inline without creating a
full-featured b-tree node. In short, SSDFS uses b-trees to achieve a
compact representation of metadata, a flexible way to expand or shrink
the b-tree's space capacity, and an efficient item lookup mechanism.

SSDFS uses a hybrid b-tree architecture with the goal of eliminating
the side effect of dedicated index nodes. The hybrid b-tree operates
with three node types: (1) index node, (2) hybrid node, (3) leaf node.
The peculiarity of the hybrid node is that it mixes index and data
records in one node. The hybrid b-tree starts with a root node that
can keep two index records or two data records inline (if the size of
a data record is less than or equal to the size of an index record).
If the b-tree needs to contain more than two items, the first hybrid
node is added to the tree. The root level of the b-tree can contain
only two nodes because the root node stores only two index records.
The initial goal of the hybrid node is to store data records in the
presence of a reserved index area.
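The inline-capacity rule described above can be sketched as a tiny
predicate (a user-space sketch under the stated assumptions; the
function name is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * The root node keeps at most two index records, or two data records
 * inline when a data record is no larger than an index record; a
 * third item forces adding the first hybrid node to the tree.
 */
static inline bool can_keep_inline_in_root(size_t item_size,
					   size_t index_size,
					   unsigned int items_count)
{
	return items_count <= 2 && item_size <= index_size;
}
```

The two-record limit is what bounds the root level to two child nodes,
since each child costs one index record in the root.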

The generalized b-tree functionality implements the following API:
(1) create - create empty b-tree object
(2) destroy - destroy b-tree object
(3) flush - flush dirty b-tree object
(4) find_item - find item in b-tree hierarchy
(5) find_range - find range of items in b-tree hierarchy
(6) allocate_item - allocate item in existing b-tree's node
(7) allocate_range - allocate range of items in existing b-tree's node
(8) add_item - add item into b-tree
(9) add_range - add range of items into b-tree
(10) change_item - change existing item in b-tree
(11) delete_item - delete item from b-tree
(12) delete_range - delete range of items from b-tree
(13) delete_all - delete all items from b-tree
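The call pattern behind this API (a search object is prepared, then
passed to find/add/change/delete) can be sketched with a toy in-memory
stand-in. Nothing below is the real SSDFS implementation; all names
are hypothetical and only the request/result flow is mirrored:

```c
#include <errno.h>

/* Toy stand-in for the search-object-driven API; not real SSDFS code. */
struct toy_search {
	unsigned long hash;	/* request: which item to operate on */
	long value;		/* result: extracted item payload */
};

struct toy_tree {
	unsigned long hashes[8];
	long values[8];
	int count;
};

/* Analogue of find_item: fill the search object's result or fail. */
static int toy_find_item(struct toy_tree *t, struct toy_search *s)
{
	for (int i = 0; i < t->count; i++) {
		if (t->hashes[i] == s->hash) {
			s->value = t->values[i];
			return 0;
		}
	}
	return -ENODATA;
}

/* Analogue of add_item: insert the item described by the search object. */
static int toy_add_item(struct toy_tree *t, struct toy_search *s)
{
	struct toy_search probe = { .hash = s->hash };

	if (toy_find_item(t, &probe) == 0)
		return -EEXIST;
	if (t->count >= 8)
		return -ENOSPC;
	t->hashes[t->count] = s->hash;
	t->values[t->count] = s->value;
	t->count++;
	return 0;
}
```

As in the real API, the same search object carries both the request
(hash) on the way in and the result (payload, error code) on the way
out, so one object can be reused across a find/add/change sequence.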

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/btree.h        | 219 +++++++++++++++++++++
 fs/ssdfs/btree_search.h | 424 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 643 insertions(+)
 create mode 100644 fs/ssdfs/btree.h
 create mode 100644 fs/ssdfs/btree_search.h

diff --git a/fs/ssdfs/btree.h b/fs/ssdfs/btree.h
new file mode 100644
index 000000000000..ee1d7eec581b
--- /dev/null
+++ b/fs/ssdfs/btree.h
@@ -0,0 +1,219 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree.h - btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BTREE_H
+#define _SSDFS_BTREE_H
+
+struct ssdfs_btree;
+
+/*
+ * struct ssdfs_btree_descriptor_operations - btree descriptor operations
+ * @init: initialize btree object by descriptor
+ * @flush: save btree descriptor into superblock
+ */
+struct ssdfs_btree_descriptor_operations {
+	int (*init)(struct ssdfs_fs_info *fsi,
+		    struct ssdfs_btree *tree);
+	int (*flush)(struct ssdfs_btree *tree);
+};
+
+/*
+ * struct ssdfs_btree_operations - btree operations specialization
+ * @create_root_node: specialization of root node creation
+ * @create_node: specialization of node's construction operation
+ * @init_node: specialization of node's init operation
+ * @destroy_node: specialization of node's destroy operation
+ * @add_node: specialization of adding into the tree a new empty node
+ * @delete_node: specialization of deletion a node from the tree
+ * @pre_flush_root_node: specialized flush preparation of root node
+ * @flush_root_node: specialized method of root node flushing
+ * @pre_flush_node: specialized flush preparation of common node
+ * @flush_node: specialized method of common node flushing
+ */
+struct ssdfs_btree_operations {
+	int (*create_root_node)(struct ssdfs_fs_info *fsi,
+				struct ssdfs_btree_node *node);
+	int (*create_node)(struct ssdfs_btree_node *node);
+	int (*init_node)(struct ssdfs_btree_node *node);
+	void (*destroy_node)(struct ssdfs_btree_node *node);
+	int (*add_node)(struct ssdfs_btree_node *node);
+	int (*delete_node)(struct ssdfs_btree_node *node);
+	int (*pre_flush_root_node)(struct ssdfs_btree_node *node);
+	int (*flush_root_node)(struct ssdfs_btree_node *node);
+	int (*pre_flush_node)(struct ssdfs_btree_node *node);
+	int (*flush_node)(struct ssdfs_btree_node *node);
+};
+
+/*
+ * struct ssdfs_btree - generic btree
+ * @type: btree type
+ * @owner_ino: inode identification number of btree owner
+ * @node_size: size of the node in bytes
+ * @pages_per_node: physical pages per node
+ * @node_ptr_size: size in bytes of pointer on btree node
+ * @index_size: size in bytes of btree's index
+ * @item_size: default size of item in bytes
+ * @min_item_size: min size of item in bytes
+ * @max_item_size: max possible size of item in bytes
+ * @index_area_min_size: minimal size in bytes of index area in btree node
+ * @create_cno: btree's create checkpoint
+ * @state: btree state
+ * @flags: btree flags
+ * @height: current height of the tree
+ * @lock: btree's lock
+ * @nodes_lock: radix tree lock
+ * @upper_node_id: last allocated node id
+ * @nodes: nodes' radix tree
+ * @fsi: pointer on shared file system object
+ *
+ * Btree nodes are organized in a radix tree.
+ * Another benefit of the radix tree is its
+ * support for tagging dirty nodes.
+ */
+struct ssdfs_btree {
+	/* static data */
+	u8 type;
+	u64 owner_ino;
+	u32 node_size;
+	u8 pages_per_node;
+	u8 node_ptr_size;
+	u16 index_size;
+	u16 item_size;
+	u8 min_item_size;
+	u16 max_item_size;
+	u16 index_area_min_size;
+	u64 create_cno;
+
+	/* operation specializations */
+	const struct ssdfs_btree_descriptor_operations *desc_ops;
+	const struct ssdfs_btree_operations *btree_ops;
+
+	/* mutable data */
+	atomic_t state;
+	atomic_t flags;
+	atomic_t height;
+
+	struct rw_semaphore lock;
+
+	spinlock_t nodes_lock;
+	u32 upper_node_id;
+	struct radix_tree_root nodes;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Btree object states */
+enum {
+	SSDFS_BTREE_UNKNOWN_STATE,
+	SSDFS_BTREE_CREATED,
+	SSDFS_BTREE_DIRTY,
+	SSDFS_BTREE_STATE_MAX
+};
+
+/* Radix tree tags */
+#define SSDFS_BTREE_NODE_DIRTY_TAG	PAGECACHE_TAG_DIRTY
+#define SSDFS_BTREE_NODE_TOWRITE_TAG	PAGECACHE_TAG_TOWRITE
+
+/*
+ * Btree API
+ */
+int ssdfs_btree_create(struct ssdfs_fs_info *fsi,
+		    u64 owner_ino,
+		    const struct ssdfs_btree_descriptor_operations *desc_ops,
+		    const struct ssdfs_btree_operations *btree_ops,
+		    struct ssdfs_btree *tree);
+void ssdfs_btree_destroy(struct ssdfs_btree *tree);
+int ssdfs_btree_flush(struct ssdfs_btree *tree);
+
+int ssdfs_btree_find_item(struct ssdfs_btree *tree,
+			  struct ssdfs_btree_search *search);
+int ssdfs_btree_find_range(struct ssdfs_btree *tree,
+			   struct ssdfs_btree_search *search);
+bool is_ssdfs_btree_empty(struct ssdfs_btree *tree);
+int ssdfs_btree_allocate_item(struct ssdfs_btree *tree,
+			      struct ssdfs_btree_search *search);
+int ssdfs_btree_allocate_range(struct ssdfs_btree *tree,
+				struct ssdfs_btree_search *search);
+int ssdfs_btree_add_item(struct ssdfs_btree *tree,
+			 struct ssdfs_btree_search *search);
+int ssdfs_btree_add_range(struct ssdfs_btree *tree,
+			  struct ssdfs_btree_search *search);
+int ssdfs_btree_change_item(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search);
+int ssdfs_btree_delete_item(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search);
+int ssdfs_btree_delete_range(struct ssdfs_btree *tree,
+			     struct ssdfs_btree_search *search);
+int ssdfs_btree_delete_all(struct ssdfs_btree *tree);
+
+/*
+ * Internal Btree API
+ */
+bool need_migrate_generic2inline_btree(struct ssdfs_btree *tree,
+					int items_threshold);
+int ssdfs_btree_desc_init(struct ssdfs_fs_info *fsi,
+			  struct ssdfs_btree *tree,
+			  struct ssdfs_btree_descriptor *desc,
+			  u8 min_item_size,
+			  u16 max_item_size);
+int ssdfs_btree_desc_flush(struct ssdfs_btree *tree,
+			   struct ssdfs_btree_descriptor *desc);
+struct ssdfs_btree_node *
+ssdfs_btree_get_child_node_for_hash(struct ssdfs_btree *tree,
+				    struct ssdfs_btree_node *parent,
+				    u64 hash);
+int ssdfs_btree_update_parent_node_pointer(struct ssdfs_btree *tree,
+					   struct ssdfs_btree_node *parent);
+int ssdfs_btree_add_node(struct ssdfs_btree *tree,
+			 struct ssdfs_btree_search *search);
+int ssdfs_btree_insert_node(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search);
+int ssdfs_btree_delete_node(struct ssdfs_btree *tree,
+			    struct ssdfs_btree_search *search);
+int ssdfs_btree_get_head_range(struct ssdfs_btree *tree,
+				u32 expected_len,
+				struct ssdfs_btree_search *search);
+int ssdfs_btree_extract_range(struct ssdfs_btree *tree,
+				u16 start_index, u16 count,
+				struct ssdfs_btree_search *search);
+int ssdfs_btree_destroy_node_range(struct ssdfs_btree *tree,
+				   u64 start_hash);
+struct ssdfs_btree_node *
+__ssdfs_btree_read_node(struct ssdfs_btree *tree,
+			struct ssdfs_btree_node *parent,
+			struct ssdfs_btree_index_key *node_index,
+			u8 node_type, u32 node_id);
+int ssdfs_btree_radix_tree_find(struct ssdfs_btree *tree,
+				unsigned long node_id,
+				struct ssdfs_btree_node **node);
+int ssdfs_btree_synchronize_root_node(struct ssdfs_btree *tree,
+				struct ssdfs_btree_inline_root_node *root);
+int ssdfs_btree_get_next_hash(struct ssdfs_btree *tree,
+			      struct ssdfs_btree_search *search,
+			      u64 *next_hash);
+
+void ssdfs_debug_show_btree_node_indexes(struct ssdfs_btree *tree,
+					 struct ssdfs_btree_node *parent);
+void ssdfs_check_btree_consistency(struct ssdfs_btree *tree);
+void ssdfs_debug_btree_object(struct ssdfs_btree *tree);
+
+#endif /* _SSDFS_BTREE_H */
diff --git a/fs/ssdfs/btree_search.h b/fs/ssdfs/btree_search.h
new file mode 100644
index 000000000000..c13fb9c6ce0d
--- /dev/null
+++ b/fs/ssdfs/btree_search.h
@@ -0,0 +1,424 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree_search.h - btree search object declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BTREE_SEARCH_H
+#define _SSDFS_BTREE_SEARCH_H
+
+/* Search request types */
+enum {
+	SSDFS_BTREE_SEARCH_UNKNOWN_TYPE,
+	SSDFS_BTREE_SEARCH_FIND_ITEM,
+	SSDFS_BTREE_SEARCH_FIND_RANGE,
+	SSDFS_BTREE_SEARCH_ALLOCATE_ITEM,
+	SSDFS_BTREE_SEARCH_ALLOCATE_RANGE,
+	SSDFS_BTREE_SEARCH_ADD_ITEM,
+	SSDFS_BTREE_SEARCH_ADD_RANGE,
+	SSDFS_BTREE_SEARCH_CHANGE_ITEM,
+	SSDFS_BTREE_SEARCH_MOVE_ITEM,
+	SSDFS_BTREE_SEARCH_DELETE_ITEM,
+	SSDFS_BTREE_SEARCH_DELETE_RANGE,
+	SSDFS_BTREE_SEARCH_DELETE_ALL,
+	SSDFS_BTREE_SEARCH_INVALIDATE_TAIL,
+	SSDFS_BTREE_SEARCH_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_peb_timestamps - PEB timestamps
+ * @peb_id: PEB ID
+ * @create_time: PEB's create timestamp
+ * @last_log_time: PEB's last log create timestamp
+ */
+struct ssdfs_peb_timestamps {
+	u64 peb_id;
+	u64 create_time;
+	u64 last_log_time;
+};
+
+/*
+ * struct ssdfs_btree_search_hash - btree search hash
+ * @name: name of the searching object
+ * @name_len: length of the name in bytes
+ * @uuid: UUID of the searching object
+ * @hash: hash value
+ * @ino: inode ID
+ * @fingerprint: fingerprint value
+ * @peb2time: PEB timestamps
+ */
+struct ssdfs_btree_search_hash {
+	const char *name;
+	size_t name_len;
+	u8 *uuid;
+	u64 hash;
+	u64 ino;
+	struct ssdfs_fingerprint *fingerprint;
+	struct ssdfs_peb_timestamps *peb2time;
+};
+
+/*
+ * struct ssdfs_btree_search_request - btree search request
+ * @type: request type
+ * @flags: request flags
+ * @start: starting hash value
+ * @end: ending hash value
+ * @count: range of hashes length in the request
+ */
+struct ssdfs_btree_search_request {
+	int type;
+#define SSDFS_BTREE_SEARCH_HAS_VALID_HASH_RANGE		(1 << 0)
+#define SSDFS_BTREE_SEARCH_HAS_VALID_COUNT		(1 << 1)
+#define SSDFS_BTREE_SEARCH_HAS_VALID_NAME		(1 << 2)
+#define SSDFS_BTREE_SEARCH_HAS_VALID_INO		(1 << 3)
+#define SSDFS_BTREE_SEARCH_NOT_INVALIDATE		(1 << 4)
+#define SSDFS_BTREE_SEARCH_HAS_VALID_UUID		(1 << 5)
+#define SSDFS_BTREE_SEARCH_HAS_VALID_FINGERPRINT	(1 << 6)
+#define SSDFS_BTREE_SEARCH_INCREMENT_REF_COUNT		(1 << 7)
+#define SSDFS_BTREE_SEARCH_DECREMENT_REF_COUNT		(1 << 8)
+#define SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM	(1 << 9)
+#define SSDFS_BTREE_SEARCH_DONT_EXTRACT_RECORD		(1 << 10)
+#define SSDFS_BTREE_SEARCH_HAS_PEB2TIME_PAIR		(1 << 11)
+#define SSDFS_BTREE_SEARCH_DONT_DELETE_BTREE_NODE	(1 << 12)
+#define SSDFS_BTREE_SEARCH_REQUEST_FLAGS_MASK		0x1FFF
+	u32 flags;
+
+	struct ssdfs_btree_search_hash start;
+	struct ssdfs_btree_search_hash end;
+	unsigned int count;
+};
+
+/* Node descriptor possible states */
+enum {
+	SSDFS_BTREE_SEARCH_NODE_DESC_EMPTY,
+	SSDFS_BTREE_SEARCH_ROOT_NODE_DESC,
+	SSDFS_BTREE_SEARCH_FOUND_INDEX_NODE_DESC,
+	SSDFS_BTREE_SEARCH_FOUND_LEAF_NODE_DESC,
+	SSDFS_BTREE_SEARCH_NODE_DESC_STATE_MAX
+};
+
+/*
+ * struct ssdfs_btree_search_node_desc - btree node descriptor
+ * @state: descriptor state
+ * @id: node ID number
+ * @height: node height
+ * @found_index: index of child node
+ * @parent: last parent node
+ * @child: last child node
+ */
+struct ssdfs_btree_search_node_desc {
+	int state;
+
+	u32 id;
+	u8 height;
+
+	struct ssdfs_btree_index_key found_index;
+	struct ssdfs_btree_node *parent;
+	struct ssdfs_btree_node *child;
+};
+
+/* Search result possible states */
+enum {
+	SSDFS_BTREE_SEARCH_UNKNOWN_RESULT,
+	SSDFS_BTREE_SEARCH_FAILURE,
+	SSDFS_BTREE_SEARCH_EMPTY_RESULT,
+	SSDFS_BTREE_SEARCH_VALID_ITEM,
+	SSDFS_BTREE_SEARCH_POSSIBLE_PLACE_FOUND,
+	SSDFS_BTREE_SEARCH_OUT_OF_RANGE,
+	SSDFS_BTREE_SEARCH_OBSOLETE_RESULT,
+	SSDFS_BTREE_SEARCH_PLEASE_ADD_NODE,
+	SSDFS_BTREE_SEARCH_PLEASE_DELETE_NODE,
+	SSDFS_BTREE_SEARCH_PLEASE_MOVE_BUF_CONTENT,
+	SSDFS_BTREE_SEARCH_RESULT_STATE_MAX
+};
+
+/* Search result buffer possible states */
+enum {
+	SSDFS_BTREE_SEARCH_UNKNOWN_BUFFER_STATE,
+	SSDFS_BTREE_SEARCH_INLINE_BUFFER,
+	SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER,
+	SSDFS_BTREE_SEARCH_BUFFER_STATE_MAX
+};
+
+/*
+ * struct ssdfs_lookup_descriptor - lookup descriptor
+ * @index: index of item in the lookup1 table
+ * @desc: descriptor of lookup1 table's item
+ */
+struct ssdfs_lookup_descriptor {
+	u16 index;
+	struct ssdfs_shdict_ltbl1_item desc;
+};
+
+/*
+ * struct ssdfs_strings_range_descriptor - strings range descriptor
+ * @index: index of item in the lookup2 table
+ * @desc: descriptor of lookup2 table's item
+ */
+struct ssdfs_strings_range_descriptor {
+	u16 index;
+	struct ssdfs_shdict_ltbl2_item desc;
+};
+
+/*
+ * struct ssdfs_string_descriptor - string descriptor
+ * @index: index of item in the hash table
+ * @desc: descriptor of hash table's item
+ */
+struct ssdfs_string_descriptor {
+	u16 index;
+	struct ssdfs_shdict_htbl_item desc;
+};
+
+/*
+ * struct ssdfs_string_table_index - string table indexes
+ * @lookup1_index: index in lookup1 table
+ * @lookup2_index: index in lookup2 table
+ * @hash_index: index in hash table
+ *
+ * A search operation defines the lookup, strings_range, prefix,
+ * left_name, and right_name descriptors. They describe a
+ * potential position for storing the string. However, the
+ * final position and indexes may only be determined during
+ * the insert operation itself. This field records the
+ * lookup1, lookup2, and hash indexes that were finally used
+ * to store the string.
+ */
+struct ssdfs_string_table_index {
+	u16 lookup1_index;
+	u16 lookup2_index;
+	u16 hash_index;
+};
+
+/*
+ * struct ssdfs_name_string - name string
+ * @hash: name hash
+ * @lookup: lookup item descriptor
+ * @strings_range: range of strings descriptor
+ * @prefix: prefix descriptor
+ * @left_name: left name descriptor
+ * @right_name: right name descriptor
+ * @placement: stored indexes descriptor
+ * @len: name length
+ * @str: name buffer
+ */
+struct ssdfs_name_string {
+	u64 hash;
+	struct ssdfs_lookup_descriptor lookup;
+	struct ssdfs_strings_range_descriptor strings_range;
+	struct ssdfs_string_descriptor prefix;
+	struct ssdfs_string_descriptor left_name;
+	struct ssdfs_string_descriptor right_name;
+
+	struct ssdfs_string_table_index placement;
+
+	size_t len;
+	unsigned char str[SSDFS_MAX_NAME_LEN];
+};
+
+/*
+ * struct ssdfs_btree_search_buffer - buffer descriptor
+ * @state: state of the buffer
+ * @size: size of the buffer in bytes
+ * @item_size: size of one item in bytes
+ * @items_count: items count in buffer
+ * @place.ptr: pointer on buffer
+ * @place.ltbl2_items: pointer on buffer with lookup2 table's items
+ * @place.htbl_items: pointer on buffer with hash table's items
+ * @place.name: pointer on buffer with name descriptor(s)
+ * @place.name_range: pointer on buffer with names range
+ */
+struct ssdfs_btree_search_buffer {
+	int state;
+	size_t size;
+
+	size_t item_size;
+	u32 items_count;
+
+	union {
+		void *ptr;
+		struct ssdfs_shdict_ltbl2_item *ltbl2_items;
+		struct ssdfs_shdict_htbl_item *htbl_items;
+		struct ssdfs_name_string *name;
+		struct ssdfs_name_string_range *name_range;
+	} place;
+};
+
+/*
+ * struct ssdfs_name_string_range - name string range
+ * @lookup1: lookup1 item descriptor
+ * @lookup2_table.index: index of first item in the lookup2 table
+ * @lookup2_table.buf: lookup2 table's buffer
+ * @hash_table.index: index of first item in the hash table
+ * @hash_table.buf: hash table's buffer
+ * @strings.buf: buffer with strings
+ * @placement: final destination of storing range
+ */
+struct ssdfs_name_string_range {
+	struct ssdfs_lookup_descriptor lookup1;
+
+	struct {
+		u16 index;
+		struct ssdfs_btree_search_buffer buf;
+	} lookup2_table;
+
+	struct {
+		u16 index;
+		struct ssdfs_btree_search_buffer buf;
+	} hash_table;
+
+	struct {
+		struct ssdfs_btree_search_buffer buf;
+	} strings;
+
+	struct ssdfs_string_table_index placement;
+};
+
+/*
+ * struct ssdfs_btree_search_result - btree search result
+ * @state: result state
+ * @err: result error code
+ * @flags: result's flags
+ * @start_index: starting found item index
+ * @count: count of found items
+ * @search_cno: checkpoint of search activity
+ * @name_buf: name(s) buffer
+ * @range_buf: buffer with names range
+ * @raw_buf: raw buffer with item(s)
+ */
+struct ssdfs_btree_search_result {
+	int state;
+	int err;
+
+#define SSDFS_BTREE_SEARCH_RESULT_HAS_NAME		(1 << 0)
+#define SSDFS_BTREE_SEARCH_RESULT_HAS_RANGE		(1 << 1)
+#define SSDFS_BTREE_SEARCH_RESULT_HAS_RAW_DATA		(1 << 2)
+#define SSDFS_BTREE_SEARCH_RESULT_FLAGS_MASK		0x7
+	u32 flags;
+
+	u16 start_index;
+	u16 count;
+
+	u64 search_cno;
+
+	struct ssdfs_btree_search_buffer name_buf;
+	struct ssdfs_btree_search_buffer range_buf;
+	struct ssdfs_btree_search_buffer raw_buf;
+};
+
+/* Position check results */
+enum {
+	SSDFS_CORRECT_POSITION,
+	SSDFS_SEARCH_LEFT_DIRECTION,
+	SSDFS_SEARCH_RIGHT_DIRECTION,
+	SSDFS_CHECK_POSITION_FAILURE
+};
+
+/*
+ * struct ssdfs_btree_search - btree search
+ * @request: search request
+ * @node: btree node descriptor
+ * @result: search result
+ * @raw.fork: raw fork buffer
+ * @raw.inode: raw inode buffer
+ * @raw.dentry.header: raw directory entry header
+ * @raw.xattr.header: raw xattr entry header
+ * @raw.shared_extent: shared extent buffer
+ * @raw.snapshot: raw snapshot info buffer
+ * @raw.peb2time: raw PEB2time set
+ * @raw.invalidated_extent: invalidated extent buffer
+ * @name.string: name string
+ * @name.range: range of names
+ */
+struct ssdfs_btree_search {
+	struct ssdfs_btree_search_request request;
+	struct ssdfs_btree_search_node_desc node;
+	struct ssdfs_btree_search_result result;
+	union ssdfs_btree_search_raw_data {
+		struct ssdfs_raw_fork fork;
+		struct ssdfs_inode inode;
+		struct ssdfs_raw_dentry {
+			struct ssdfs_dir_entry header;
+		} dentry;
+		struct ssdfs_raw_xattr {
+			struct ssdfs_xattr_entry header;
+		} xattr;
+		struct ssdfs_shared_extent shared_extent;
+		struct ssdfs_snapshot snapshot;
+		struct ssdfs_peb2time_set peb2time;
+		struct ssdfs_raw_extent invalidated_extent;
+	} raw;
+	struct {
+		struct ssdfs_name_string string;
+		struct ssdfs_name_string_range range;
+	} name;
+};
+
+/* Btree height's classification */
+enum {
+	SSDFS_BTREE_PARENT2LEAF_HEIGHT		= 1,
+	SSDFS_BTREE_PARENT2HYBRID_HEIGHT	= 2,
+	SSDFS_BTREE_PARENT2INDEX_HEIGHT		= 3,
+};
+
+/*
+ * Inline functions
+ */
+
+static inline
+bool is_btree_search_contains_new_item(struct ssdfs_btree_search *search)
+{
+	return search->request.flags &
+			SSDFS_BTREE_SEARCH_INLINE_BUF_HAS_NEW_ITEM;
+}
+
+/*
+ * Btree search object API
+ */
+struct ssdfs_btree_search *ssdfs_btree_search_alloc(void);
+void ssdfs_btree_search_free(struct ssdfs_btree_search *search);
+void ssdfs_btree_search_init(struct ssdfs_btree_search *search);
+bool need_initialize_btree_search(struct ssdfs_btree_search *search);
+bool is_btree_search_request_valid(struct ssdfs_btree_search *search);
+bool is_btree_index_search_request_valid(struct ssdfs_btree_search *search,
+					 u32 prev_node_id,
+					 u8 prev_node_height);
+bool is_btree_leaf_node_found(struct ssdfs_btree_search *search);
+bool is_btree_search_node_desc_consistent(struct ssdfs_btree_search *search);
+void ssdfs_btree_search_define_parent_node(struct ssdfs_btree_search *search,
+					   struct ssdfs_btree_node *parent);
+void ssdfs_btree_search_define_child_node(struct ssdfs_btree_search *search,
+					  struct ssdfs_btree_node *child);
+void ssdfs_btree_search_forget_parent_node(struct ssdfs_btree_search *search);
+void ssdfs_btree_search_forget_child_node(struct ssdfs_btree_search *search);
+int ssdfs_btree_search_alloc_result_buf(struct ssdfs_btree_search *search,
+					size_t buf_size);
+void ssdfs_btree_search_free_result_buf(struct ssdfs_btree_search *search);
+int ssdfs_btree_search_alloc_result_name(struct ssdfs_btree_search *search,
+					 size_t string_size);
+void ssdfs_btree_search_free_result_name(struct ssdfs_btree_search *search);
+int ssdfs_btree_search_alloc_result_name_range(struct ssdfs_btree_search *search,
+						size_t ltbl2_size,
+						size_t htbl_size,
+						size_t str_buf_size);
+void ssdfs_btree_search_free_result_name_range(struct ssdfs_btree_search *search);
+
+void ssdfs_debug_btree_search_object(struct ssdfs_btree_search *search);
+
+#endif /* _SSDFS_BTREE_SEARCH_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 46/79] ssdfs: introduce b-tree node object
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (17 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 44/79] ssdfs: introduce b-tree object Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 48/79] ssdfs: introduce b-tree hierarchy object Viacheslav Dubeyko
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS file system uses a hybrid b-tree architecture with the goal
of mitigating the side effects of dedicated index nodes. The hybrid
b-tree operates with three node types: (1) index node, (2) hybrid
node, (3) leaf node. Generally speaking, the peculiarity of the
hybrid node is that it mixes index records and data records in one
node. A hybrid b-tree starts with a root node that is capable of
keeping two index records or two data records inline (if the size
of a data record is equal to or smaller than the size of an index
record). If the b-tree needs to contain more than two items, then
the first hybrid node has to be added into the b-tree. The root
level of the b-tree can contain only two nodes because the root
node is capable of storing only two index records. Generally
speaking, the initial goal of the hybrid node is to store data
records in the presence of a reserved index area.

A dirty b-tree implies the presence of one or several dirty
b-tree nodes. The b-tree flush logic detects the dirty b-tree
nodes and requests a flush operation for every dirty b-tree
node. A b-tree node can include several memory pages (8K, for
example). It means that one b-tree node can be located in
one or several logical blocks. Consequently, a flush operation
means that the b-tree node's flush logic has to issue update
request(s) for all logical blocks that contain the b-tree node's
content. Every b-tree node is described by an index record (or
key) that includes: (1) node ID, (2) node type, (3) node height,
(4) starting hash value, (5) raw extent. The raw extent
describes the segment ID, logical block ID, and length.
As a result, the flush logic needs to add an update request into
the update queue of the particular PEB for the segment ID. Also,
the flush logic has to request a log commit operation because the
b-tree node has to be stored persistently right away. The flush
thread(s) of the particular PEB(s) execute the update requests.
Finally, the b-tree flush logic has to wait for the completion of
the update operations for all dirty b-tree node(s).

Index records are stored in index and hybrid b-tree nodes.
These records implement the mechanism of lookup and traverse
operations in the b-tree.
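
To make the traverse role concrete, here is a hedged userspace
sketch of how index records with a starting hash can steer a lookup
toward the proper child node. The toy_* names and field widths are
illustrative only and do not match the SSDFS on-disk layout:

```c
#include <assert.h>
#include <stdint.h>

/* Toy mock-up of the index record (key) described above;
 * field widths are guesses, not the SSDFS on-disk layout. */
struct toy_raw_extent {
	uint64_t seg_id;	/* segment ID */
	uint32_t logical_blk;	/* logical block ID */
	uint32_t len;		/* extent length */
};

struct toy_index_key {
	uint32_t node_id;	/* node identification number */
	uint8_t node_type;	/* index/hybrid/leaf */
	uint8_t height;		/* node's height */
	uint64_t start_hash;	/* starting hash value */
	struct toy_raw_extent extent;	/* node's location */
};

/*
 * Pick the child whose subtree covers @hash: the last key
 * (in a start_hash-sorted array) with start_hash <= hash.
 */
int toy_pick_child(const struct toy_index_key *keys,
		   int count, uint64_t hash)
{
	int i;

	for (i = count - 1; i > 0; i--) {
		if (keys[i].start_hash <= hash)
			break;
	}
	return i;
}
```

A caller descending the tree would repeat toy_pick_child() per
level, loading the child node located by the chosen key's extent.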

Index operations include:
(1) resize_index_area - operation of increasing size of index
                        area in hybrid b-tree node by means of
                        redistribution of free space between
                        index and item area and shift of item
                        area
(2) find_index - find index in hybrid or index b-tree node
(3) add_index - add index in hybrid or index b-tree node
(4) change_index - change index record in hybrid or index b-tree node
(5) delete_index - delete index record from hybrid or index b-tree node

B-tree node implements search, allocation, and insert item or
range of items in the node:
(1) find_item - find item in b-tree node
(2) find_range - find range of items in b-tree node
(3) allocate_item - allocate item in b-tree node
(4) allocate_range - allocate range of items in b-tree node
(5) insert_item - insert/add item into b-tree node
(6) insert_range - insert/add range of items into b-tree node

B-tree node implements change and delete item or
range of items in the node:
(1) change_item - change item in the b-tree node
(2) delete_item - delete item from b-tree node
(3) delete_range - delete range of items from b-tree node
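
These operations are wired through a per-node-type table of function
pointers (the node_ops specialization declared in the header below),
so generic b-tree code stays unaware of the concrete item format.
A hedged userspace sketch of the dispatch pattern, with toy_*
stand-ins for the kernel types:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects (illustrative only) */
struct toy_node { int items_count; };
struct toy_search { int index; int found; };

struct toy_node_operations {
	int (*find_item)(struct toy_node *node, struct toy_search *search);
	int (*delete_item)(struct toy_node *node, struct toy_search *search);
};

/* Leaf-node specialization of find_item */
int toy_leaf_find_item(struct toy_node *node, struct toy_search *search)
{
	search->found = (search->index < node->items_count);
	return search->found ? 0 : -1;
}

/* Leaf-node specialization of delete_item */
int toy_leaf_delete_item(struct toy_node *node, struct toy_search *search)
{
	if (search->index >= node->items_count)
		return -1;
	node->items_count--;
	return 0;
}

const struct toy_node_operations toy_leaf_ops = {
	.find_item = toy_leaf_find_item,
	.delete_item = toy_leaf_delete_item,
};

/* Generic code dispatches through the table, so the same walk
 * works for inodes, dentries, extents, etc. specializations. */
int toy_btree_node_find_item(struct toy_node *node,
			     const struct toy_node_operations *ops,
			     struct toy_search *search)
{
	if (!ops || !ops->find_item)
		return -1;
	return ops->find_item(node, search);
}
```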

The hybrid b-tree node plays a very special role in the b-tree
architecture. First of all, a hybrid node includes both index and
items areas. The goal is to combine index and item records in one
node for the case of small b-trees. A b-tree starts with the
creation of a root node that can contain only two index keys.
It means that the root node keeps knowledge about two nodes only.
At first, the b-tree logic creates leaf nodes until the root node
contains two index keys for leaf nodes. The next step implies the
creation of a hybrid node that contains index records for the two
existing leaf nodes. The root node then contains the index key for
the hybrid node. Now the hybrid node comes into play. New items
are added into the items area of the hybrid node until this area
becomes completely full. Then the b-tree logic allocates a new
leaf node, all existing items in the hybrid node are moved into
the newly created leaf node, and an index key is added into the
hybrid node's index area. Such an operation repeats multiple times
until the index area of the hybrid node becomes completely full.
At that point, the index area is doubled in size after the
existing items have been moved into a newly created node.
Finally, the hybrid node is converted into an index node.
The important point is that a small b-tree has one hybrid node
with index keys and items instead of two nodes (index + leaf).
A hybrid node combines both index and item operations, which makes
this type of node a "hot" type of metadata and provides a way to
isolate/distinguish hot, warm, and cold data. As a result, it
makes the b-tree more compact by decreasing the number of nodes,
makes GC operations unnecessary because update operations on "hot"
hybrid node(s) keep the migration scheme efficient, and decreases
write amplification.
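
A hedged userspace simulation of that growth sequence may help.
The capacities here are toy numbers (a node holding 4 items, an
index area starting at capacity 2); real capacities depend on node
and item sizes, and the toy_* names are not driver identifiers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy hybrid node: counters only, no real items or keys */
struct toy_hybrid {
	int items_count, items_capacity;
	int index_count, index_capacity;
	int leaf_nodes;	/* leaf nodes referenced from the index area */
};

/*
 * Add one item. On items-area overflow, move all items into a
 * freshly allocated leaf node and record an index key for it;
 * when the index area itself fills up, double it at the expense
 * of the items area (the resize_index_area operation).
 */
bool toy_hybrid_add_item(struct toy_hybrid *h)
{
	if (h->items_count < h->items_capacity) {
		h->items_count++;
		return true;
	}
	if (h->index_count == h->index_capacity) {
		/* grow index area by shrinking the items area */
		h->index_capacity *= 2;
		h->items_capacity /= 2;
		if (h->items_capacity == 0)
			return false; /* node became a pure index node */
	}
	/* move existing items into a new leaf node, index it */
	h->leaf_nodes++;
	h->index_count++;
	h->items_count = 1; /* the new item lands in the emptied area */
	return true;
}
```

Repeated insertion shows the described pattern: items accumulate,
spill into new leaves, and the index area doubles once full.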

Hybrid nodes require range operations that are represented by:
(1) extract_range - extract range of items (or all items) from node
(2) insert_range - insert range of items into node
(3) delete_range - remove range of items from node

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/btree_node.h | 891 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 891 insertions(+)
 create mode 100644 fs/ssdfs/btree_node.h

diff --git a/fs/ssdfs/btree_node.h b/fs/ssdfs/btree_node.h
new file mode 100644
index 000000000000..eab66206429c
--- /dev/null
+++ b/fs/ssdfs/btree_node.h
@@ -0,0 +1,891 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree_node.h - btree node declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BTREE_NODE_H
+#define _SSDFS_BTREE_NODE_H
+
+#include "request_queue.h"
+#include "folio_array.h"
+#include "peb.h"
+
+/*
+ * struct ssdfs_btree_node_operations - node operations specialization
+ * @find_item: specialized item searching algorithm
+ * @find_range: specialized range searching algorithm
+ * @extract_range: specialized extract range operation
+ * @allocate_item: specialized item allocation operation
+ * @allocate_range: specialized range allocation operation
+ * @insert_item: specialized insert item operation
+ * @insert_range: specialized insert range operation
+ * @change_item: specialized change item operation
+ * @delete_item: specialized delete item operation
+ * @delete_range: specialized delete range operation
+ * @move_items_range: specialized move items operation
+ * @resize_items_area: specialized resize items area operation
+ */
+struct ssdfs_btree_node_operations {
+	int (*find_item)(struct ssdfs_btree_node *node,
+			 struct ssdfs_btree_search *search);
+	int (*find_range)(struct ssdfs_btree_node *node,
+			  struct ssdfs_btree_search *search);
+	int (*extract_range)(struct ssdfs_btree_node *node,
+			     u16 start_index, u16 count,
+			     struct ssdfs_btree_search *search);
+	int (*allocate_item)(struct ssdfs_btree_node *node,
+			     struct ssdfs_btree_search *search);
+	int (*allocate_range)(struct ssdfs_btree_node *node,
+			      struct ssdfs_btree_search *search);
+	int (*insert_item)(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_search *search);
+	int (*insert_range)(struct ssdfs_btree_node *node,
+			    struct ssdfs_btree_search *search);
+	int (*change_item)(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_search *search);
+	int (*delete_item)(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_search *search);
+	int (*delete_range)(struct ssdfs_btree_node *node,
+			    struct ssdfs_btree_search *search);
+	int (*move_items_range)(struct ssdfs_btree_node *src,
+				struct ssdfs_btree_node *dst,
+				u16 start_item, u16 count);
+	int (*resize_items_area)(struct ssdfs_btree_node *node,
+				 u32 new_size);
+};
+
+/* Btree node area's states */
+enum {
+	SSDFS_BTREE_NODE_AREA_UNKNOWN_STATE,
+	SSDFS_BTREE_NODE_AREA_ABSENT,
+	SSDFS_BTREE_NODE_INDEX_AREA_EXIST,
+	SSDFS_BTREE_NODE_ITEMS_AREA_EXIST,
+	SSDFS_BTREE_NODE_LOOKUP_TBL_EXIST,
+	SSDFS_BTREE_NODE_HASH_TBL_EXIST,
+	SSDFS_BTREE_NODE_AREA_STATE_MAX
+};
+
+/*
+ * struct ssdfs_btree_node_index_area - btree node's index area
+ * @state: area state
+ * @flags: index area's flags
+ * @offset: area offset from node's beginning
+ * @area_size: area size in bytes
+ * @index_size: index size in bytes
+ * @index_count: count of indexes in area
+ * @index_capacity: index area capacity
+ * @start_hash: starting hash in index area
+ * @end_hash: ending hash in index area
+ */
+struct ssdfs_btree_node_index_area {
+	atomic_t state;
+
+#define SSDFS_PLEASE_ADD_HYBRID_NODE_SELF_INDEX		(1 << 0)
+#define SSDFS_BTREE_NODE_INDEX_AREA_FLAGS_MASK		0x1
+	atomic_t flags;
+
+	u32 offset;
+	u32 area_size;
+
+	u8 index_size;
+	u16 index_count;
+	u16 index_capacity;
+
+	u64 start_hash;
+	u64 end_hash;
+};
+
+/*
+ * struct ssdfs_btree_node_items_area - btree node's data area
+ * @state: area state
+ * @flags: items area's flags
+ * @offset: area offset from node's beginning
+ * @area_size: area size in bytes
+ * @free_space: free space in bytes
+ * @item_size: item size in bytes
+ * @min_item_size: minimal possible item size in bytes
+ * @max_item_size: maximal possible item size in bytes
+ * @items_count: count of allocated items in area
+ * @items_capacity: items area capacity
+ * @start_hash: starting hash in items area
+ * @end_hash: ending hash in items area
+ */
+struct ssdfs_btree_node_items_area {
+	atomic_t state;
+
+#define SSDFS_PLEASE_ADD_FREE_ITEMS_RANGE		(1 << 0)
+#define SSDFS_BTREE_NODE_ITEMS_AREA_FLAGS_MASK		0x1
+	atomic_t flags;
+
+	u32 offset;
+	u32 area_size;
+	u32 free_space;
+
+	u16 item_size;
+	u8 min_item_size;
+	u16 max_item_size;
+
+	u16 items_count;
+	u16 items_capacity;
+
+	u64 start_hash;
+	u64 end_hash;
+};
+
+struct ssdfs_btree;
+
+/*
+ * struct ssdfs_state_bitmap - bitmap of states
+ * @lock: bitmap lock
+ * @flags: bitmap's flags
+ * @ptr: bitmap
+ */
+struct ssdfs_state_bitmap {
+	spinlock_t lock;
+
+#define SSDFS_LOOKUP_TBL2_IS_USING	(1 << 0)
+#define SSDFS_HASH_TBL_IS_USING		(1 << 1)
+#define SSDFS_BMAP_ARRAY_FLAGS_MASK	0x3
+	u32 flags;
+
+	unsigned long *ptr;
+};
+
+/*
+ * struct ssdfs_state_bitmap_array - array of bitmaps
+ * @lock: bitmap array lock
+ * @bits_count: whole bits count in the bitmap
+ * @bmap_bytes: size in bytes of every bitmap
+ * @index_start_bit: starting bit of index area in the bitmap
+ * @item_start_bit: starting bit of items area in the bitmap
+ * @locks_count: count of successful locks
+ * @bmap: partial locks, alloc and dirty bitmaps
+ */
+struct ssdfs_state_bitmap_array {
+	struct rw_semaphore lock;
+	unsigned long bits_count;
+	size_t bmap_bytes;
+	unsigned long index_start_bit;
+	unsigned long item_start_bit;
+	atomic_t locks_count;
+
+#define SSDFS_BTREE_NODE_LOCK_BMAP	(0)
+#define SSDFS_BTREE_NODE_ALLOC_BMAP	(1)
+#define SSDFS_BTREE_NODE_DIRTY_BMAP	(2)
+#define SSDFS_BTREE_NODE_BMAP_COUNT	(3)
+	struct ssdfs_state_bitmap bmap[SSDFS_BTREE_NODE_BMAP_COUNT];
+};
+
+/*
+ * struct ssdfs_btree_node_content - btree node's content
+ * @protection: btree node's content protection window
+ * @blocks: array of logical blocks
+ * @count: number of blocks in extent
+ */
+struct ssdfs_btree_node_content {
+	struct ssdfs_protection_window protection;
+#define SSDFS_BTREE_NODE_EXTENT_LEN_MAX		(SSDFS_EXTENT_LEN_MAX)
+	struct ssdfs_content_block blocks[SSDFS_BTREE_NODE_EXTENT_LEN_MAX];
+	int count;
+};
+
+union ssdfs_aggregated_btree_node_header {
+	struct ssdfs_inodes_btree_node_header inodes_header;
+	struct ssdfs_dentries_btree_node_header dentries_header;
+	struct ssdfs_extents_btree_node_header extents_header;
+	struct ssdfs_xattrs_btree_node_header xattrs_header;
+};
+
+/*
+ * struct ssdfs_btree_node - btree node
+ * @list: btree nodes' list
+ * @height: node's height
+ * @node_size: node size in bytes
+ * @pages_per_node: count of logical blocks per node
+ * @create_cno: create checkpoint
+ * @node_id: node identification number
+ * @tree: pointer on node's parent tree
+ * @node_ops: btree's node operation specialization
+ * @refs_count: reference counter
+ * @state: node state
+ * @flags: node's flags
+ * @type: node type
+ * @header_lock: header lock
+ * @raw.root_node: root node copy
+ * @raw.generic_header: generic node's header
+ * @raw.inodes_header: inodes node's header
+ * @raw.dentries_header: dentries node's header
+ * @raw.extents_header: extents node's header
+ * @raw.dict_header: shared dictionary node's header
+ * @raw.xattrs_header: xattrs node's header
+ * @raw.shextree_header: shared extents tree's header
+ * @raw.snapshots_header: snapshots node's header
+ * @raw.invextree_header: invalidated extents tree's header
+ * @index_area: index area descriptor
+ * @items_area: items area descriptor
+ * @lookup_tbl_area: lookup table's area descriptor
+ * @hash_tbl_area: hash table's area descriptor
+ * @descriptor_lock: node's descriptor lock
+ * @update_cno: last update checkpoint
+ * @parent_node: pointer on parent node
+ * @node_index: node's index (for using in search operations)
+ * @extent: node's location
+ * @seg: pointer on segment object
+ * @init_end: wait of init ending
+ * @flush_req: flush request
+ * @bmap_array: partial locks, alloc and dirty bitmaps
+ * @wait_queue: queue of threads waiting for a partial lock
+ * @full_lock: the whole node lock
+ * @content: node's content
+ */
+struct ssdfs_btree_node {
+	struct list_head list;
+
+	/* static data */
+	atomic_t height;
+	u32 node_size;
+	u8 pages_per_node;
+	u64 create_cno;
+	u32 node_id;
+
+	struct ssdfs_btree *tree;
+
+	/* btree's node operation specialization */
+	const struct ssdfs_btree_node_operations *node_ops;
+
+	/*
+	 * Reference counter
+	 * The goal of reference counter is to account how
+	 * many btree search objects are referencing the
+	 * node's object. If some thread deletes all records
+	 * in a node then the node will be left undeleted
+	 * from the tree in the case of @refs_count is greater
+	 * than one.
+	 */
+	atomic_t refs_count;
+
+	/* mutable data */
+	atomic_t state;
+	atomic_t flags;
+	atomic_t type;
+
+	/* node's header */
+	struct rw_semaphore header_lock;
+	union {
+		struct ssdfs_btree_inline_root_node root_node;
+		struct ssdfs_btree_node_header generic_header;
+		struct ssdfs_inodes_btree_node_header inodes_header;
+		struct ssdfs_dentries_btree_node_header dentries_header;
+		struct ssdfs_extents_btree_node_header extents_header;
+		struct ssdfs_shared_dictionary_node_header dict_header;
+		struct ssdfs_xattrs_btree_node_header xattrs_header;
+		struct ssdfs_shextree_node_header shextree_header;
+		struct ssdfs_snapshots_btree_node_header snapshots_header;
+		struct ssdfs_invextree_node_header invextree_header;
+	} raw;
+	struct ssdfs_btree_node_index_area index_area;
+	struct ssdfs_btree_node_items_area items_area;
+	struct ssdfs_btree_node_index_area lookup_tbl_area;
+	struct ssdfs_btree_node_index_area hash_tbl_area;
+
+	/* node's descriptor */
+	spinlock_t descriptor_lock;
+	u64 update_cno;
+	struct ssdfs_btree_node *parent_node;
+	struct ssdfs_btree_index_key node_index;
+	struct ssdfs_raw_extent extent;
+	struct ssdfs_segment_info *seg;
+	struct completion init_end;
+	struct ssdfs_segment_request flush_req;
+
+	/* partial locks, alloc and dirty bitmaps */
+	struct ssdfs_state_bitmap_array bmap_array;
+	wait_queue_head_t wait_queue;
+
+	/* node raw content */
+	struct rw_semaphore full_lock;
+	struct ssdfs_btree_node_content content;
+};
+
+/* Btree node states */
+enum {
+	SSDFS_BTREE_NODE_UNKNOWN_STATE,
+	SSDFS_BTREE_NODE_CREATED,
+	SSDFS_BTREE_NODE_NONE_CONTENT,
+	SSDFS_BTREE_NODE_CONTENT_PREPARED,
+	SSDFS_BTREE_NODE_CONTENT_UNDER_FREE,
+	SSDFS_BTREE_NODE_INITIALIZED,
+	SSDFS_BTREE_NODE_DIRTY,
+	SSDFS_BTREE_NODE_PRE_DELETED,
+	SSDFS_BTREE_NODE_INVALID,
+	SSDFS_BTREE_NODE_CORRUPTED,
+	SSDFS_BTREE_NODE_STATE_MAX
+};
+
+/*
+ * It is possible to use knowledge about partial
+ * updates and to send only changed pieces of
+ * data in the case of the Diff-On-Write approach.
+ * Metadata is a good candidate for determining
+ * partial updates and sending only the changed
+ * part(s). For example, a bitmap could track the
+ * dirty items in the node.
+ */
+
+/*
+ * Inline functions
+ */
+
+/*
+ * ssdfs_btree_node_content_init() - init btree node's content
+ * @content: btree node's content
+ */
+static inline
+void ssdfs_btree_node_content_init(struct ssdfs_btree_node_content *content)
+{
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!content);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	content->count = 0;
+
+	for (i = 0; i < SSDFS_BTREE_NODE_EXTENT_LEN_MAX; i++)
+		folio_batch_init(&content->blocks[i].batch);
+}
+
+/*
+ * NODE2SEG_TYPE() - convert node type into segment type
+ * @node_type: node type
+ */
+static inline
+u8 NODE2SEG_TYPE(u8 node_type)
+{
+	switch (node_type) {
+	case SSDFS_BTREE_INDEX_NODE:
+		return SSDFS_INDEX_NODE_SEG_TYPE;
+
+	case SSDFS_BTREE_HYBRID_NODE:
+		return SSDFS_HYBRID_NODE_SEG_TYPE;
+
+	case SSDFS_BTREE_LEAF_NODE:
+		return SSDFS_LEAF_NODE_SEG_TYPE;
+	}
+
+	SSDFS_WARN("invalid node type %#x\n", node_type);
+
+	return SSDFS_UNKNOWN_SEG_TYPE;
+}
+
+/*
+ * RANGE_WITHOUT_INTERSECTION() - check whether two ranges intersect
+ * @start1: starting hash of the first range
+ * @end1: ending hash of the first range
+ * @start2: starting hash of the second range
+ * @end2: ending hash of the second range
+ *
+ * This method checks whether the two ranges intersect.
+ *
+ * RETURN:
+ *  0  - ranges have intersection
+ *  1  - range1 > range2
+ * -1  - range1 < range2
+ */
+static inline
+int RANGE_WITHOUT_INTERSECTION(u64 start1, u64 end1, u64 start2, u64 end2)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start1 %llx, end1 %llx, start2 %llx, end2 %llx\n",
+		  start1, end1, start2, end2);
+
+	BUG_ON(start1 >= U64_MAX || end1 >= U64_MAX ||
+		start2 >= U64_MAX || end2 >= U64_MAX);
+	BUG_ON(start1 > end1);
+	BUG_ON(start2 > end2);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (start1 > end2)
+		return 1;
+
+	if (end1 < start2)
+		return -1;
+
+	return 0;
+}
+
+/*
+ * RANGE_HAS_PARTIAL_INTERSECTION() - check whether two ranges partially intersect
+ * @start1: starting hash of the first range
+ * @end1: ending hash of the first range
+ * @start2: starting hash of the second range
+ * @end2: ending hash of the second range
+ */
+static inline
+bool RANGE_HAS_PARTIAL_INTERSECTION(u64 start1, u64 end1,
+				    u64 start2, u64 end2)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start1 %llx, end1 %llx, start2 %llx, end2 %llx\n",
+		  start1, end1, start2, end2);
+
+	BUG_ON(start1 >= U64_MAX || end1 >= U64_MAX ||
+		start2 >= U64_MAX || end2 >= U64_MAX);
+	BUG_ON(start1 > end1);
+	BUG_ON(start2 > end2);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (start1 > end2)
+		return false;
+
+	if (end1 < start2)
+		return false;
+
+	return true;
+}
+
+/*
+ * __ssdfs_items_per_lookup_index() - calculate items per lookup index
+ * @items_per_node: number of items per node
+ * @lookup_table_capacity: maximal number of items in lookup table
+ */
+static inline
+u16 __ssdfs_items_per_lookup_index(u32 items_per_node,
+				   int lookup_table_capacity)
+{
+	u32 items_per_lookup_index;
+
+	items_per_lookup_index = items_per_node / lookup_table_capacity;
+
+	if (items_per_node % lookup_table_capacity)
+		items_per_lookup_index++;
+
+	SSDFS_DBG("items_per_lookup_index %u\n", items_per_lookup_index);
+
+	return items_per_lookup_index;
+}
+
+/*
+ * __ssdfs_convert_lookup2item_index() - convert lookup into item index
+ * @lookup_index: lookup index
+ * @node_size: size of the node in bytes
+ * @item_size: size of the item in bytes
+ * @lookup_table_capacity: maximal number of items in lookup table
+ */
+static inline
+u16 __ssdfs_convert_lookup2item_index(u16 lookup_index,
+					u32 node_size,
+					size_t item_size,
+					int lookup_table_capacity)
+{
+	u32 items_per_node;
+	u32 items_per_lookup_index;
+	u32 item_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lookup_index %u, node_size %u, "
+		  "item_size %zu, table_capacity %d\n",
+		  lookup_index, node_size,
+		  item_size, lookup_table_capacity);
+
+	BUG_ON(lookup_index >= lookup_table_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_per_node = node_size / item_size;
+	items_per_lookup_index = __ssdfs_items_per_lookup_index(items_per_node,
+							lookup_table_capacity);
+
+	item_index = (u32)lookup_index * items_per_lookup_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("lookup_index %u, item_index %u\n",
+		  lookup_index, item_index);
+
+	BUG_ON(item_index >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return (u16)item_index;
+}
+
+/*
+ * __ssdfs_convert_item2lookup_index() - convert item into lookup index
+ * @item_index: item index
+ * @node_size: size of the node in bytes
+ * @item_size: size of the item in bytes
+ * @lookup_table_capacity: maximal number of items in lookup table
+ */
+static inline
+u16 __ssdfs_convert_item2lookup_index(u16 item_index,
+					u32 node_size,
+					size_t item_size,
+					int lookup_table_capacity)
+{
+	u32 items_per_node;
+	u32 items_per_lookup_index;
+	u16 lookup_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("item_index %u, node_size %u, "
+		  "item_size %zu, table_capacity %d\n",
+		  item_index, node_size,
+		  item_size, lookup_table_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_per_node = node_size / item_size;
+	items_per_lookup_index = __ssdfs_items_per_lookup_index(items_per_node,
+							lookup_table_capacity);
+	lookup_index = item_index / items_per_lookup_index;
+
+	SSDFS_DBG("item_index %u, lookup_index %u, table_capacity %d\n",
+		  item_index, lookup_index, lookup_table_capacity);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(lookup_index >= lookup_table_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return lookup_index;
+}
+
+/*
+ * ssdfs_find_node_content_folio() - find node content's folio
+ * @content: node's content
+ * @folio: smart folio descriptor
+ */
+static inline
+int ssdfs_find_node_content_folio(struct ssdfs_btree_node_content *content,
+				  struct ssdfs_smart_folio *folio)
+{
+	struct folio_batch *batch;
+	u32 offset = 0;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!content || !folio);
+
+	BUG_ON(!IS_SSDFS_OFF2FOLIO_VALID(&folio->desc));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio->ptr = NULL;
+
+	if (folio->desc.folio_index >= content->count) {
+		SSDFS_ERR("folio_index %u >= blocks_count %u\n",
+			  folio->desc.folio_index,
+			  content->count);
+		return -ERANGE;
+	}
+
+	batch = &content->blocks[folio->desc.folio_index].batch;
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		struct folio *cur_folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!cur_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		offset += folio_size(cur_folio);
+
+		if (offset >= folio->desc.page_offset) {
+			folio->ptr = cur_folio;
+			break;
+		}
+	}
+
+	if (!folio->ptr) {
+		SSDFS_ERR("fail to find folio\n");
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/*
+ * Btree nodes list's API
+ */
+void ssdfs_btree_nodes_list_init(struct ssdfs_btree_nodes_list *bnl);
+bool is_ssdfs_btree_nodes_list_empty(struct ssdfs_btree_nodes_list *bnl);
+void ssdfs_btree_nodes_list_add(struct ssdfs_btree_nodes_list *bnl,
+				struct ssdfs_btree_node *node);
+void ssdfs_btree_nodes_list_delete(struct ssdfs_btree_nodes_list *bnl,
+				   struct ssdfs_btree_node *node);
+
+void ssdfs_btree_node_start_request_cno(struct ssdfs_btree_node *node);
+void ssdfs_btree_node_finish_request_cno(struct ssdfs_btree_node *node);
+bool is_it_time_free_btree_node_content(struct ssdfs_btree_node *node);
+
+/*
+ * Btree node API
+ */
+struct ssdfs_btree_node *
+ssdfs_btree_node_create(struct ssdfs_btree *tree,
+			u32 node_id,
+			struct ssdfs_btree_node *parent,
+			u8 height, int type, u64 start_hash);
+void ssdfs_btree_node_destroy(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_prepare_content(struct ssdfs_btree_node *node,
+				     struct ssdfs_btree_index_key *index);
+int ssdfs_btree_init_node(struct ssdfs_btree_node *node,
+			  struct ssdfs_btree_node_header *hdr,
+			  size_t hdr_size);
+int ssdfs_btree_pre_flush_root_node(struct ssdfs_btree_node *node);
+void ssdfs_btree_flush_root_node(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_inline_root_node *root_node);
+int ssdfs_btree_node_pre_flush(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_flush(struct ssdfs_btree_node *node);
+
+void ssdfs_btree_node_get(struct ssdfs_btree_node *node);
+void ssdfs_btree_node_put(struct ssdfs_btree_node *node);
+bool is_ssdfs_node_shared(struct ssdfs_btree_node *node);
+
+bool is_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node);
+void set_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node);
+void clear_ssdfs_btree_node_dirty(struct ssdfs_btree_node *node);
+bool is_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node);
+void set_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node);
+void clear_ssdfs_btree_node_pre_deleted(struct ssdfs_btree_node *node);
+
+bool is_ssdfs_btree_node_index_area_exist(struct ssdfs_btree_node *node);
+bool is_ssdfs_btree_node_index_area_empty(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_resize_index_area(struct ssdfs_btree_node *node,
+					u32 new_size);
+int ssdfs_btree_node_find_index(struct ssdfs_btree_search *search);
+bool can_add_new_index(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_add_index(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_index_key *key);
+int ssdfs_btree_node_change_index(struct ssdfs_btree_node *node,
+				  struct ssdfs_btree_index_key *old_key,
+				  struct ssdfs_btree_index_key *new_key);
+int ssdfs_btree_node_delete_index(struct ssdfs_btree_node *node,
+				  u64 hash);
+
+bool is_ssdfs_btree_node_items_area_exist(struct ssdfs_btree_node *node);
+bool is_ssdfs_btree_node_items_area_empty(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_find_item(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_find_range(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_allocate_item(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_allocate_range(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_insert_item(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_insert_range(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_change_item(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_delete_item(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_delete_range(struct ssdfs_btree_search *search);
+
+/*
+ * Internal Btree node API
+ */
+void ssdfs_btree_node_free_content_space(struct ssdfs_btree_node *node);
+int ssdfs_lock_items_range(struct ssdfs_btree_node *node,
+			   u16 start_index, u16 count);
+void ssdfs_unlock_items_range(struct ssdfs_btree_node *node,
+				u16 start_index, u16 count);
+int ssdfs_lock_whole_index_area(struct ssdfs_btree_node *node);
+void ssdfs_unlock_whole_index_area(struct ssdfs_btree_node *node);
+int ssdfs_allocate_items_range(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_search *search,
+				u16 items_capacity,
+				u16 start_index, u16 count);
+bool is_ssdfs_node_items_range_allocated(struct ssdfs_btree_node *node,
+					 u16 items_capacity,
+					 u16 start_index, u16 count);
+int ssdfs_free_items_range(struct ssdfs_btree_node *node,
+			   u16 start_index, u16 count);
+int ssdfs_set_node_header_dirty(struct ssdfs_btree_node *node,
+				u16 items_capacity);
+void ssdfs_clear_node_header_dirty_state(struct ssdfs_btree_node *node);
+int ssdfs_set_dirty_items_range(struct ssdfs_btree_node *node,
+				u16 items_capacity,
+				u16 start_index, u16 count);
+void ssdfs_clear_dirty_items_range_state(struct ssdfs_btree_node *node,
+					 u16 start_index, u16 count);
+
+int ssdfs_btree_node_allocate_bmaps(void *addr[SSDFS_BTREE_NODE_BMAP_COUNT],
+				    size_t bmap_bytes);
+void ssdfs_btree_node_init_bmaps(struct ssdfs_btree_node *node,
+				void *addr[SSDFS_BTREE_NODE_BMAP_COUNT]);
+int ssdfs_btree_node_allocate_content_space(struct ssdfs_btree_node *node,
+					    u32 node_size);
+int __ssdfs_btree_node_prepare_content(struct ssdfs_fs_info *fsi,
+					struct ssdfs_btree_index_key *ptr,
+					u32 node_size,
+					u64 owner_id,
+					struct ssdfs_segment_info **si,
+					struct ssdfs_btree_node_content *content);
+int ssdfs_btree_create_root_node(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_inline_root_node *root_node);
+int ssdfs_btree_node_pre_flush_header(struct ssdfs_btree_node *node,
+					struct ssdfs_btree_node_header *hdr);
+int ssdfs_btree_common_node_flush(struct ssdfs_btree_node *node);
+int ssdfs_btree_node_commit_log(struct ssdfs_btree_node *node);
+int ssdfs_btree_deleted_node_commit_log(struct ssdfs_btree_node *node);
+int __ssdfs_btree_root_node_extract_index(struct ssdfs_btree_node *node,
+					  u16 found_index,
+					  struct ssdfs_btree_index_key *ptr);
+int ssdfs_btree_root_node_delete_index(struct ssdfs_btree_node *node,
+					u16 position);
+int ssdfs_btree_common_node_delete_index(struct ssdfs_btree_node *node,
+					 u16 position);
+int ssdfs_find_index_by_hash(struct ssdfs_btree_node *node,
+			     struct ssdfs_btree_node_index_area *area,
+			     u64 hash,
+			     u16 *found_index);
+int ssdfs_btree_node_find_index_position(struct ssdfs_btree_node *node,
+					 u64 hash,
+					 u16 *found_position);
+int ssdfs_btree_node_extract_range(u16 start_index, u16 count,
+				   struct ssdfs_btree_search *search);
+int ssdfs_btree_node_get_index(struct ssdfs_fs_info *fsi,
+				struct ssdfs_btree_node_content *content,
+				u32 area_offset, u32 area_size,
+				u32 node_size, u16 position,
+				struct ssdfs_btree_index_key *ptr);
+int ssdfs_btree_node_move_index_range(struct ssdfs_btree_node *src,
+				      u16 src_start,
+				      struct ssdfs_btree_node *dst,
+				      u16 dst_start, u16 count);
+int ssdfs_btree_node_move_items_range(struct ssdfs_btree_node *src,
+				      struct ssdfs_btree_node *dst,
+				      u16 start_item, u16 count);
+int ssdfs_copy_item_in_buffer(struct ssdfs_btree_node *node,
+			      u16 index,
+			      size_t item_size,
+			      struct ssdfs_btree_search *search);
+bool is_last_leaf_node_found(struct ssdfs_btree_search *search);
+int ssdfs_btree_node_find_lookup_index_nolock(struct ssdfs_btree_search *search,
+						__le64 *lookup_table,
+						int table_capacity,
+						u16 *lookup_index);
+typedef int (*ssdfs_check_found_item)(struct ssdfs_fs_info *fsi,
+					struct ssdfs_btree_search *search,
+					void *kaddr,
+					u16 item_index,
+					u64 *start_hash,
+					u64 *end_hash,
+					u16 *found_index);
+typedef int (*ssdfs_prepare_result_buffer)(struct ssdfs_btree_search *search,
+					   u16 found_index,
+					   u64 start_hash,
+					   u64 end_hash,
+					   u16 items_count,
+					   size_t item_size);
+typedef int (*ssdfs_extract_found_item)(struct ssdfs_fs_info *fsi,
+					struct ssdfs_btree_search *search,
+					size_t item_size,
+					void *kaddr,
+					u64 *start_hash,
+					u64 *end_hash);
+int __ssdfs_extract_range_by_lookup_index(struct ssdfs_btree_node *node,
+				u16 lookup_index,
+				int lookup_table_capacity,
+				size_t item_size,
+				struct ssdfs_btree_search *search,
+				ssdfs_check_found_item check_item,
+				ssdfs_prepare_result_buffer prepare_buffer,
+				ssdfs_extract_found_item extract_item);
+int ssdfs_shift_range_right(struct ssdfs_btree_node *node,
+			    struct ssdfs_btree_node_items_area *area,
+			    size_t item_size,
+			    u16 start_index, u16 range_len,
+			    u16 shift);
+int ssdfs_shift_range_right2(struct ssdfs_btree_node *node,
+			     struct ssdfs_btree_node_index_area *area,
+			     size_t item_size,
+			     u16 start_index, u16 range_len,
+			     u16 shift);
+int ssdfs_shift_range_left(struct ssdfs_btree_node *node,
+			   struct ssdfs_btree_node_items_area *area,
+			   size_t item_size,
+			   u16 start_index, u16 range_len,
+			   u16 shift);
+int ssdfs_shift_range_left2(struct ssdfs_btree_node *node,
+			    struct ssdfs_btree_node_index_area *area,
+			    size_t item_size,
+			    u16 start_index, u16 range_len,
+			    u16 shift);
+int ssdfs_shift_memory_range_right(struct ssdfs_btree_node *node,
+				   struct ssdfs_btree_node_items_area *area,
+				   u16 offset, u16 range_len,
+				   u16 shift);
+int ssdfs_shift_memory_range_right2(struct ssdfs_btree_node *node,
+				    struct ssdfs_btree_node_index_area *area,
+				    u16 offset, u16 range_len,
+				    u16 shift);
+int ssdfs_shift_memory_range_left(struct ssdfs_btree_node *node,
+				   struct ssdfs_btree_node_items_area *area,
+				   u16 offset, u16 range_len,
+				   u16 shift);
+int ssdfs_shift_memory_range_left2(struct ssdfs_btree_node *node,
+				   struct ssdfs_btree_node_index_area *area,
+				   u16 offset, u16 range_len,
+				   u16 shift);
+int ssdfs_generic_insert_range(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *area,
+				size_t item_size,
+				struct ssdfs_btree_search *search);
+int ssdfs_invalidate_root_node_hierarchy(struct ssdfs_btree_node *node);
+int __ssdfs_btree_node_extract_range(struct ssdfs_btree_node *node,
+				     u16 start_index, u16 count,
+				     size_t item_size,
+				     struct ssdfs_btree_search *search);
+int __ssdfs_btree_node_resize_items_area(struct ssdfs_btree_node *node,
+					 size_t item_size,
+					 size_t index_size,
+					 u32 new_size);
+int __ssdfs_define_memory_folio(struct ssdfs_fs_info *fsi,
+				u32 area_offset, u32 area_size,
+				u32 node_size, size_t item_size,
+				u16 position,
+				struct ssdfs_offset2folio *desc);
+int ssdfs_btree_node_get_hash_range(struct ssdfs_btree_search *search,
+				    u64 *start_hash, u64 *end_hash,
+				    u16 *items_count);
+int __ssdfs_btree_common_node_extract_index(struct ssdfs_btree_node *node,
+				    struct ssdfs_btree_node_index_area *area,
+				    u16 found_index,
+				    struct ssdfs_btree_index_key *ptr);
+int ssdfs_btree_node_check_hash_range(struct ssdfs_btree_node *node,
+				      u16 items_count,
+				      u16 items_capacity,
+				      u64 start_hash,
+				      u64 end_hash,
+				      struct ssdfs_btree_search *search);
+int ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node,
+				struct ssdfs_btree_node_items_area *area,
+				size_t item_size,
+				struct ssdfs_btree_search *search);
+int __ssdfs_btree_node_clear_range(struct ssdfs_btree_node *node,
+				   struct ssdfs_btree_node_items_area *area,
+				   size_t item_size,
+				   u16 start_index,
+				   unsigned int range_len);
+int ssdfs_btree_node_copy_header_nolock(struct ssdfs_btree_node *node,
+					struct folio *folio,
+					u32 *write_offset);
+int ssdfs_btree_node_read_content(struct ssdfs_btree_node *node,
+				  u32 offset, u32 size,
+				  void *buf, u32 buf_size);
+int ssdfs_btree_node_write_content(struct ssdfs_btree_node *node,
+				   u32 offset, u32 size,
+				   void *buf, u32 buf_size);
+
+void ssdfs_btree_node_forget_folio_batch(struct folio_batch *batch);
+void ssdfs_show_btree_node_info(struct ssdfs_btree_node *node);
+void ssdfs_debug_btree_node_object(struct ssdfs_btree_node *node);
+
+#endif /* _SSDFS_BTREE_NODE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 48/79] ssdfs: introduce b-tree hierarchy object
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (18 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 46/79] ssdfs: introduce b-tree node object Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 50/79] ssdfs: introduce inodes b-tree Viacheslav Dubeyko
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

B-tree needs to serve the operations of adding, inserting, and
deleting items. These operations can require modification of
the b-tree structure (adding and deleting nodes). Also, indexes
need to be updated in parent nodes. The SSDFS file system uses
a special b-tree hierarchy object to manage the b-tree structure.
For every b-tree modification request, the file system logic
creates a hierarchy object and executes the b-tree hierarchy
check. The checking logic defines the actions that have to be
done at every level of the b-tree to execute a b-tree node's add
or delete operation. Finally, the b-tree hierarchy object
represents the action plan that the modification logic has to
execute.

The main goal of checking the b-tree hierarchy for an add
operation is to define how many new nodes, and of which type(s),
should be added. Any b-tree starts with the creation of the root
node. The root node can store only two index keys. Initially,
the SSDFS logic adds leaf nodes into the empty b-tree. If the
root node already contains two index keys pointing to leaf
nodes, then a hybrid node needs to be added into the b-tree.
A hybrid node contains both an index area and an items area.
New items are added into the items area of the hybrid node
until this area becomes completely full. Then the b-tree logic
allocates a new leaf node, moves all existing items of the
hybrid node into the newly created leaf node, and adds an index
key into the hybrid node's index area. This operation repeats
until the index area of the hybrid node becomes completely full.
At this point, the index area is doubled in size after the
existing items have been moved into a newly created node.
Finally, the hybrid node is converted into an index node.
If the root node contains two index keys pointing to hybrid
nodes, then an index node is added into the b-tree. Generally
speaking, the leaf nodes are always allocated at the lowest
level, the next level contains hybrid nodes, and the rest of
the b-tree levels contain index nodes.

Every b-tree node has an associated hash value that represents
the starting hash of the sequence of records in the node. This
hash value is stored in the index key that the parent node
keeps. Finally, hash values are used for searching items in
the b-tree. If a modification operation changes the starting
hash value in a node, then the index key in the parent node has
to be updated. The checking logic identifies all parent nodes
that require an index key update. As a result, the modification
logic updates the index keys in all parent nodes that were
selected for update by the checking logic. A delete operation
requires identifying which nodes are empty and should be
deleted/invalidated. This invalidation plan is, finally,
executed by the modification logic.

The execution logic simply starts from the bottom of the
hierarchy and executes the planned action for every level of
the b-tree. The planned actions can include adding a new empty
node, moving items from a hybrid parent node into a leaf one,
rebalancing the b-tree, and updating indexes. Finally, the
b-tree should be able to receive new items/indexes and stay
consistent.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/btree_hierarchy.h | 336 +++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)
 create mode 100644 fs/ssdfs/btree_hierarchy.h

diff --git a/fs/ssdfs/btree_hierarchy.h b/fs/ssdfs/btree_hierarchy.h
new file mode 100644
index 000000000000..ebc9ab18d6ff
--- /dev/null
+++ b/fs/ssdfs/btree_hierarchy.h
@@ -0,0 +1,336 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/btree_hierarchy.h - btree hierarchy declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_BTREE_HIERARCHY_H
+#define _SSDFS_BTREE_HIERARCHY_H
+
+/*
+ * struct ssdfs_hash_range - hash range
+ * @start: start hash
+ * @end: end hash
+ */
+struct ssdfs_hash_range {
+	u64 start;
+	u64 end;
+};
+
+/*
+ * struct ssdfs_btree_node_position - node's position range
+ * @state: intersection state
+ * @start: starting node's position
+ * @count: number of positions in the range
+ */
+struct ssdfs_btree_node_position {
+	int state;
+	u16 start;
+	u16 count;
+};
+
+/* Intersection states */
+enum {
+	SSDFS_HASH_RANGE_INTERSECTION_UNDEFINED,
+	SSDFS_HASH_RANGE_LEFT_ADJACENT,
+	SSDFS_HASH_RANGE_INTERSECTION,
+	SSDFS_HASH_RANGE_RIGHT_ADJACENT,
+	SSDFS_HASH_RANGE_OUT_OF_NODE,
+	SSDFS_HASH_RANGE_INTERSECTION_STATE_MAX
+};
+
+/*
+ * struct ssdfs_btree_node_insert - insert position
+ * @op_state: operation state
+ * @hash: hash range of insertion
+ * @pos: position descriptor
+ */
+struct ssdfs_btree_node_insert {
+	int op_state;
+	struct ssdfs_hash_range hash;
+	struct ssdfs_btree_node_position pos;
+};
+
+/*
+ * struct ssdfs_btree_node_move - moving range descriptor
+ * @op_state: operation state
+ * @direction: moving direction
+ * @pos: position descriptor
+ */
+struct ssdfs_btree_node_move {
+	int op_state;
+	int direction;
+	struct ssdfs_btree_node_position pos;
+};
+
+/*
+ * struct ssdfs_btree_node_delete - deleting node's index descriptor
+ * @op_state: operation state
+ * @node_index: node index for deletion
+ */
+struct ssdfs_btree_node_delete {
+	int op_state;
+	struct ssdfs_btree_index_key node_index;
+};
+
+/* Possible operation states */
+enum {
+	SSDFS_BTREE_AREA_OP_UNKNOWN,
+	SSDFS_BTREE_AREA_OP_REQUESTED,
+	SSDFS_BTREE_AREA_OP_DONE,
+	SSDFS_BTREE_AREA_OP_FAILED,
+	SSDFS_BTREE_AREA_OP_STATE_MAX
+};
+
+/* Possible moving directions */
+enum {
+	SSDFS_BTREE_MOVE_NOWHERE,
+	SSDFS_BTREE_MOVE_TO_PARENT,
+	SSDFS_BTREE_MOVE_TO_CHILD,
+	SSDFS_BTREE_MOVE_TO_LEFT,
+	SSDFS_BTREE_MOVE_TO_RIGHT,
+	SSDFS_BTREE_MOVE_DIRECTION_MAX
+};
+
+/* Btree level's flags */
+#define SSDFS_BTREE_LEVEL_ADD_NODE		(1 << 0)
+#define SSDFS_BTREE_LEVEL_ADD_INDEX		(1 << 1)
+#define SSDFS_BTREE_LEVEL_UPDATE_INDEX		(1 << 2)
+#define SSDFS_BTREE_LEVEL_ADD_ITEM		(1 << 3)
+#define SSDFS_BTREE_INDEX_AREA_NEED_MOVE	(1 << 4)
+#define SSDFS_BTREE_ITEMS_AREA_NEED_MOVE	(1 << 5)
+#define SSDFS_BTREE_TRY_RESIZE_INDEX_AREA	(1 << 6)
+#define SSDFS_BTREE_LEVEL_DELETE_NODE		(1 << 7)
+#define SSDFS_BTREE_LEVEL_DELETE_INDEX		(1 << 8)
+#define SSDFS_BTREE_LEVEL_FLAGS_MASK		0x1FF
+
+#define SSDFS_BTREE_ADD_NODE_MASK \
+	(SSDFS_BTREE_LEVEL_ADD_NODE | SSDFS_BTREE_LEVEL_ADD_INDEX | \
+	 SSDFS_BTREE_LEVEL_UPDATE_INDEX | SSDFS_BTREE_LEVEL_ADD_ITEM | \
+	 SSDFS_BTREE_INDEX_AREA_NEED_MOVE | \
+	 SSDFS_BTREE_ITEMS_AREA_NEED_MOVE | \
+	 SSDFS_BTREE_TRY_RESIZE_INDEX_AREA)
+
+#define SSDFS_BTREE_DELETE_NODE_MASK \
+	(SSDFS_BTREE_LEVEL_UPDATE_INDEX | SSDFS_BTREE_LEVEL_DELETE_NODE | \
+	 SSDFS_BTREE_LEVEL_DELETE_INDEX | SSDFS_BTREE_ITEMS_AREA_NEED_MOVE)
+
+/*
+ * struct ssdfs_btree_level_node - node descriptor
+ * @type: node's type
+ * @index_hash: old index area's hash pair
+ * @items_hash: old items area's hash pair
+ * @ptr: pointer on node's object
+ */
+struct ssdfs_btree_level_node {
+	int type;
+	struct ssdfs_hash_range index_hash;
+	struct ssdfs_hash_range items_hash;
+	struct ssdfs_btree_node *ptr;
+};
+
+/*
+ * struct ssdfs_btree_node_details - b-tree node details
+ * @ptr: node pointer
+ * @index_area.index_count: count of indexes in index area
+ * @index_area.index_capacity: capacity of indexes in index area
+ * @items_area.area_size: items area's size in bytes
+ * @items_area.free_space: items area's free space in bytes
+ * @items_area.item_size: size of item in bytes
+ * @items_area.min_item_size: minimal size of item in bytes
+ * @items_area.max_item_size: maximum size of item in bytes
+ * @items_area.items_count: count of items in items area
+ * @items_area.items_capacity: capacity of items in items area
+ */
+struct ssdfs_btree_node_details {
+	struct ssdfs_btree_node *ptr;
+
+	struct {
+		u16 index_count;
+		u16 index_capacity;
+	} index_area;
+
+	struct {
+		u32 area_size;
+		u32 free_space;
+		u16 item_size;
+		u8 min_item_size;
+		u16 max_item_size;
+		u16 items_count;
+		u16 items_capacity;
+	} items_area;
+};
+
+/*
+ * struct ssdfs_btree_level_node_desc - descriptor of level's nodes
+ * @left_node: left node from the old node of the level
+ * @old_node: old node of the level
+ * @right_node: right node from the old node of the level
+ * @new_node: created empty node
+ */
+struct ssdfs_btree_level_node_desc {
+	struct ssdfs_btree_level_node left_node;
+	struct ssdfs_btree_level_node old_node;
+	struct ssdfs_btree_level_node right_node;
+	struct ssdfs_btree_level_node new_node;
+};
+
+/*
+ * struct ssdfs_btree_level - btree level descriptor
+ * @flags: level's flags
+ * @index_area.area_size: size of the index area
+ * @index_area.free_space: free space in index area
+ * @index_area.hash: hash range of index area
+ * @index_area.add: adding index descriptor
+ * @index_area.insert: insert position descriptor
+ * @index_area.move: move range descriptor
+ * @index_area.delete: delete index descriptor
+ * @items_area.area_size: size of the items area
+ * @items_area.free_space: free space in items area
+ * @items_area.hash: hash range of items area
+ * @items_area.add: adding item descriptor
+ * @items_area.insert: insert position descriptor
+ * @items_area.child2parent: child to/from parent move range descriptor
+ * @items_area.old2sibling: old node to/from sibling move range descriptor
+ * @nodes: descriptor of level's nodes
+ */
+struct ssdfs_btree_level {
+	u32 flags;
+
+	struct {
+		u32 area_size;
+		u32 free_space;
+		struct ssdfs_hash_range hash;
+		struct ssdfs_btree_node_insert add;
+		struct ssdfs_btree_node_insert insert;
+		struct ssdfs_btree_node_move move;
+		struct ssdfs_btree_node_delete delete;
+	} index_area;
+
+	struct {
+		u32 area_size;
+		u32 free_space;
+		struct ssdfs_hash_range hash;
+		struct ssdfs_btree_node_insert add;
+		struct ssdfs_btree_node_insert insert;
+		struct ssdfs_btree_node_move child2parent;
+		struct ssdfs_btree_node_move old2sibling;
+	} items_area;
+
+	struct ssdfs_btree_level_node_desc nodes;
+};
+
+/*
+ * struct ssdfs_btree_state_descriptor - btree's state descriptor
+ * @height: btree height
+ * @increment_height: request to increment tree's height
+ * @node_size: size of the node in bytes
+ * @index_size: size of the index record in bytes
+ * @min_item_size: minimum item size in bytes
+ * @max_item_size: maximum item size in bytes
+ * @index_area_min_size: minimum size of index area in bytes
+ */
+struct ssdfs_btree_state_descriptor {
+	int height;
+	bool increment_height;
+	u32 node_size;
+	u16 index_size;
+	u16 min_item_size;
+	u16 max_item_size;
+	u16 index_area_min_size;
+};
+
+/*
+ * struct ssdfs_btree_hierarchy - btree's hierarchy descriptor
+ * @desc: btree state's descriptor
+ * @array_ptr: btree level's array
+ */
+struct ssdfs_btree_hierarchy {
+	struct ssdfs_btree_state_descriptor desc;
+	struct ssdfs_btree_level **array_ptr;
+};
+
+/* Btree hierarchy inline methods */
+static inline
+bool need_add_node(struct ssdfs_btree_level *level)
+{
+	return level->flags & SSDFS_BTREE_LEVEL_ADD_NODE;
+}
+
+static inline
+bool need_delete_node(struct ssdfs_btree_level *level)
+{
+	return level->flags & SSDFS_BTREE_LEVEL_DELETE_NODE;
+}
+
+/* Btree hierarchy API */
+struct ssdfs_btree_hierarchy *
+ssdfs_btree_hierarchy_allocate(struct ssdfs_btree *tree);
+void ssdfs_btree_hierarchy_init(struct ssdfs_btree *tree,
+				struct ssdfs_btree_hierarchy *hierarchy);
+void ssdfs_btree_hierarchy_free(struct ssdfs_btree_hierarchy *hierarchy);
+
+bool need_update_parent_index_area(u64 start_hash,
+				   struct ssdfs_btree_node *child);
+int ssdfs_btree_check_hierarchy_for_add(struct ssdfs_btree *tree,
+					struct ssdfs_btree_search *search,
+					struct ssdfs_btree_hierarchy *ptr);
+int ssdfs_btree_process_level_for_add(struct ssdfs_btree_hierarchy *ptr,
+					int cur_height,
+					struct ssdfs_btree_search *search);
+int ssdfs_btree_check_hierarchy_for_delete(struct ssdfs_btree *tree,
+					struct ssdfs_btree_search *search,
+					struct ssdfs_btree_hierarchy *ptr);
+int ssdfs_btree_process_level_for_delete(struct ssdfs_btree_hierarchy *ptr,
+					 int cur_height,
+					 struct ssdfs_btree_search *search);
+int ssdfs_btree_check_hierarchy_for_update(struct ssdfs_btree *tree,
+					   struct ssdfs_btree_search *search,
+					   struct ssdfs_btree_hierarchy *ptr);
+int ssdfs_btree_process_level_for_update(struct ssdfs_btree_hierarchy *ptr,
+					 int cur_height,
+					 struct ssdfs_btree_search *search);
+bool __need_migrate_items_to_parent_node(struct ssdfs_btree_node *parent,
+					 struct ssdfs_btree_node *child);
+int ssdfs_btree_check_hierarchy_for_parent_child_merge(struct ssdfs_btree *tree,
+					struct ssdfs_btree_search *search,
+					struct ssdfs_btree_hierarchy *hierarchy);
+int ssdfs_btree_process_level_for_node_merge(struct ssdfs_btree_hierarchy *ptr,
+					     int cur_height,
+					     struct ssdfs_btree_search *search);
+bool __need_merge_sibling_nodes(struct ssdfs_btree_node *parent,
+				struct ssdfs_btree_node *child);
+int ssdfs_btree_check_hierarchy_for_siblings_merge(struct ssdfs_btree *tree,
+					struct ssdfs_btree_search *search,
+					struct ssdfs_btree_hierarchy *hierarchy);
+
+/* Btree hierarchy internal API */
+void ssdfs_btree_prepare_add_node(struct ssdfs_btree *tree,
+				  int node_type,
+				  u64 start_hash, u64 end_hash,
+				  struct ssdfs_btree_level *level,
+				  struct ssdfs_btree_node *node);
+int ssdfs_btree_prepare_add_index(struct ssdfs_btree_level *level,
+				  u64 start_hash, u64 end_hash,
+				  struct ssdfs_btree_node *node);
+
+void ssdfs_show_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr);
+void ssdfs_debug_btree_hierarchy_object(struct ssdfs_btree_hierarchy *ptr);
+
+#endif /* _SSDFS_BTREE_HIERARCHY_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 50/79] ssdfs: introduce inodes b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (19 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 48/79] ssdfs: introduce b-tree hierarchy object Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 52/79] ssdfs: introduce dentries b-tree Viacheslav Dubeyko
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS raw inode is a fixed-size metadata structure that can
vary from 256 bytes to several KBs. The size of the inode is
defined during the file system volume's creation. The most
special part of the SSDFS raw inode is the private area that is
used for storing: (1) a small file inline, (2) the root node of
the extents, dentries, and/or xattr b-trees.

SSDFS inodes b-tree is a hybrid b-tree that includes hybrid
nodes with the goal of using the node's space more efficiently
by combining index and data records inside the node. The root
node of the inodes b-tree is stored into the log footer or
partial log header of every log. Generally speaking, it means
that the SSDFS file system massively replicates the root node
of the inodes b-tree. The inodes b-tree node's space includes
a header, an index area (in the case of a hybrid node), and an
array of inodes ordered by ID values. If a node is 8 KB in size
and the inode structure is 256 bytes in size, then the maximum
capacity of one inodes b-tree node is 32 inodes.

Generally speaking, the inodes table can be imagined as an
array that is extended by adding new inodes to its tail.
However, an inode can be allocated or deleted by, for example,
a create file or delete file operation. As a result, every
b-tree node has an allocation bitmap that tracks the state
(used or free) of every inode in the b-tree node. The
allocation bitmap provides a mechanism of fast lookup of a free
inode with the goal of reusing the inodes of deleted files.

Additionally, every b-tree node has a dirty bitmap whose goal
is to track the modification of inodes. Generally speaking, the
dirty bitmap provides the opportunity to flush not the whole
node but only the modified inodes. As a result, such a bitmap
could play the cornerstone role in delta-encoding or in the
Diff-On-Write approach. Moreover, a b-tree node has a lock
bitmap that is responsible for implementing the exclusive lock
of a particular inode without the necessity of locking the
whole node exclusively. Generally speaking, the lock bitmap was
introduced with the goal of improving the granularity of the
lock operation. As a result, it provides a way to modify
different inodes in the same b-tree node without taking the
exclusive lock on the whole b-tree node. However, the exclusive
lock of the whole tree has to be used in the case of addition
or deletion of a b-tree node.

Inodes b-tree supports the following operations:
(1) find_item - find item in the b-tree
(2) find_range - find range of items in the b-tree
(3) extract_range - extract range of items from the node of b-tree
(4) allocate_item - allocate item in b-tree
(5) allocate_range - allocate range of items in b-tree
(6) insert_item - insert item into node of the b-tree
(7) insert_range - insert range of items into node of the b-tree
(8) change_item - change item in the b-tree
(9) delete_item - delete item from the b-tree
(10) delete_range - delete range of items from a node of b-tree

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/inodes_tree.h | 181 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 fs/ssdfs/inodes_tree.h

diff --git a/fs/ssdfs/inodes_tree.h b/fs/ssdfs/inodes_tree.h
new file mode 100644
index 000000000000..95267c631dd5
--- /dev/null
+++ b/fs/ssdfs/inodes_tree.h
@@ -0,0 +1,181 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/inodes_tree.h - inodes btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_INODES_TREE_H
+#define _SSDFS_INODES_TREE_H
+
+/*
+ * struct ssdfs_inodes_range - items range
+ * @start_hash: starting hash
+ * @start_index: staring index in the node
+ * @count: count of items in the range
+ */
+struct ssdfs_inodes_range {
+#define SSDFS_INODES_RANGE_INVALID_START	(U64_MAX)
+	u64 start_hash;
+#define SSDFS_INODES_RANGE_INVALID_INDEX	(U16_MAX)
+	u16 start_index;
+	u16 count;
+};
+
+/*
+ * struct ssdfs_inodes_btree_range - node's items range descriptor
+ * @list: link in free inode ranges queue
+ * @node_id: node identification number
+ * @area: items range
+ */
+struct ssdfs_inodes_btree_range {
+	struct list_head list;
+	u32 node_id;
+	struct ssdfs_inodes_range area;
+};
+
+/*
+ * struct ssdfs_free_inode_range_queue - free inode ranges queue
+ * @lock: queue's lock
+ * @list: queue's list
+ */
+struct ssdfs_free_inode_range_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_inodes_btree_info - inodes btree info
+ * @generic_tree: generic btree description
+ * @lock: inodes btree lock
+ * @root_folder: copy of root folder's inode
+ * @upper_allocated_ino: maximal allocated inode ID number
+ * @last_free_ino: latest free inode ID number
+ * @allocated_inodes: allocated inodes count in the whole tree
+ * @free_inodes: free inodes count in the whole tree
+ * @inodes_capacity: inodes capacity in the whole tree
+ * @leaf_nodes: count of leaf nodes in the whole tree
+ * @nodes_count: count of all nodes in the whole tree
+ * @raw_inode_size: size in bytes of raw inode
+ * @free_inodes_queue: queue of free inode descriptors
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_inodes_btree_info {
+	struct ssdfs_btree generic_tree;
+
+	spinlock_t lock;
+	struct ssdfs_inode root_folder;
+	u64 upper_allocated_ino;
+	u64 last_free_ino;
+	u64 allocated_inodes;
+	u64 free_inodes;
+	u64 inodes_capacity;
+	u32 leaf_nodes;
+	u32 nodes_count;
+	u16 raw_inode_size;
+
+	/*
+	 * The inodes btree has a special allocation queue.
+	 * If a btree node has free (not allocated) inode
+	 * items, then information about such a btree node
+	 * should be added into the queue. Moreover, the queue
+	 * should contain as many node descriptors as there are
+	 * free items in the node.
+	 *
+	 * If some btree node has deleted inodes (free items),
+	 * then all of that node's descriptors should be added
+	 * to the head of the allocation queue. Descriptors of
+	 * the last btree node should be added to the tail of
+	 * the queue. Node descriptors should be added into the
+	 * allocation queue during btree node creation or
+	 * reading from the volume. Otherwise, allocation of
+	 * new items is done from the last leaf btree node.
+	 */
+	struct ssdfs_free_inode_range_queue free_inodes_queue;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/*
+ * Inline methods
+ */
+static inline
+bool is_free_inodes_range_invalid(struct ssdfs_inodes_btree_range *range)
+{
+	bool is_invalid;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!range);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	is_invalid = range->node_id == SSDFS_BTREE_NODE_INVALID_ID ||
+		range->area.start_hash == SSDFS_INODES_RANGE_INVALID_START ||
+		range->area.start_index == SSDFS_INODES_RANGE_INVALID_INDEX ||
+		range->area.count == 0;
+
+	if (is_invalid) {
+		SSDFS_ERR("node_id %u, start_hash %llx, "
+			  "start_index %u, count %u\n",
+			  range->node_id,
+			  range->area.start_hash,
+			  range->area.start_index,
+			  range->area.count);
+	}
+
+	return is_invalid;
+}
+
+/*
+ * Free inodes range API
+ */
+struct ssdfs_inodes_btree_range *ssdfs_free_inodes_range_alloc(void);
+void ssdfs_free_inodes_range_free(struct ssdfs_inodes_btree_range *range);
+void ssdfs_free_inodes_range_init(struct ssdfs_inodes_btree_range *range);
+
+/*
+ * Inodes btree API
+ */
+int ssdfs_inodes_btree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_inodes_btree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_inodes_btree_flush(struct ssdfs_inodes_btree_info *tree);
+
+int ssdfs_inodes_btree_allocate(struct ssdfs_inodes_btree_info *tree,
+				ino_t *ino,
+				struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_find(struct ssdfs_inodes_btree_info *tree,
+			    ino_t ino,
+			    struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_change(struct ssdfs_inodes_btree_info *tree,
+				ino_t ino,
+				struct ssdfs_btree_search *search);
+int ssdfs_inodes_btree_delete(struct ssdfs_inodes_btree_info *tree,
+				ino_t ino);
+int ssdfs_inodes_btree_delete_range(struct ssdfs_inodes_btree_info *tree,
+				    ino_t ino, u16 count);
+
+void ssdfs_debug_inodes_btree_object(struct ssdfs_inodes_btree_info *tree);
+
+/*
+ * Inodes btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+						ssdfs_inodes_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_inodes_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_inodes_btree_node_ops;
+
+#endif /* _SSDFS_INODES_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 52/79] ssdfs: introduce dentries b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (20 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 50/79] ssdfs: introduce inodes b-tree Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 55/79] ssdfs: introduce extents b-tree Viacheslav Dubeyko
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS dentry is a fixed-size (32 bytes) metadata structure.
It contains an inode ID, name hash, name length, and an inline
string for 12 symbols. Generally speaking, the dentry is able to
store an 8.3 filename inline. If a file/folder name is longer
(more than 12 symbols), then the dentry keeps only a portion of
the name, while the whole name is stored in a shared dictionary.
The goal of this approach is to represent the dentry as a compact,
fixed-size metadata structure for fast and efficient operations on
dentries. It is worth pointing out that in many use cases file and
folder names are not very long. As a result, the dentry's inline
string can be the only storage for the file/folder name. Moreover,
the goal of the shared dictionary is to store long names
efficiently by means of a deduplication mechanism.
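
As an illustration (outside the patch itself), the inline-versus-dictionary
decision can be sketched in plain C. The structure and function names below
are hypothetical and only mirror the 32-byte layout described above; they are
not the driver's actual types:

```c
#include <stdint.h>
#include <string.h>

#define DENTRY_INLINE_NAME_LEN	12	/* inline string capacity */
#define DENTRY_HAS_EXTERNAL_NAME 0x1	/* full name lives in dictionary */

/* Hypothetical 32-byte dentry layout mirroring the description above. */
struct demo_dentry {
	uint64_t ino;			/* inode ID */
	uint64_t name_hash;		/* hash of the full name */
	uint16_t name_len;		/* length of the full name */
	uint8_t  flags;
	uint8_t  pad;
	char     inline_name[DENTRY_INLINE_NAME_LEN]; /* first 12 symbols */
};

/*
 * Fill in a dentry's name fields. Short names fit entirely inline;
 * longer names keep only a prefix inline and are flagged so the
 * caller stores the full name in the shared dictionary.
 * Returns 0 for inline-only, 1 when the dictionary is also needed.
 */
static int demo_dentry_set_name(struct demo_dentry *de,
				const char *name, size_t len,
				uint64_t hash)
{
	de->name_hash = hash;
	de->name_len = (uint16_t)len;
	if (len <= DENTRY_INLINE_NAME_LEN) {
		memcpy(de->inline_name, name, len);
		de->flags = 0;
		return 0;
	}
	memcpy(de->inline_name, name, DENTRY_INLINE_NAME_LEN);
	de->flags = DENTRY_HAS_EXTERNAL_NAME;
	return 1;
}
```

Note that the fields sum to exactly 32 bytes, so an array of such records
packs densely into a node with no per-record padding.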

The dentries b-tree is a hybrid b-tree whose root node is stored in
the inode's private area. By default, the inode's private area is
128 bytes in size, and an SSDFS dentry is 32 bytes in size. As a
result, the inode's private area provides enough space for 4 inline
dentries. Generally speaking, if a folder contains 4 or fewer
files, then the dentries can be stored in the inode's private area
without the necessity of creating a dentries b-tree. Otherwise, if
a folder includes more than 4 files or folders, then a regular
dentries b-tree needs to be created, with the root node stored in
the inode's private area. Every node of the dentries b-tree
contains a header, an index area (in the case of a hybrid node),
and an array of dentries ordered by the hash value of the filename.
Moreover, an 8 KB b-tree node is capable of containing at most
256 dentries. Generally speaking, the hybrid b-tree was chosen for
the dentries metadata structure by virtue of its compact metadata
representation and efficient lookup mechanism. Dentries are ordered
on the basis of the name's hash. Every node of the dentries b-tree
has: (1) a dirty bitmap that tracks modified dentries, and (2) a
lock bitmap that allows exclusive locking of particular dentries
without the necessity of locking the whole b-tree node. It is
expected that the dentries b-tree contains few nodes on average,
because just two nodes (8 KB in size) of the dentries b-tree are
capable of storing about 400 dentries.
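
The capacity figures above follow directly from the quoted sizes. A minimal
sketch of the arithmetic (constant and function names are illustrative, not
taken from the driver):

```c
enum {
	DEMO_DENTRY_SIZE	= 32,	/* fixed-size dentry */
	DEMO_INLINE_AREA_SIZE	= 128,	/* inode's private area */
	DEMO_NODE_SIZE		= 8192,	/* 8 KB b-tree node */
};

/* Number of dentries that fit in the inode's private area. */
static unsigned int demo_inline_capacity(void)
{
	return DEMO_INLINE_AREA_SIZE / DEMO_DENTRY_SIZE;
}

/*
 * Upper bound of dentries per node; a real node also spends
 * space on the header and (for hybrid nodes) the index area,
 * which is why two 8 KB nodes hold ~400 rather than 512.
 */
static unsigned int demo_node_capacity(void)
{
	return DEMO_NODE_SIZE / DEMO_DENTRY_SIZE;
}
```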

Dentries b-tree implements API:
(1) create - create dentries b-tree
(2) destroy - destroy dentries b-tree
(3) flush - flush dirty dentries b-tree
(4) find - find dentry for a name in b-tree
(5) add - add dentry object into b-tree
(6) change - change/update dentry object in b-tree
(7) delete - delete dentry object from b-tree
(8) delete_all - delete all dentries from b-tree

Dentries b-tree node implements a specialized API:
(1) find_item - find item in the node
(2) find_range - find range of items in the node
(3) extract_range - extract range of items (or all items) from node
(4) insert_item - insert item in the node
(5) insert_range - insert range of items in the node
(6) change_item - change item in the node
(7) delete_item - delete item from the node
(8) delete_range - delete range of items from the node

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/dentries_tree.h | 158 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 fs/ssdfs/dentries_tree.h

diff --git a/fs/ssdfs/dentries_tree.h b/fs/ssdfs/dentries_tree.h
new file mode 100644
index 000000000000..1428e5d70d49
--- /dev/null
+++ b/fs/ssdfs/dentries_tree.h
@@ -0,0 +1,158 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dentries_tree.h - dentries btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_DENTRIES_TREE_H
+#define _SSDFS_DENTRIES_TREE_H
+
+#define SSDFS_INLINE_DENTRIES_COUNT		(2 * SSDFS_INLINE_DENTRIES_PER_AREA)
+#define SSDFS_FOLDER_DEFAULT_SHORTCUTS_COUNT	(2)
+
+/*
+ * struct ssdfs_dentries_btree_info - dentries btree info
+ * @type: dentries btree type
+ * @state: dentries btree state
+ * @dentries_count: count of the dentries in the whole dentries tree
+ * @lock: dentries btree lock
+ * @generic_tree: pointer on generic btree object
+ * @inline_dentries: pointer on inline dentries array
+ * @buffer.tree: piece of memory for generic btree object
+ * @buffer.dentries: piece of memory for the inline dentries
+ * @root: pointer on root node
+ * @root_buffer: buffer for root node
+ * @desc: b-tree descriptor
+ * @owner: pointer on owner inode object
+ * @fsi: pointer on shared file system object
+ *
+ * A newly created inode tries to store dentries as
+ * inline dentries. The raw on-disk inode has an internal
+ * private area that is able to contain either four inline
+ * dentries or the root nodes of the extents btree and
+ * extended attributes btree. If the inode has no extended
+ * attributes and the number of dentries is four or
+ * fewer, then everything can be stored in the inline
+ * dentries. Otherwise, a real dentries btree should
+ * be created.
+struct ssdfs_dentries_btree_info {
+	atomic_t type;
+	atomic_t state;
+	atomic64_t dentries_count;
+
+	struct rw_semaphore lock;
+	struct ssdfs_btree *generic_tree;
+	struct ssdfs_dir_entry *inline_dentries;
+	union {
+		struct ssdfs_btree tree;
+		struct ssdfs_dir_entry dentries[SSDFS_INLINE_DENTRIES_COUNT];
+	} buffer;
+	struct ssdfs_btree_inline_root_node *root;
+	struct ssdfs_btree_inline_root_node root_buffer;
+
+	struct ssdfs_dentries_btree_descriptor desc;
+	struct ssdfs_inode_info *owner;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Dentries tree types */
+enum {
+	SSDFS_DENTRIES_BTREE_UNKNOWN_TYPE,
+	SSDFS_INLINE_DENTRIES_ARRAY,
+	SSDFS_PRIVATE_DENTRIES_BTREE,
+	SSDFS_DENTRIES_BTREE_TYPE_MAX
+};
+
+/* Dentries tree states */
+enum {
+	SSDFS_DENTRIES_BTREE_UNKNOWN_STATE,
+	SSDFS_DENTRIES_BTREE_CREATED,
+	SSDFS_DENTRIES_BTREE_INITIALIZED,
+	SSDFS_DENTRIES_BTREE_DIRTY,
+	SSDFS_DENTRIES_BTREE_CORRUPTED,
+	SSDFS_DENTRIES_BTREE_STATE_MAX
+};
+
+/*
+ * Inline methods
+ */
+static inline
+size_t ssdfs_inline_dentries_size(void)
+{
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	return dentry_size * SSDFS_INLINE_DENTRIES_COUNT;
+}
+
+static inline
+size_t ssdfs_area_dentries_size(void)
+{
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	return dentry_size * SSDFS_INLINE_DENTRIES_PER_AREA;
+}
+
+/*
+ * Dentries tree API
+ */
+int ssdfs_dentries_tree_create(struct ssdfs_fs_info *fsi,
+				struct ssdfs_inode_info *ii);
+int ssdfs_dentries_tree_init(struct ssdfs_fs_info *fsi,
+			     struct ssdfs_inode_info *ii);
+void ssdfs_dentries_tree_destroy(struct ssdfs_inode_info *ii);
+int ssdfs_dentries_tree_flush(struct ssdfs_fs_info *fsi,
+				struct ssdfs_inode_info *ii);
+
+int ssdfs_dentries_tree_find(struct ssdfs_dentries_btree_info *tree,
+			     const char *name, size_t len,
+			     struct ssdfs_btree_search *search);
+int ssdfs_dentries_tree_add(struct ssdfs_dentries_btree_info *tree,
+			    const struct qstr *str,
+			    struct ssdfs_inode_info *ii,
+			    struct ssdfs_btree_search *search);
+int ssdfs_dentries_tree_change(struct ssdfs_dentries_btree_info *tree,
+				u64 name_hash, ino_t old_ino,
+				const struct qstr *str,
+				struct ssdfs_inode_info *new_ii,
+				struct ssdfs_btree_search *search);
+int ssdfs_dentries_tree_delete(struct ssdfs_dentries_btree_info *tree,
+				u64 name_hash, ino_t ino,
+				struct ssdfs_btree_search *search);
+int ssdfs_dentries_tree_delete_all(struct ssdfs_dentries_btree_info *tree);
+
+/*
+ * Internal dentries tree API
+ */
+u64 ssdfs_generate_name_hash(const struct qstr *str);
+int ssdfs_dentries_tree_find_leaf_node(struct ssdfs_dentries_btree_info *tree,
+					u64 name_hash,
+					struct ssdfs_btree_search *search);
+int ssdfs_dentries_tree_extract_range(struct ssdfs_dentries_btree_info *tree,
+				      u16 start_index, u16 count,
+				      struct ssdfs_btree_search *search);
+
+void ssdfs_debug_dentries_btree_object(struct ssdfs_dentries_btree_info *tree);
+
+/*
+ * Dentries btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+						ssdfs_dentries_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_dentries_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_dentries_btree_node_ops;
+
+#endif /* _SSDFS_DENTRIES_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 55/79] ssdfs: introduce extents b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (21 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 52/79] ssdfs: introduce dentries b-tree Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 57/79] ssdfs: introduce invalidated " Viacheslav Dubeyko
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

An SSDFS raw extent describes a contiguous sequence of logical
blocks by means of a segment ID, the logical block number of the
starting position, and a length. By default, the SSDFS inode has a
private area of 128 bytes in size, and an SSDFS extent is 16 bytes
in size. As a result, the inode's private area is capable of
storing no more than 8 raw extents. Generally speaking, the hybrid
b-tree was chosen with the goal of storing a larger number of raw
extents efficiently. First of all, it was taken into account that
file sizes can vary a lot on the same file system volume. Moreover,
the size of the same file can vary significantly during its
lifetime. Finally, the b-tree is a really good mechanism for
storing extents compactly, with a very flexible way of growing or
shrinking the reserved space. The b-tree also provides a very
efficient extent lookup technique. Additionally, the SSDFS file
system uses compression, which guarantees really compact storage of
semi-empty b-tree nodes. Moreover, the hybrid b-tree provides a way
to mix both index and data records in hybrid nodes, with the goal
of achieving a much more compact representation of the b-tree's
content. It is worth pointing out that the nodes of the extents
b-tree group extent records into forks. Generally speaking, a raw
extent describes the position on the volume of some contiguous
sequence of logical blocks, without any details about the offset of
this extent from the file's beginning. The fork, in turn, describes
the offset of some portion of the file's content from the file's
beginning and the number of logical blocks in this portion. A fork
also contains space for three raw extents, which can define the
positions of three contiguous sequences of logical blocks on the
file system volume. Finally, one fork is 64 bytes in size. If one
considers a b-tree node of 4 KB in size, then such a node is
capable of storing about 64 forks with 192 extents in total.
Generally speaking, even a small b-tree is able to store a
significant number of extents and to locate the fragments of a
fairly big file. If one imagines a b-tree with just two 4 KB nodes
in total, where every extent defines the position of an 8 MB
portion of the file, then such a b-tree is able to describe a file
of 3 GB in total.
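
The 3 GB figure above can be checked with a few lines of arithmetic. The
sketch below uses illustrative names (not the driver's API) and ignores
node headers for simplicity:

```c
#include <stdint.h>

enum {
	DEMO_FORK_SIZE		= 64,	/* one fork on disk */
	DEMO_EXTENTS_PER_FORK	= 3,	/* raw extents per fork */
	DEMO_NODE_SIZE		= 4096,	/* 4 KB b-tree node */
};

/*
 * Rough upper bound on the file size that a tree of @nodes leaf
 * nodes can describe, assuming every extent covers the same
 * number of bytes. Two nodes with 8 MB extents give 3 GB:
 * 2 * (4096 / 64) forks * 3 extents * 8 MB.
 */
static uint64_t demo_max_file_bytes(unsigned int nodes,
				    uint64_t bytes_per_extent)
{
	unsigned int forks = nodes * (DEMO_NODE_SIZE / DEMO_FORK_SIZE);

	return (uint64_t)forks * DEMO_EXTENTS_PER_FORK * bytes_per_extent;
}
```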

Extents b-tree implements API:
(1) create - create extents b-tree
(2) destroy - destroy extents b-tree
(3) flush - flush dirty extents b-tree
(4) prepare_volume_extent - convert requested offset into extent
(5) recommend_migration_extent - find extent recommended for migration
(6) add_extent - add extent into the extents b-tree
(7) move_extent - move extent from one segment into another one
(8) truncate - truncate extent b-tree

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/extents_tree.h | 188 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 fs/ssdfs/extents_tree.h

diff --git a/fs/ssdfs/extents_tree.h b/fs/ssdfs/extents_tree.h
new file mode 100644
index 000000000000..ac9c7b82cae1
--- /dev/null
+++ b/fs/ssdfs/extents_tree.h
@@ -0,0 +1,188 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/extents_tree.h - extents tree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_EXTENTS_TREE_H
+#define _SSDFS_EXTENTS_TREE_H
+
+#define SSDFS_COMMIT_QUEUE_DEFAULT_CAPACITY	(16)
+#define SSDFS_COMMIT_QUEUE_THRESHOLD		(32)
+
+/*
+ * struct ssdfs_commit_queue - array of segment IDs
+ * @ids: array of segment IDs
+ * @count: number of items in the queue
+ * @capacity: maximum number of available positions in the queue
+ */
+struct ssdfs_commit_queue {
+	u64 *ids;
+	u32 count;
+	u32 capacity;
+};
+
+/*
+ * struct ssdfs_extents_btree_info - extents btree info
+ * @type: extents btree type
+ * @state: extents btree state
+ * @forks_count: count of the forks in the whole extents tree
+ * @lock: extents btree lock
+ * @generic_tree: pointer on generic btree object
+ * @inline_forks: pointer on inline forks array
+ * @buffer.tree: piece of memory for generic btree object
+ * @buffer.forks: piece of memory for the inline forks
+ * @root: pointer on root node
+ * @root_buffer: buffer for root node
+ * @updated_segs: updated segments queue
+ * @desc: b-tree descriptor
+ * @owner: pointer on owner inode object
+ * @fsi: pointer on shared file system object
+ *
+ * A newly created inode tries to store extents in inline
+ * forks. Every fork contains three extents. The raw on-disk
+ * inode has an internal private area that is able to contain
+ * either two inline forks or the root nodes of the extents
+ * btree and extended attributes btree. If the inode has no
+ * extended attributes and the number of extents is six or
+ * fewer, then everything can be stored in the inline forks.
+ * Otherwise, a real extents btree should be created.
+ */
+struct ssdfs_extents_btree_info {
+	atomic_t type;
+	atomic_t state;
+	atomic64_t forks_count;
+
+	struct rw_semaphore lock;
+	struct ssdfs_btree *generic_tree;
+	struct ssdfs_raw_fork *inline_forks;
+	union {
+		struct ssdfs_btree tree;
+		struct ssdfs_raw_fork forks[SSDFS_INLINE_FORKS_COUNT];
+	} buffer;
+	struct ssdfs_btree_inline_root_node *root;
+	struct ssdfs_btree_inline_root_node root_buffer;
+	struct ssdfs_commit_queue updated_segs;
+
+	struct ssdfs_extents_btree_descriptor desc;
+	struct ssdfs_inode_info *owner;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Extents tree types */
+enum {
+	SSDFS_EXTENTS_BTREE_UNKNOWN_TYPE,
+	SSDFS_INLINE_FORKS_ARRAY,
+	SSDFS_PRIVATE_EXTENTS_BTREE,
+	SSDFS_EXTENTS_BTREE_TYPE_MAX
+};
+
+/* Extents tree states */
+enum {
+	SSDFS_EXTENTS_BTREE_UNKNOWN_STATE,
+	SSDFS_EXTENTS_BTREE_CREATED,
+	SSDFS_EXTENTS_BTREE_INITIALIZED,
+	SSDFS_EXTENTS_BTREE_DIRTY,
+	SSDFS_EXTENTS_BTREE_CORRUPTED,
+	SSDFS_EXTENTS_BTREE_STATE_MAX
+};
+
+/*
+ * struct ssdfs_file_fragment - file's fragment
+ * @start_blk: offset inside of file in logical blocks
+ * @len: fragment's length in logical blocks
+ * @extent: raw extent in segment
+ */
+struct ssdfs_file_fragment {
+	u64 start_blk;
+	u32 len;
+	struct ssdfs_raw_extent extent;
+};
+
+/*
+ * Extents tree API
+ */
+int ssdfs_extents_tree_create(struct ssdfs_fs_info *fsi,
+				struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_init(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_inode_info *ii);
+void ssdfs_extents_tree_destroy(struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_flush(struct ssdfs_fs_info *fsi,
+			     struct ssdfs_inode_info *ii);
+int ssdfs_extents_tree_add_updated_seg_id(struct ssdfs_extents_btree_info *tree,
+					  u64 seg_id);
+
+int __ssdfs_prepare_volume_extent(struct ssdfs_fs_info *fsi,
+				  struct inode *inode,
+				  struct ssdfs_logical_extent *requested,
+				  struct ssdfs_volume_extent *place);
+int ssdfs_prepare_volume_extent(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_request *req);
+int ssdfs_recommend_migration_extent(struct ssdfs_fs_info *fsi,
+				     struct ssdfs_segment_request *req,
+				     struct ssdfs_zone_fragment *fragment);
+bool ssdfs_extents_tree_has_logical_block(u64 blk_offset, struct inode *inode);
+int ssdfs_extents_tree_add_extent(struct inode *inode,
+				  struct ssdfs_segment_request *req);
+int ssdfs_extents_tree_move_extent(struct ssdfs_extents_btree_info *tree,
+				   u64 blk, u32 len,
+				   struct ssdfs_raw_extent *new_extent,
+				   struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_truncate(struct inode *inode);
+
+/*
+ * Extents tree internal API
+ */
+int ssdfs_extents_tree_find_fork(struct ssdfs_extents_btree_info *tree,
+				 u64 blk,
+				 struct ssdfs_btree_search *search);
+int __ssdfs_extents_tree_add_extent(struct ssdfs_extents_btree_info *tree,
+				     u64 blk,
+				     struct ssdfs_raw_extent *extent,
+				     struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_change_extent(struct ssdfs_extents_btree_info *tree,
+				     u64 blk,
+				     struct ssdfs_raw_extent *extent,
+				     struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_truncate_extent(struct ssdfs_extents_btree_info *tree,
+					u64 blk, u32 new_len,
+					struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_delete_extent(struct ssdfs_extents_btree_info *tree,
+				     u64 blk,
+				     struct ssdfs_btree_search *search);
+int ssdfs_extents_tree_delete_all(struct ssdfs_extents_btree_info *tree);
+int __ssdfs_extents_btree_node_get_fork(struct ssdfs_fs_info *fsi,
+					struct ssdfs_btree_node_content *content,
+					u32 area_offset,
+					u32 area_size,
+					u32 node_size,
+					u16 item_index,
+					struct ssdfs_raw_fork *fork);
+
+void ssdfs_debug_extents_btree_object(struct ssdfs_extents_btree_info *tree);
+
+/*
+ * Extents btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+						ssdfs_extents_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_extents_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_extents_btree_node_ops;
+
+#endif /* _SSDFS_EXTENTS_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 57/79] ssdfs: introduce invalidated extents b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (22 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 55/79] ssdfs: introduce extents b-tree Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 59/79] ssdfs: introduce shared " Viacheslav Dubeyko
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

A ZNS SSD operates by the zone concept. A zone can be:
(1) empty, (2) open (implicitly or explicitly),
(3) closed, (4) full. The number of open/active zones
is limited by some threshold. To manage the open/active
zones limitation, SSDFS has a current user data segment
for new data and a current user data segment to receive
updates for closed zones. Every update of data in a
closed zone requires: (1) storing the updated data into
the current segment for updated user data, (2) updating
the extents b-tree with the new data location,
(3) adding the invalidated extent of the closed zone
into the invalidated extents b-tree. The invalidated
extents b-tree is responsible for: (1) correcting the
erase block's (closed zone's) block bitmap by marking
moved logical blocks as invalidated during erase block
object initialization, (2) collecting all invalidated
extents of closed zones. If the total length of all of
a closed zone's invalidated extents equals the zone
size, then the closed zone can be re-initialized or
erased.
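
The "fully invalidated" condition at the end can be sketched as a simple
accounting rule. The types and names below are hypothetical and stand in
for the accounting the invalidated extents b-tree enables:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Per closed zone, sum the lengths of its invalidated extents.
 * Once the sum reaches the zone capacity, the zone holds no
 * live data and can be reset (erased) and reused.
 */
struct demo_zone {
	uint64_t capacity_blks;		/* zone size in logical blocks */
	uint64_t invalidated_blks;	/* accounted invalidated blocks */
};

/* Account one invalidated extent; returns true when the whole
 * zone is invalid and may be re-initialized. */
static bool demo_zone_account_invalidated(struct demo_zone *zone,
					  uint32_t extent_len)
{
	zone->invalidated_blks += extent_len;
	return zone->invalidated_blks >= zone->capacity_blks;
}
```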

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/invalidated_extents_tree.h | 96 +++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 fs/ssdfs/invalidated_extents_tree.h

diff --git a/fs/ssdfs/invalidated_extents_tree.h b/fs/ssdfs/invalidated_extents_tree.h
new file mode 100644
index 000000000000..a0d1993e5523
--- /dev/null
+++ b/fs/ssdfs/invalidated_extents_tree.h
@@ -0,0 +1,96 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/invalidated_extents_tree.h - invalidated extents btree declarations.
+ *
+ * Copyright (c) 2022-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * Copyright (c) 2022-2023 Bytedance Ltd. and/or its affiliates.
+ *              https://www.bytedance.com/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cong Wang
+ */
+
+#ifndef _SSDFS_INVALIDATED_EXTENTS_TREE_H
+#define _SSDFS_INVALIDATED_EXTENTS_TREE_H
+
+/*
+ * struct ssdfs_invextree_info - invalidated extents tree object
+ * @state: invalidated extents btree state
+ * @lock: invalidated extents btree lock
+ * @generic_tree: generic btree description
+ * @extents_count: count of extents in the whole tree
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_invextree_info {
+	atomic_t state;
+	struct rw_semaphore lock;
+	struct ssdfs_btree generic_tree;
+
+	atomic64_t extents_count;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Invalidated extents tree states */
+enum {
+	SSDFS_INVEXTREE_UNKNOWN_STATE,
+	SSDFS_INVEXTREE_CREATED,
+	SSDFS_INVEXTREE_INITIALIZED,
+	SSDFS_INVEXTREE_DIRTY,
+	SSDFS_INVEXTREE_CORRUPTED,
+	SSDFS_INVEXTREE_STATE_MAX
+};
+
+/*
+ * Invalidated extents tree API
+ */
+int ssdfs_invextree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_invextree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_invextree_flush(struct ssdfs_fs_info *fsi);
+
+int ssdfs_invextree_find(struct ssdfs_invextree_info *tree,
+			 struct ssdfs_raw_extent *extent,
+			 struct ssdfs_btree_search *search);
+int ssdfs_invextree_add(struct ssdfs_invextree_info *tree,
+			struct ssdfs_raw_extent *extent,
+			struct ssdfs_btree_search *search);
+int ssdfs_invextree_delete(struct ssdfs_invextree_info *tree,
+			   struct ssdfs_raw_extent *extent,
+			   struct ssdfs_btree_search *search);
+
+/*
+ * Invalidated extents tree's internal API
+ */
+int ssdfs_invextree_find_leaf_node(struct ssdfs_invextree_info *tree,
+				   u64 seg_id,
+				   struct ssdfs_btree_search *search);
+int ssdfs_invextree_get_start_hash(struct ssdfs_invextree_info *tree,
+				   u64 *start_hash);
+int ssdfs_invextree_node_hash_range(struct ssdfs_invextree_info *tree,
+				    struct ssdfs_btree_search *search,
+				    u64 *start_hash, u64 *end_hash,
+				    u16 *items_count);
+int ssdfs_invextree_extract_range(struct ssdfs_invextree_info *tree,
+				  u16 start_index, u16 count,
+				  struct ssdfs_btree_search *search);
+int ssdfs_invextree_check_search_result(struct ssdfs_btree_search *search);
+int ssdfs_invextree_get_next_hash(struct ssdfs_invextree_info *tree,
+				  struct ssdfs_btree_search *search,
+				  u64 *next_hash);
+
+void ssdfs_debug_invextree_object(struct ssdfs_invextree_info *tree);
+
+/*
+ * Invalidated extents btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations ssdfs_invextree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_invextree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_invextree_node_ops;
+
+#endif /* _SSDFS_INVALIDATED_EXTENTS_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 59/79] ssdfs: introduce shared extents b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (23 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 57/79] ssdfs: introduce invalidated " Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 61/79] ssdfs: introduce PEB-based deduplication technique Viacheslav Dubeyko
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS file system has been designed to support
deduplication. There are two mechanisms supported
by SSDFS: (1) erase block based deduplication,
(2) file based deduplication. The shared extents
b-tree implements file based deduplication. It
means that the file system logic can calculate
fingerprints of files' portions and store these
fingerprints in the shared extents b-tree. If the
same fingerprint is already stored in the shared
extents b-tree, then deduplication can happen on
a file basis.

Also, the shared extents b-tree has a dedicated
thread that executes delayed extents invalidation.
It means that any file's extent, b-tree extent, or
even a whole b-tree can be added into this thread's
queue for delayed invalidation instead of doing an
immediate extent invalidation. Generally speaking,
this implies that a huge file can be deleted really
fast, while the real invalidation happens in the
background.
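
The fingerprint-based lookup can be sketched as follows. All names here are
hypothetical, and a linear scan stands in for the actual b-tree search; the
point is only the decision it enables: reuse an existing extent on a hit,
write the data and insert its fingerprint on a miss.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FP_SIZE 32	/* e.g. the size of a SHA-256 digest */

struct demo_fingerprint {
	uint8_t bytes[DEMO_FP_SIZE];
};

/* One record: a fingerprint plus the location of the stored data. */
struct demo_shared_extent {
	struct demo_fingerprint fp;
	uint64_t seg_id;	/* segment holding the deduplicated data */
	uint32_t logical_blk;
	uint32_t len;
};

/*
 * Look up a fingerprint. Returns the matching record, whose extent
 * the new file portion can share, or NULL when the data is new and
 * must be written out with its fingerprint inserted afterwards.
 */
static const struct demo_shared_extent *
demo_dedup_lookup(const struct demo_shared_extent *table, size_t count,
		  const struct demo_fingerprint *fp)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (!memcmp(table[i].fp.bytes, fp->bytes, DEMO_FP_SIZE))
			return &table[i];
	}
	return NULL;
}
```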

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/shared_extents_tree.h | 146 +++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 fs/ssdfs/shared_extents_tree.h

diff --git a/fs/ssdfs/shared_extents_tree.h b/fs/ssdfs/shared_extents_tree.h
new file mode 100644
index 000000000000..6b9a200462d6
--- /dev/null
+++ b/fs/ssdfs/shared_extents_tree.h
@@ -0,0 +1,146 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/shared_extents_tree.h - shared extents tree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SHARED_EXTENTS_TREE_H
+#define _SSDFS_SHARED_EXTENTS_TREE_H
+
+#include "fingerprint.h"
+
+/*
+ * struct ssdfs_invalidation_queue - invalidation queue object
+ * @queue: extents/index queue object
+ * @content: btree node's content buffer
+ * @thread: descriptor of queue's thread
+ */
+struct ssdfs_invalidation_queue {
+	struct ssdfs_extents_queue queue;
+	struct ssdfs_btree_node_content content;
+	struct ssdfs_thread_info thread;
+};
+
+/* Invalidation queue ID */
+enum {
+	SSDFS_EXTENT_INVALIDATION_QUEUE,
+	SSDFS_INDEX_INVALIDATION_QUEUE,
+	SSDFS_INVALIDATION_QUEUE_NUMBER
+};
+
+/*
+ * struct ssdfs_shared_extents_tree - shared extents tree object
+ * @state: shared extents btree state
+ * @lock: shared extents btree lock
+ * @generic_tree: generic btree description
+ * @shared_extents: count of the shared extents in the whole tree
+ * @array: invalidation queues array
+ * @wait_queue: wait queue of shared extents tree's thread
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_shared_extents_tree {
+	atomic_t state;
+	struct rw_semaphore lock;
+	struct ssdfs_btree generic_tree;
+
+	atomic64_t shared_extents;
+
+	struct ssdfs_invalidation_queue array[SSDFS_INVALIDATION_QUEUE_NUMBER];
+	wait_queue_head_t wait_queue;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Shared extents tree states */
+enum {
+	SSDFS_SHEXTREE_UNKNOWN_STATE,
+	SSDFS_SHEXTREE_CREATED,
+	SSDFS_SHEXTREE_INITIALIZED,
+	SSDFS_SHEXTREE_DIRTY,
+	SSDFS_SHEXTREE_CORRUPTED,
+	SSDFS_SHEXTREE_STATE_MAX
+};
+
+#define SSDFS_SHARED_EXT(ptr) \
+	((struct ssdfs_shared_extent *)(ptr))
+
+/*
+ * Shared extents tree API
+ */
+int ssdfs_shextree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_shextree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_shextree_flush(struct ssdfs_fs_info *fsi);
+
+int ssdfs_shextree_find(struct ssdfs_shared_extents_tree *tree,
+			struct ssdfs_fingerprint *fingerprint,
+			struct ssdfs_btree_search *search);
+int ssdfs_shextree_find_range(struct ssdfs_shared_extents_tree *tree,
+			      struct ssdfs_fingerprint_range *range,
+			      struct ssdfs_btree_search *search);
+int ssdfs_shextree_find_leaf_node(struct ssdfs_shared_extents_tree *tree,
+				  struct ssdfs_fingerprint *fingerprint,
+				  struct ssdfs_btree_search *search);
+int ssdfs_shextree_add(struct ssdfs_shared_extents_tree *tree,
+			struct ssdfs_fingerprint *fingerprint,
+			struct ssdfs_shared_extent *extent,
+			struct ssdfs_btree_search *search);
+int ssdfs_shextree_change(struct ssdfs_shared_extents_tree *tree,
+			  struct ssdfs_fingerprint *fingerprint,
+			  struct ssdfs_shared_extent *extent,
+			  struct ssdfs_btree_search *search);
+int ssdfs_shextree_ref_count_inc(struct ssdfs_shared_extents_tree *tree,
+				 struct ssdfs_fingerprint *fingerprint,
+				 struct ssdfs_btree_search *search);
+int ssdfs_shextree_ref_count_dec(struct ssdfs_shared_extents_tree *tree,
+				 struct ssdfs_fingerprint *fingerprint,
+				 struct ssdfs_btree_search *search);
+int ssdfs_shextree_delete(struct ssdfs_shared_extents_tree *tree,
+			  struct ssdfs_fingerprint *fingerprint,
+			  struct ssdfs_btree_search *search);
+int ssdfs_shextree_delete_all(struct ssdfs_shared_extents_tree *tree);
+
+int ssdfs_shextree_add_pre_invalid_extent(struct ssdfs_shared_extents_tree *tree,
+					  u64 owner_ino,
+					  struct ssdfs_raw_extent *extent);
+int ssdfs_shextree_add_pre_invalid_fork(struct ssdfs_shared_extents_tree *tree,
+					u64 owner_ino,
+					struct ssdfs_raw_fork *fork);
+int ssdfs_shextree_add_pre_invalid_index(struct ssdfs_shared_extents_tree *tree,
+					 u64 owner_ino,
+					 int index_type,
+					 struct ssdfs_btree_index_key *index);
+
+/*
+ * Shared extents tree's internal API
+ */
+int ssdfs_shextree_start_thread(struct ssdfs_shared_extents_tree *tree,
+				int index);
+int ssdfs_shextree_stop_thread(struct ssdfs_shared_extents_tree *tree,
+				int index);
+
+void ssdfs_debug_shextree_object(struct ssdfs_shared_extents_tree *tree);
+
+/*
+ * Shared extents btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations ssdfs_shextree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_shextree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_shextree_node_ops;
+
+#endif /* _SSDFS_SHARED_EXTENTS_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 61/79] ssdfs: introduce PEB-based deduplication technique
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (24 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 59/79] ssdfs: introduce shared " Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 62/79] ssdfs: introduce shared dictionary b-tree Viacheslav Dubeyko
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS file system splits the whole volume space into
a sequence of segments. Every segment contains one or
several erase blocks, and every erase block contains
a sequence of logs. A log stores the content of files
or metadata as its payload. The erase block's flush
thread receives requests to store new logical blocks
or to update existing ones. The PEB-based deduplication
logic calculates fingerprints for the logical blocks
in these requests. If a calculated fingerprint is
identical to an already existing one, then deduplication
happens for the log under creation: instead of storing
a logical block with the same content again, a reference
to the identical logical block that was already stored
into the erase block is used. This deduplication logic
can be efficient when there are many duplicated logical
blocks in the same file or in different files that are
flushed together.
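The flush-time decision can be sketched in userspace C. This is
an illustration only: FNV-1a stands in for the driver's
configurable fingerprint algorithms (MD5, SHA-1, GHASH, etc.),
and log_builder/flush_block are hypothetical names, not the
driver's API:

```c
/* Sketch of PEB-based dedup at flush time: fingerprint each
 * logical block; if an identical fingerprint is already in the
 * log under creation, record a reference instead of storing
 * the content again. Names and hash are illustrative. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a stands in for the real fingerprint algorithm. */
static uint64_t fingerprint(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

#define MAX_BLOCKS	16

struct log_builder {
	uint64_t fingerprints[MAX_BLOCKS];	/* blocks in log */
	size_t stored;			/* physically stored blocks */
	size_t deduplicated;		/* blocks kept as references */
};

/* Flush one logical block: store it, or reference a stored one. */
static void flush_block(struct log_builder *log,
			const void *data, size_t len)
{
	uint64_t fp = fingerprint(data, len);
	size_t i;

	for (i = 0; i < log->stored; i++) {
		if (log->fingerprints[i] == fp) {
			log->deduplicated++;	/* already stored */
			return;
		}
	}
	log->fingerprints[log->stored++] = fp;
}
```

A second block with identical content costs only a fingerprint
lookup and a reference, which is where the write savings come
from when many duplicates are flushed together.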

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/fingerprint.h       | 261 ++++++++++++
 fs/ssdfs/fingerprint_array.c | 795 +++++++++++++++++++++++++++++++++++
 fs/ssdfs/fingerprint_array.h |  82 ++++
 fs/ssdfs/peb_deduplication.c | 483 +++++++++++++++++++++
 4 files changed, 1621 insertions(+)
 create mode 100644 fs/ssdfs/fingerprint.h
 create mode 100644 fs/ssdfs/fingerprint_array.c
 create mode 100644 fs/ssdfs/fingerprint_array.h
 create mode 100644 fs/ssdfs/peb_deduplication.c

diff --git a/fs/ssdfs/fingerprint.h b/fs/ssdfs/fingerprint.h
new file mode 100644
index 000000000000..0e1da7f461bb
--- /dev/null
+++ b/fs/ssdfs/fingerprint.h
@@ -0,0 +1,261 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/fingerprint.h - fingerprint's declarations.
+ *
+ * Copyright (c) 2023-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_FINGERPRINT_H
+#define _SSDFS_FINGERPRINT_H
+
+#include <crypto/hash_info.h>
+#include <crypto/ghash.h>
+#include <crypto/polyval.h>
+
+/*
+ * struct ssdfs_fingerprint - fingerprint object
+ * @buf: fingerprint buffer
+ * @len: fingerprint length
+ * @type: fingerprint type
+ */
+struct ssdfs_fingerprint {
+	u8 buf[SSDFS_FINGERPRINT_LENGTH_MAX];
+	u8 len;
+	u8 type;
+};
+
+/* Fingerprint types */
+enum {
+	SSDFS_UNKNOWN_FINGERPRINT_TYPE = 0,
+	SSDFS_MD5_FINGERPRINT,
+	SSDFS_SHA1_FINGERPRINT,
+	SSDFS_SHA224_FINGERPRINT,
+	SSDFS_SHA256_FINGERPRINT,
+	SSDFS_GHASH_FINGERPRINT,
+	SSDFS_POLYVAL_FINGERPRINT,
+	SSDFS_FINGERPRINT_TYPE_MAX
+};
+
+/* Fingerprint algorithm names */
+#define SSDFS_MD5_HASH_FUNCTION_NAME		("md5")
+#define SSDFS_SHA1_HASH_FUNCTION_NAME		("sha1")
+#define SSDFS_SHA224_HASH_FUNCTION_NAME		("sha224")
+#define SSDFS_SHA256_HASH_FUNCTION_NAME		("sha256")
+#define SSDFS_GHASH_HASH_FUNCTION_NAME		("ghash")
+#define SSDFS_POLYVAL_HASH_FUNCTION_NAME	("polyval")
+
+/*
+ * struct ssdfs_fingerprint_range - range of fingerprints
+ * @start: starting fingerprint
+ * @end: ending fingerprint
+ */
+struct ssdfs_fingerprint_range {
+	struct ssdfs_fingerprint start;
+	struct ssdfs_fingerprint end;
+};
+
+/*
+ * Inline methods
+ */
+
+/*
+ * SSDFS_DEFAULT_FINGERPRINT_TYPE() - default fingerprint type
+ */
+static inline
+int SSDFS_DEFAULT_FINGERPRINT_TYPE(void)
+{
+#ifdef CONFIG_SSDFS_MD5_FINGEPRINT_TYPE
+	return SSDFS_MD5_FINGERPRINT;
+#elif defined(CONFIG_SSDFS_SHA1_FINGEPRINT_TYPE)
+	return SSDFS_SHA1_FINGERPRINT;
+#elif defined(CONFIG_SSDFS_SHA224_FINGEPRINT_TYPE)
+	return SSDFS_SHA224_FINGERPRINT;
+#elif defined(CONFIG_SSDFS_SHA256_FINGEPRINT_TYPE)
+	return SSDFS_SHA256_FINGERPRINT;
+#elif defined(CONFIG_SSDFS_GHASH_FINGEPRINT_TYPE)
+	return SSDFS_GHASH_FINGERPRINT;
+#elif defined(CONFIG_SSDFS_POLYVAL_FINGEPRINT_TYPE)
+	return SSDFS_POLYVAL_FINGERPRINT;
+#else
+	return SSDFS_UNKNOWN_FINGERPRINT_TYPE;
+#endif
+}
+
+/*
+ * SSDFS_FINGERPRINT_TYPE2NAME() - convert fingerprint type into name
+ */
+static inline
+const char *SSDFS_FINGERPRINT_TYPE2NAME(int type)
+{
+	switch (type) {
+	case SSDFS_MD5_FINGERPRINT:
+		return SSDFS_MD5_HASH_FUNCTION_NAME;
+	case SSDFS_SHA1_FINGERPRINT:
+		return SSDFS_SHA1_HASH_FUNCTION_NAME;
+	case SSDFS_SHA224_FINGERPRINT:
+		return SSDFS_SHA224_HASH_FUNCTION_NAME;
+	case SSDFS_SHA256_FINGERPRINT:
+		return SSDFS_SHA256_HASH_FUNCTION_NAME;
+	case SSDFS_GHASH_FINGERPRINT:
+		return SSDFS_GHASH_HASH_FUNCTION_NAME;
+	case SSDFS_POLYVAL_FINGERPRINT:
+		return SSDFS_POLYVAL_HASH_FUNCTION_NAME;
+	default:
+		/* SHA1 is used by default */
+		break;
+	}
+
+	return SSDFS_SHA1_HASH_FUNCTION_NAME;
+}
+
+/*
+ * SSDFS_DEFAULT_FINGERPRINT_NAME() - default fingerprint algorithm name
+ */
+static inline
+const char *SSDFS_DEFAULT_FINGERPRINT_NAME(void)
+{
+#ifdef CONFIG_SSDFS_MD5_FINGEPRINT_TYPE
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_MD5_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA1_FINGEPRINT_TYPE)
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_SHA1_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA224_FINGEPRINT_TYPE)
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_SHA224_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA256_FINGEPRINT_TYPE)
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_SHA256_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_GHASH_FINGEPRINT_TYPE)
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_GHASH_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_POLYVAL_FINGEPRINT_TYPE)
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_POLYVAL_FINGERPRINT);
+#else
+	return SSDFS_FINGERPRINT_TYPE2NAME(SSDFS_UNKNOWN_FINGERPRINT_TYPE);
+#endif
+}
+
+/*
+ * SSDFS_FINGEPRINT_TYPE2LENGTH() - convert fingerprint type into digest size
+ */
+static inline
+u32 SSDFS_FINGEPRINT_TYPE2LENGTH(int type)
+{
+	switch (type) {
+	case SSDFS_MD5_FINGERPRINT:
+		return MD5_DIGEST_SIZE;
+	case SSDFS_SHA1_FINGERPRINT:
+		return SHA1_DIGEST_SIZE;
+	case SSDFS_SHA224_FINGERPRINT:
+		return SHA224_DIGEST_SIZE;
+	case SSDFS_SHA256_FINGERPRINT:
+		return SHA256_DIGEST_SIZE;
+	case SSDFS_GHASH_FINGERPRINT:
+		return GHASH_DIGEST_SIZE;
+	case SSDFS_POLYVAL_FINGERPRINT:
+		return POLYVAL_DIGEST_SIZE;
+	default:
+		SSDFS_WARN("unknown fingerprint type %#x\n",
+			   type);
+		break;
+	}
+
+	return U32_MAX;
+}
+
+/*
+ * SSDFS_DEFAULT_FINGERPRINT_LENGTH() - default fingerprint digest size
+ */
+static inline
+u32 SSDFS_DEFAULT_FINGERPRINT_LENGTH(void)
+{
+#ifdef CONFIG_SSDFS_MD5_FINGEPRINT_TYPE
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_MD5_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA1_FINGEPRINT_TYPE)
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_SHA1_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA224_FINGEPRINT_TYPE)
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_SHA224_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_SHA256_FINGEPRINT_TYPE)
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_SHA256_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_GHASH_FINGEPRINT_TYPE)
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_GHASH_FINGERPRINT);
+#elif defined(CONFIG_SSDFS_POLYVAL_FINGEPRINT_TYPE)
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_POLYVAL_FINGERPRINT);
+#else
+	return SSDFS_FINGEPRINT_TYPE2LENGTH(SSDFS_UNKNOWN_FINGERPRINT_TYPE);
+#endif
+}
+
+/*
+ * IS_FINGERPRINT_VALID() - check that fingerprint is valid
+ */
+static inline
+bool IS_FINGERPRINT_VALID(struct ssdfs_fingerprint *hash)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (hash->type <= SSDFS_UNKNOWN_FINGERPRINT_TYPE ||
+	    hash->type >= SSDFS_FINGERPRINT_TYPE_MAX)
+		return false;
+
+	if (hash->len == 0 || hash->len > SSDFS_FINGERPRINT_LENGTH_MAX)
+		return false;
+
+	return true;
+}
+
+/*
+ * IS_FINGERPRINTS_COMPARABLE() - check that fingerprints can be compared
+ */
+static inline
+bool IS_FINGERPRINTS_COMPARABLE(struct ssdfs_fingerprint *hash1,
+				struct ssdfs_fingerprint *hash2)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hash1 || !hash2);
+	BUG_ON(!IS_FINGERPRINT_VALID(hash1));
+	BUG_ON(!IS_FINGERPRINT_VALID(hash2));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (hash1->type == hash2->type && hash1->len == hash2->len)
+		return true;
+
+	return false;
+}
+
+/*
+ * ssdfs_compare_fingerprints() - compare fingerprints
+ * @hash1: first fingerprint
+ * @hash2: second fingerprint
+ *
+ * This function tries to compare two fingerprints.
+ *
+ * RETURN:
+ * [-1]   - hash1 is less than hash2
+ * [ 0]   - hash1 is equal to hash2
+ * [+1]   - hash1 is greater than hash2
+ */
+static inline
+int ssdfs_compare_fingerprints(struct ssdfs_fingerprint *hash1,
+				struct ssdfs_fingerprint *hash2)
+{
+	size_t len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!hash1 || !hash2);
+	BUG_ON(!IS_FINGERPRINT_VALID(hash1));
+	BUG_ON(!IS_FINGERPRINT_VALID(hash2));
+	BUG_ON(!IS_FINGERPRINTS_COMPARABLE(hash1, hash2));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = min_t(u8, hash1->len, hash2->len);
+
+	return memcmp(hash1->buf, hash2->buf, len);
+}
+
+#endif /* _SSDFS_FINGERPRINT_H */
diff --git a/fs/ssdfs/fingerprint_array.c b/fs/ssdfs/fingerprint_array.c
new file mode 100644
index 000000000000..22036b1dfabf
--- /dev/null
+++ b/fs/ssdfs/fingerprint_array.c
@@ -0,0 +1,795 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/fingerprint_array.c - fingerprint array implementation.
+ *
+ * Copyright (c) 2023-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "ssdfs.h"
+#include "fingerprint_array.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_fingerprint_array_folio_leaks;
+atomic64_t ssdfs_fingerprint_array_memory_leaks;
+atomic64_t ssdfs_fingerprint_array_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_fingerprint_array_cache_leaks_increment(void *kaddr)
+ * void ssdfs_fingerprint_array_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_fingerprint_array_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_fingerprint_array_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_fingerprint_array_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_fingerprint_array_kfree(void *kaddr)
+ * struct folio *ssdfs_fingerprint_array_alloc_folio(gfp_t gfp_mask,
+ *                          			     unsigned int order)
+ * struct folio *ssdfs_file_add_batch_folio(struct folio_batch *batch,
+ *                                          unsigned int order)
+ * void ssdfs_fingerprint_array_free_folio(struct folio *folio)
+ * void ssdfs_file_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(fingerprint_array)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(fingerprint_array)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_fingerprint_array_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_fingerprint_array_folio_leaks, 0);
+	atomic64_set(&ssdfs_fingerprint_array_memory_leaks, 0);
+	atomic64_set(&ssdfs_fingerprint_array_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_fingerprint_array_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_fingerprint_array_folio_leaks) != 0) {
+		SSDFS_ERR("FINGERPRINT ARRAY: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_fingerprint_array_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_fingerprint_array_memory_leaks) != 0) {
+		SSDFS_ERR("FINGERPRINT ARRAY: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_fingerprint_array_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_fingerprint_array_cache_leaks) != 0) {
+		SSDFS_ERR("FINGERPRINT ARRAY: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_fingerprint_array_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+/*
+ * ssdfs_fingerprint_array_create() - create fingerprint array
+ * @farray: pointer on fingerprint array object
+ * @capacity: maximum number of items in array
+ */
+int ssdfs_fingerprint_array_create(struct ssdfs_fingerprint_array *farray,
+				   u32 capacity)
+{
+	size_t item_size = sizeof(struct ssdfs_fingerprint_item);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray);
+
+	SSDFS_DBG("array %p, capacity %u, item_size %zu\n",
+		  farray, capacity, item_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_set(&farray->state, SSDFS_FINGERPRINT_ARRAY_UNKNOWN_STATE);
+
+	init_rwsem(&farray->lock);
+	farray->items_count = 0;
+
+	err = ssdfs_dynamic_array_create(&farray->array,
+					 capacity, item_size,
+					 0xFF);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to create dynamic array: "
+			  "capacity %u, item_size %zu, err %d\n",
+			  capacity, item_size, err);
+		return err;
+	}
+
+	atomic_set(&farray->state, SSDFS_FINGERPRINT_ARRAY_CREATED);
+
+	return 0;
+}
+
+/*
+ * ssdfs_fingerprint_array_destroy() - destroy fingerprint array
+ * @farray: pointer on fingerprint array object
+ */
+void ssdfs_fingerprint_array_destroy(struct ssdfs_fingerprint_array *farray)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray);
+
+	SSDFS_DBG("array %p\n", farray);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&farray->state)) {
+	case SSDFS_FINGERPRINT_ARRAY_CREATED:
+		/* continue logic */
+		break;
+
+	default:
+		/* do nothing */
+		return;
+	}
+
+	down_write(&farray->lock);
+	ssdfs_dynamic_array_destroy(&farray->array);
+	farray->items_count = 0;
+	up_write(&farray->lock);
+}
+
+/*
+ * ssdfs_check_fingerprint_item() - compare fingerprint item with hash
+ * @hash: fingerprint with hash value
+ * @item: fingerprint item for comparison with hash value
+ *
+ * This method tries to compare the fingerprint hashes.
+ *
+ * RETURN:
+ * [success]
+ *
+ * %-ENOENT     - hash is less than item's fingerprint.
+ * %-EEXIST     - hash is equal to item's fingerprint.
+ * %-EAGAIN     - hash is greater than item's fingerprint.
+ *
+ * [failure]
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_check_fingerprint_item(struct ssdfs_fingerprint *hash,
+				 struct ssdfs_fingerprint_item *item)
+{
+	int res;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!item || !hash);
+
+	SSDFS_DBG("item %p, hash %p\n",
+		  item, hash);
+
+	SSDFS_DBG("HASH: type %#x, len %u\n",
+		  hash->type, hash->len);
+	SSDFS_DBG("HASH DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     hash->buf,
+			     SSDFS_FINGERPRINT_LENGTH_MAX);
+	SSDFS_DBG("\n");
+
+	SSDFS_DBG("ITEM HASH: type %#x, len %u\n",
+		  item->hash.type, item->hash.len);
+	SSDFS_DBG("ITEM HASH DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     item->hash.buf,
+			     SSDFS_FINGERPRINT_LENGTH_MAX);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!IS_FINGERPRINT_VALID(hash)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("hash is invalid\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EINVAL;
+	}
+
+	if (!IS_FINGERPRINT_VALID(&item->hash)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("item is invalid\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EINVAL;
+	}
+
+	if (!IS_FINGERPRINTS_COMPARABLE(hash, &item->hash)) {
+		SSDFS_ERR("fingerprints are incomparable\n");
+		return -EINVAL;
+	}
+
+	res = ssdfs_compare_fingerprints(hash, &item->hash);
+	if (res < 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("search fingerprint is smaller\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -ENOENT;
+	} else if (res == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("identical fingerprint is found\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EEXIST;
+	} else {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("search fingerprint is greater\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
+/*
+ * ssdfs_fingerprint_array_find_nolock() - find fingerprint item without lock
+ * @farray: pointer on fingerprint array object
+ * @hash: fingerprint with hash value
+ * @item_index: pointer on found item index [out]
+ *
+ * This method tries to find the position in array for
+ * requested fingerprint hash without lock.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOENT     - item is not found.
+ * %-ERANGE     - internal error.
+ */
+static
+int ssdfs_fingerprint_array_find_nolock(struct ssdfs_fingerprint_array *farray,
+					struct ssdfs_fingerprint *hash,
+					u32 *item_index)
+{
+	struct ssdfs_fingerprint_item *items_array;
+	struct ssdfs_fingerprint_item *item;
+	void *kaddr;
+	u32 total_items;
+	u32 items_count;
+	u32 processed_items = 0;
+	u32 lower_bound;
+	u32 upper_bound;
+	u32 cur_index;
+	u32 diff;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray || !hash || !item_index);
+	BUG_ON(!rwsem_is_locked(&farray->lock));
+
+	SSDFS_DBG("array %p, hash %p\n",
+		  farray, hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	total_items = ssdfs_dynamic_array_items_count(&farray->array);
+	if (total_items == 0) {
+		err = -ENOENT;
+		*item_index = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("empty array: total_items %u\n",
+			  total_items);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search_item;
+	}
+
+	if (farray->items_count > total_items) {
+		err = -ERANGE;
+		SSDFS_ERR("items_count %u > total_items %u\n",
+			  farray->items_count,
+			  total_items);
+		goto finish_search_item;
+	}
+
+	if (farray->items_count == 0) {
+		err = -ENOENT;
+		*item_index = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("empty array: items_count %u\n",
+			  farray->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_search_item;
+	}
+
+	while (processed_items < farray->items_count) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("processed_items %u, total_items %u\n",
+			  processed_items, farray->items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		kaddr = ssdfs_dynamic_array_get_content_locked(&farray->array,
+								processed_items,
+								&items_count);
+		if (IS_ERR_OR_NULL(kaddr)) {
+			err = (kaddr == NULL ? -ENOENT : PTR_ERR(kaddr));
+			SSDFS_ERR("fail to get fingerprints range: "
+				  "processed_items %u, err %d\n",
+				  processed_items, err);
+			goto finish_search_item;
+		}
+
+		if (items_count == 0) {
+			err = -ENOENT;
+			*item_index = processed_items;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("fingerprint portion is empty\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		}
+
+		items_array = (struct ssdfs_fingerprint_item *)kaddr;
+
+		item = &items_array[0];
+
+		err = ssdfs_check_fingerprint_item(hash, item);
+		if (err == -ENOENT) {
+			*item_index = processed_items;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop search: item %u, err %d\n",
+				  *item_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		} else if (err == -EEXIST) {
+			*item_index = processed_items;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop search: item %u, err %d\n",
+				  *item_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to check fingerprint: err %d\n",
+				  err);
+			goto unlock_fingerprints_portion;
+		} else
+			BUG();
+
+		if (items_count == 1) {
+			err = -EAGAIN;
+			*item_index = processed_items;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("continue search: "
+				  "item %u\n",
+				  *item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		}
+
+		item = &items_array[items_count - 1];
+
+		err = ssdfs_check_fingerprint_item(hash, item);
+		if (err == -ENOENT) {
+			/*
+			 * Continue search in the range
+			 */
+		} else if (err == -EEXIST) {
+			*item_index = items_count - 1;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("stop search: item %u, err %d\n",
+				  *item_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		} else if (err == -EAGAIN) {
+			*item_index = items_count - 1;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("continue search: "
+				  "item %u\n",
+				  *item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto unlock_fingerprints_portion;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to check fingerprint: err %d\n",
+				  err);
+			goto unlock_fingerprints_portion;
+		} else
+			BUG();
+
+		lower_bound = 0;
+		*item_index = lower_bound;
+		upper_bound = items_count;
+		cur_index = upper_bound / 2;
+
+		do {
+			item = &items_array[cur_index];
+
+			err = ssdfs_check_fingerprint_item(hash, item);
+			if (err == -ENOENT) {
+				/* correct upper_bound */
+				upper_bound = cur_index;
+			} else if (err == -EEXIST) {
+				*item_index = cur_index;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("stop search: item %u, err %d\n",
+					  *item_index, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto unlock_fingerprints_portion;
+			} else if (err == -EAGAIN) {
+				/* correct lower_bound */
+				lower_bound = cur_index;
+				*item_index = lower_bound;
+			} else if (unlikely(err)) {
+				SSDFS_ERR("fail to check fingerprint: err %d\n",
+					  err);
+				goto unlock_fingerprints_portion;
+			} else
+				BUG();
+
+			diff = upper_bound - lower_bound;
+			cur_index = lower_bound + (diff / 2);
+		} while (lower_bound < upper_bound);
+
+unlock_fingerprints_portion:
+		res = ssdfs_dynamic_array_release(&farray->array,
+						  processed_items,
+						  kaddr);
+		if (unlikely(res)) {
+			SSDFS_ERR("fail to release fingerprints portion: "
+				  "processed_items %u, err %d\n",
+				  processed_items, res);
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG();
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		processed_items += items_count;
+
+		if (err == -EAGAIN) {
+			if (processed_items < farray->items_count) {
+				err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("continue search: "
+					  "item %u\n",
+					  *item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+				continue;
+			} else {
+				err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("stop search: "
+					  "nothing is found: item %u\n",
+					  *item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+				break;
+			}
+		} else if (err)
+			break;
+	}
+
+	if (err == -EEXIST) {
+		err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fingerprint is found: "
+			  "item %u\n",
+			  *item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+finish_search_item:
+	return err;
+}
+
+/*
+ * ssdfs_fingerprint_array_find() - find fingerprint item
+ * @farray: pointer on fingerprint array object
+ * @hash: fingerprint with hash value
+ * @item_index: pointer on found item index [out]
+ *
+ * This method tries to find the position in array for
+ * requested fingerprint hash.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ENOENT     - item is not found.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_fingerprint_array_find(struct ssdfs_fingerprint_array *farray,
+				 struct ssdfs_fingerprint *hash,
+				 u32 *item_index)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray || !hash || !item_index);
+
+	SSDFS_DBG("array %p, hash %p\n",
+		  farray, hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*item_index = U32_MAX;
+
+	down_read(&farray->lock);
+	err = ssdfs_fingerprint_array_find_nolock(farray, hash, item_index);
+	up_read(&farray->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_fingerprint_array_get_nolock() - get fingerprint item without lock
+ * @farray: pointer on fingerprint array object
+ * @item_index: item index
+ * @item: pointer on buffer for requested fingerprint item [out]
+ *
+ * This method tries to extract the item on @item_index position
+ * in array without lock.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+static inline
+int ssdfs_fingerprint_array_get_nolock(struct ssdfs_fingerprint_array *farray,
+					u32 item_index,
+					struct ssdfs_fingerprint_item *item)
+{
+	void *kaddr;
+	size_t item_size = sizeof(struct ssdfs_fingerprint_item);
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray || !item);
+	BUG_ON(!rwsem_is_locked(&farray->lock));
+
+	SSDFS_DBG("array %p, item_index %u\n",
+		  farray, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	kaddr = ssdfs_dynamic_array_get_locked(&farray->array, item_index);
+	if (IS_ERR_OR_NULL(kaddr)) {
+		err = (kaddr == NULL ? -ENOENT : PTR_ERR(kaddr));
+		SSDFS_ERR("fail to get fingerprint item: "
+			  "item_index %u, err %d\n",
+			  item_index, err);
+		goto finish_get_item;
+	}
+
+	ssdfs_memcpy(item, 0, item_size,
+		     kaddr, 0, item_size,
+		     item_size);
+
+	err = ssdfs_dynamic_array_release(&farray->array, item_index, kaddr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to release: "
+			  "item_index %u, err %d\n",
+			  item_index, err);
+		goto finish_get_item;
+	}
+
+finish_get_item:
+	return err;
+}
+
+/*
+ * ssdfs_fingerprint_array_get() - get fingerprint item
+ * @farray: pointer on fingerprint array object
+ * @item_index: item index
+ * @item: pointer on buffer for requested fingerprint item [out]
+ *
+ * This method tries to extract the item on @item_index position
+ * in array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_fingerprint_array_get(struct ssdfs_fingerprint_array *farray,
+				u32 item_index,
+				struct ssdfs_fingerprint_item *item)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray || !item);
+
+	SSDFS_DBG("array %p, item_index %u\n",
+		  farray, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&farray->lock);
+	err = ssdfs_fingerprint_array_get_nolock(farray, item_index, item);
+	up_read(&farray->lock);
+
+	return err;
+}
+
+/*
+ * ssdfs_fingerprint_array_add() - add fingerprint item into array
+ * @farray: pointer on fingerprint array object
+ * @item: fingerprint item
+ * @item_index: item index
+ *
+ * This method tries to add the item in array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EEXIST     - item exists in the array.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_fingerprint_array_add(struct ssdfs_fingerprint_array *farray,
+				struct ssdfs_fingerprint_item *item,
+				u32 item_index)
+{
+	struct ssdfs_fingerprint_item existing;
+	u32 total_items;
+	u32 cur_index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!farray || !item);
+
+	SSDFS_DBG("array %p, item_index %u\n",
+		  farray, item_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_write(&farray->lock);
+
+	total_items = farray->items_count;
+
+	if (item_index >= U32_MAX || item_index > total_items) {
+		err = ssdfs_fingerprint_array_find_nolock(farray,
+							  &item->hash,
+							  &item_index);
+		if (err == -ENOENT) {
+			err = 0;
+			/* expected state */
+		} else if (err == -EEXIST) {
+			SSDFS_ERR("item exists for requested fingerprint\n");
+			goto finish_add_fingerprint;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find position for fingerprint: "
+				  "err %d\n", err);
+			goto finish_add_fingerprint;
+		} else {
+			err = -ERANGE;
+			SSDFS_ERR("unexpected result of position search\n");
+			goto finish_add_fingerprint;
+		}
+	}
+
+	if (item_index > total_items) {
+		err = -ERANGE;
+		SSDFS_ERR("item_index %u > total_items %u\n",
+			  item_index, total_items);
+		goto finish_add_fingerprint;
+	}
+
+	if (total_items == 0) {
+		err = ssdfs_dynamic_array_set(&farray->array, item_index, item);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add fingerprint: "
+				  "item_index %u, err %d\n",
+				  item_index, err);
+			goto finish_add_fingerprint;
+		}
+	} else if (item_index == total_items) {
+		cur_index = item_index - 1;
+
+		err = ssdfs_fingerprint_array_get_nolock(farray,
+							 cur_index,
+							 &existing);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get previous item: "
+				  "index %u, err %d\n",
+				  cur_index, err);
+			goto finish_add_fingerprint;
+		}
+
+		err = ssdfs_check_fingerprint_item(&existing.hash, item);
+		if (err == -ENOENT) {
+			err = 0;
+			/*
+			 * Continue logic
+			 */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("corrupted fingerprints' sequence: "
+				  "index1 %u, index2 %u, err %d\n",
+				  cur_index, item_index, err);
+			goto finish_add_fingerprint;
+		} else
+			BUG();
+
+		err = ssdfs_dynamic_array_set(&farray->array, item_index, item);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add fingerprint: "
+				  "item_index %u, err %d\n",
+				  item_index, err);
+			goto finish_add_fingerprint;
+		}
+	} else {
+		cur_index = item_index;
+
+		err = ssdfs_fingerprint_array_get_nolock(farray,
+							 cur_index,
+							 &existing);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get previous item: "
+				  "index %u, err %d\n",
+				  cur_index, err);
+			goto finish_add_fingerprint;
+		}
+
+		err = ssdfs_check_fingerprint_item(&item->hash, &existing);
+		if (err == -ENOENT) {
+			err = 0;
+			/*
+			 * Continue logic
+			 */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("corrupted fingerprints' sequence: "
+				  "index1 %u, index2 %u, err %d\n",
+				  cur_index, item_index, err);
+			goto finish_add_fingerprint;
+		} else
+			BUG();
+
+		cur_index = item_index - 1;
+
+		err = ssdfs_fingerprint_array_get_nolock(farray,
+							 cur_index,
+							 &existing);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get previous item: "
+				  "index %u, err %d\n",
+				  cur_index, err);
+			goto finish_add_fingerprint;
+		}
+
+		err = ssdfs_check_fingerprint_item(&existing.hash, item);
+		if (err == -ENOENT) {
+			err = 0;
+			/*
+			 * Continue logic
+			 */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("corrupted fingerprints' sequence: "
+				  "index1 %u, index2 %u, err %d\n",
+				  cur_index, item_index, err);
+			goto finish_add_fingerprint;
+		} else
+			BUG();
+
+		err = ssdfs_dynamic_array_shift_content_right(&farray->array,
+							      item_index, 1);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to shift range: "
+				  "index %u, err %d\n",
+				  item_index, err);
+			goto finish_add_fingerprint;
+		}
+
+		err = ssdfs_dynamic_array_set(&farray->array, item_index, item);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add fingerprint: "
+				  "item_index %u, err %d\n",
+				  item_index, err);
+			goto finish_add_fingerprint;
+		}
+	}
+
+	farray->items_count++;
+
+finish_add_fingerprint:
+	up_write(&farray->lock);
+
+	return err;
+}
diff --git a/fs/ssdfs/fingerprint_array.h b/fs/ssdfs/fingerprint_array.h
new file mode 100644
index 000000000000..ba70977d5a8e
--- /dev/null
+++ b/fs/ssdfs/fingerprint_array.h
@@ -0,0 +1,82 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/fingerprint_array.h - fingerprint array's declarations.
+ *
+ * Copyright (c) 2023-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_FINGERPRINT_ARRAY_H
+#define _SSDFS_FINGERPRINT_ARRAY_H
+
+#include "dynamic_array.h"
+#include "fingerprint.h"
+
+/*
+ * struct ssdfs_fingerprint_item - fingerprint item
+ * @hash: fingerprint hash
+ * @logical_blk: logical block ID
+ * @blk_desc: logical block descriptor
+ */
+struct ssdfs_fingerprint_item {
+	struct ssdfs_fingerprint hash;
+	u32 logical_blk;
+	struct ssdfs_block_descriptor blk_desc;
+};
+
+/*
+ * struct ssdfs_fingerprint_pair - item + index pair
+ * @item: fingerprint item
+ * @item_index: item index in array
+ */
+struct ssdfs_fingerprint_pair {
+	struct ssdfs_fingerprint_item item;
+	u32 item_index;
+};
+
+/*
+ * struct ssdfs_fingerprint_array - fingerprint array
+ * @state: state of fingerprint array
+ * @lock: array lock
+ * @items_count: items count in array
+ * @array: array of fingerprints
+ */
+struct ssdfs_fingerprint_array {
+	atomic_t state;
+	struct rw_semaphore lock;
+	u32 items_count;
+	struct ssdfs_dynamic_array array;
+};
+
+/* Fingerprint array states */
+enum {
+	SSDFS_FINGERPRINT_ARRAY_UNKNOWN_STATE,
+	SSDFS_FINGERPRINT_ARRAY_CREATED,
+	SSDFS_FINGERPRINT_ARRAY_STATE_MAX
+};
+
+/*
+ * Fingerprint array's API
+ */
+int ssdfs_fingerprint_array_create(struct ssdfs_fingerprint_array *array,
+				   u32 capacity);
+void ssdfs_fingerprint_array_destroy(struct ssdfs_fingerprint_array *array);
+int ssdfs_check_fingerprint_item(struct ssdfs_fingerprint *hash,
+				 struct ssdfs_fingerprint_item *item);
+int ssdfs_fingerprint_array_find(struct ssdfs_fingerprint_array *array,
+				 struct ssdfs_fingerprint *hash,
+				 u32 *item_index);
+int ssdfs_fingerprint_array_get(struct ssdfs_fingerprint_array *array,
+				u32 item_index,
+				struct ssdfs_fingerprint_item *item);
+int ssdfs_fingerprint_array_add(struct ssdfs_fingerprint_array *array,
+				struct ssdfs_fingerprint_item *item,
+				u32 item_index);
+
+#endif /* _SSDFS_FINGERPRINT_ARRAY_H */
diff --git a/fs/ssdfs/peb_deduplication.c b/fs/ssdfs/peb_deduplication.c
new file mode 100644
index 000000000000..1e755e96538d
--- /dev/null
+++ b/fs/ssdfs/peb_deduplication.c
@@ -0,0 +1,483 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/peb_deduplication.c - PEB-based deduplication logic.
+ *
+ * Copyright (c) 2023-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+#include <crypto/hash.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "folio_array.h"
+#include "request_queue.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "fingerprint.h"
+#include "fingerprint_array.h"
+
+/*
+ * ssdfs_calculate_fingerprint() - calculate block's fingerprint
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ * @hash: calculated fingerprint hash [out]
+ *
+ * This method tries to calculate block's fingerprint.
+ */
+static
+int ssdfs_calculate_fingerprint(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_fingerprint *hash)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_request_content_block *block;
+	struct ssdfs_content_block *blk_state;
+	SHASH_DESC_ON_STACK(shash, pebi->dedup.shash_tfm);
+	u32 rest_bytes;
+	u32 start_blk = 0;
+	u32 num_blks = 0;
+	u32 processed_bytes = 0;
+	int i, j, k;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req || !hash);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+	BUG_ON(!is_ssdfs_peb_container_locked(pebi->pebc));
+
+	SSDFS_DBG("ino %llu, seg_id %llu, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = pebi->pebc->parent_si->fsi;
+
+	shash->tfm = pebi->dedup.shash_tfm;
+	crypto_shash_init(shash);
+
+	rest_bytes = ssdfs_request_rest_bytes(pebi, req);
+
+	start_blk = req->result.processed_blks;
+	rest_bytes = min_t(u32, rest_bytes, fsi->pagesize);
+	num_blks = rest_bytes + fsi->pagesize - 1;
+	num_blks >>= fsi->log_pagesize;
+
+	for (i = 0; i < num_blks; i++) {
+		struct folio *folio;
+		void *kaddr;
+		int blk_index = i + start_blk;
+		u32 portion_size;
+
+		if (blk_index >= req->result.content.count)
+			break;
+
+		block = &req->result.content.blocks[blk_index];
+		blk_state = &block->new_state;
+
+		for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+			u32 mem_pages_per_folio;
+
+			folio = blk_state->batch.folios[j];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			mem_pages_per_folio = folio_size(folio) >> PAGE_SHIFT;
+
+			for (k = 0; k < mem_pages_per_folio; k++) {
+				portion_size = min_t(u32, PAGE_SIZE,
+						  rest_bytes - processed_bytes);
+
+				kaddr = kmap_local_folio(folio, k * PAGE_SIZE);
+				crypto_shash_update(shash, kaddr, portion_size);
+				kunmap_local(kaddr);
+
+				processed_bytes += portion_size;
+			}
+		}
+	}
+
+	crypto_shash_final(shash, hash->buf);
+
+	hash->type = SSDFS_DEFAULT_FINGERPRINT_TYPE();
+	hash->len = SSDFS_DEFAULT_FINGERPRINT_LENGTH();
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("HASH (type %#x, len %u)\n",
+		  hash->type, hash->len);
+
+	SSDFS_DBG("FINGERPRINT DUMP:\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     hash->buf,
+			     SSDFS_FINGERPRINT_LENGTH_MAX);
+	SSDFS_DBG("\n");
+
+	BUG_ON(!IS_FINGERPRINT_VALID(hash));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * is_data_size_fingerprint_ready() - is data size at least one memory page
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ */
+static inline
+bool is_data_size_fingerprint_ready(struct ssdfs_peb_info *pebi,
+				    struct ssdfs_segment_request *req)
+{
+	u32 pagesize;
+	u32 logical_block;
+	u32 processed_blks;
+	u64 rest_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !req);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+	BUG_ON(!is_ssdfs_peb_container_locked(pebi->pebc));
+
+	SSDFS_DBG("ino %llu, seg %llu, peb %llu, "
+		  "peb_index %u, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  pebi->peb_id, pebi->peb_index,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	pagesize = pebi->pebc->parent_si->fsi->pagesize;
+
+	processed_blks = req->result.processed_blks;
+	logical_block = req->place.start.blk_index + processed_blks;
+
+	rest_bytes = (u64)processed_blks * pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(rest_bytes >= req->extent.data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	rest_bytes = req->extent.data_bytes - rest_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("data_size %llu, pagesize %u\n",
+		  rest_bytes, pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (rest_bytes < PAGE_SIZE)
+		return false;
+
+	return true;
+}
+
+/*
+ * is_ssdfs_block_duplicated() - check whether block is duplicated
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ * @pair: fingerprint pair [out]
+ *
+ * This method checks whether the block is a duplicate.
+ * Pre-allocated blocks will be ignored.
+ */
+bool is_ssdfs_block_duplicated(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_fingerprint_pair *pair)
+{
+	u32 logical_block;
+	u32 processed_blks;
+	bool is_duplicated = false;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !req || !pair);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+	BUG_ON(!is_ssdfs_peb_container_locked(pebi->pebc));
+
+	SSDFS_DBG("ino %llu, seg %llu, peb %llu, "
+		  "peb_index %u, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  pebi->peb_id, pebi->peb_index,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_ssdfs_peb_containing_user_data(pebi->pebc)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("ignore metadata\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (!is_data_size_fingerprint_ready(pebi, req)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("data is too small for fingerprint calculation\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	processed_blks = req->result.processed_blks;
+	logical_block = req->place.start.blk_index + processed_blks;
+
+	memset(&pair->item.hash, 0, sizeof(struct ssdfs_fingerprint));
+	pair->item.logical_blk = logical_block;
+	SSDFS_BLK_DESC_INIT(&pair->item.blk_desc);
+
+	err = ssdfs_calculate_fingerprint(pebi, req, &pair->item.hash);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to calculate fingerprint: "
+			  "logical_block %u, err %d\n",
+			  logical_block, err);
+		return false;
+	}
+
+	err = ssdfs_fingerprint_array_find(&pebi->dedup.fingerprints,
+					   &pair->item.hash,
+					   &pair->item_index);
+	if (err == -ENOENT) {
+		is_duplicated = false;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find fingerprint: "
+			  "logical_block %u, err %d\n",
+			  logical_block, err);
+		return false;
+	} else {
+		is_duplicated = true;
+	}
+
+	return is_duplicated;
+}
+
+/*
+ * should_ssdfs_save_fingerprint() - should the fingerprint be saved
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ */
+bool should_ssdfs_save_fingerprint(struct ssdfs_peb_info *pebi,
+				   struct ssdfs_segment_request *req)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !req);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+
+	SSDFS_DBG("ino %llu, seg %llu, peb %llu, "
+		  "peb_index %u, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  pebi->peb_id, pebi->peb_index,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!is_ssdfs_peb_containing_user_data(pebi->pebc)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("ignore metadata\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	if (!is_data_size_fingerprint_ready(pebi, req)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("data is too small for fingerprint calculation\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ssdfs_peb_deduplicate_logical_block() - deduplicate a logical block
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ * @pair: fingerprint pair
+ * @blk_desc: pointer on buffer for block descriptor [out]
+ *
+ * This method tries to deduplicate the logical block.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_peb_deduplicate_logical_block(struct ssdfs_peb_info *pebi,
+					struct ssdfs_segment_request *req,
+					struct ssdfs_fingerprint_pair *pair,
+					struct ssdfs_block_descriptor *blk_desc)
+{
+	struct ssdfs_fingerprint_item item;
+	size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor);
+	int res;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !req);
+	BUG_ON(!pair || !blk_desc);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+	BUG_ON(!is_ssdfs_peb_container_locked(pebi->pebc));
+
+	SSDFS_DBG("ino %llu, seg %llu, peb %llu, "
+		  "peb_index %u, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  pebi->peb_id, pebi->peb_index,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pair->item.logical_blk >= U32_MAX) {
+		SSDFS_ERR("invalid logical block\n");
+		return -EINVAL;
+	}
+
+	if (!IS_FINGERPRINT_VALID(&pair->item.hash)) {
+		SSDFS_ERR("invalid hash: logical block %u\n",
+			  pair->item.logical_blk);
+		return -EINVAL;
+	}
+
+	err = ssdfs_fingerprint_array_get(&pebi->dedup.fingerprints,
+					  pair->item_index, &item);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get fingerprint item: "
+			  "item_index %u, err %d\n",
+			  pair->item_index, err);
+		return err;
+	}
+
+	res = ssdfs_check_fingerprint_item(&pair->item.hash, &item);
+	if (res == -EEXIST) {
+		/*
+		 * fingerprints are identical
+		 */
+	} else {
+		SSDFS_ERR("fingerprints are different: err %d\n", res);
+		return -ERANGE;
+	}
+
+	ssdfs_memcpy(blk_desc, 0, blk_desc_size,
+		     &item.blk_desc, 0, blk_desc_size,
+		     blk_desc_size);
+
+	return 0;
+}
+
+/*
+ * ssdfs_peb_save_fingerprint() - save new fingerprint into array
+ * @pebi: pointer on PEB object
+ * @req: I/O request
+ * @blk_desc: block descriptor of logical block
+ * @pair: fingerprint pair
+ *
+ * This method tries to store the new fingerprint of logical
+ * block into the fingerprint array.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-EINVAL     - invalid input.
+ * %-ERANGE     - internal error.
+ */
+int ssdfs_peb_save_fingerprint(struct ssdfs_peb_info *pebi,
+				struct ssdfs_segment_request *req,
+				struct ssdfs_block_descriptor *blk_desc,
+				struct ssdfs_fingerprint_pair *pair)
+{
+	size_t blk_desc_size = sizeof(struct ssdfs_block_descriptor);
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!pebi || !pebi->pebc || !req || !blk_desc || !pair);
+	BUG_ON(!pebi->pebc->parent_si || !pebi->pebc->parent_si->fsi);
+	BUG_ON(req->place.len >= U16_MAX);
+	BUG_ON(req->result.processed_blks > req->place.len);
+	BUG_ON(!is_ssdfs_peb_container_locked(pebi->pebc));
+
+	SSDFS_DBG("ino %llu, seg %llu, peb %llu, "
+		  "peb_index %u, logical_offset %llu, "
+		  "processed_blks %d, logical_block %u, data_bytes %u, "
+		  "cno %llu, parent_snapshot %llu, cmd %#x, type %#x\n",
+		  req->extent.ino, req->place.start.seg_id,
+		  pebi->peb_id, pebi->peb_index,
+		  req->extent.logical_offset, req->result.processed_blks,
+		  req->place.start.blk_index,
+		  req->extent.data_bytes, req->extent.cno,
+		  req->extent.parent_snapshot,
+		  req->private.cmd, req->private.type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (pair->item.logical_blk >= U32_MAX) {
+		SSDFS_ERR("invalid logical block\n");
+		return -EINVAL;
+	}
+
+	if (!IS_FINGERPRINT_VALID(&pair->item.hash)) {
+		SSDFS_ERR("invalid hash: logical block %u\n",
+			  pair->item.logical_blk);
+		return -EINVAL;
+	}
+
+	ssdfs_memcpy(&pair->item.blk_desc, 0, blk_desc_size,
+		     blk_desc, 0, blk_desc_size,
+		     blk_desc_size);
+
+	err = ssdfs_fingerprint_array_add(&pebi->dedup.fingerprints, &pair->item,
+					  pair->item_index);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add fingerprint item: "
+			  "logical_block %u, item_index %u, err %d\n",
+			  pair->item.logical_blk, pair->item_index, err);
+		return err;
+	}
+
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 62/79] ssdfs: introduce shared dictionary b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (25 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 61/79] ssdfs: introduce PEB-based deduplication technique Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 65/79] ssdfs: introduce snapshots b-tree Viacheslav Dubeyko
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

The SSDFS dentry is a metadata structure of fixed size (32 bytes).
It contains an inode ID, a name hash, a name length, and an inline
string for 12 symbols. Generally speaking, the dentry is able to
store an 8.3 filename inline. If a file/folder name is longer than
12 symbols, then the dentry keeps only a portion of the name, while
the whole name is stored in the shared dictionary. The goal of this
approach is to represent the dentry as a compact metadata structure
of fixed size for fast and efficient operations with dentries.
It is worth pointing out that in many use-cases file/folder names
are not very long. As a result, the dentry's inline string can be
the only storage for the file/folder name. Moreover, the goal of
the shared dictionary is to store long names efficiently by means
of a deduplication mechanism.
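
The fixed-size dentry described above can be sketched as follows.
This is a hypothetical illustration, not the actual SSDFS on-disk
layout: the field names, the flags byte, and the padding are
assumptions made for the sketch.

```c
#include <stdint.h>
#include <string.h>

#define INLINE_NAME_LEN 12	/* inline symbols, as described above */

/* Illustrative 32-byte dentry; not the real on-disk structure. */
struct sketch_dentry {
	uint64_t ino;				/* inode ID */
	uint64_t name_hash;			/* hash of the full name */
	uint8_t  name_len;			/* full name length */
	uint8_t  flags;				/* e.g. "name in dictionary" */
	char     inline_name[INLINE_NAME_LEN];	/* name prefix */
	uint16_t pad;				/* pad to 32 bytes */
};

/*
 * Store a name: short names fit entirely inline; for longer names
 * only a 12-symbol prefix is kept and the full name is expected to
 * be found in the shared dictionary by hash.  Returns 1 when the
 * shared dictionary is needed, 0 otherwise.
 */
static int dentry_set_name(struct sketch_dentry *d, const char *name)
{
	size_t len = strlen(name);

	d->name_len = (uint8_t)len;
	memset(d->inline_name, 0, INLINE_NAME_LEN);
	memcpy(d->inline_name, name,
	       len < INLINE_NAME_LEN ? len : INLINE_NAME_LEN);
	return len > INLINE_NAME_LEN;
}
```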

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/shared_dictionary.h | 204 +++++++++++++++++++++++++++++++++++
 1 file changed, 204 insertions(+)
 create mode 100644 fs/ssdfs/shared_dictionary.h

diff --git a/fs/ssdfs/shared_dictionary.h b/fs/ssdfs/shared_dictionary.h
new file mode 100644
index 000000000000..3bd18a0682cb
--- /dev/null
+++ b/fs/ssdfs/shared_dictionary.h
@@ -0,0 +1,204 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/shared_dictionary.h - shared dictionary btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_SHARED_DICTIONARY_TREE_H
+#define _SSDFS_SHARED_DICTIONARY_TREE_H
+
+/*
+ * struct ssdfs_names_queue - names queue descriptor
+ * @lock: names queue's lock
+ * @list: names queue's list
+ */
+struct ssdfs_names_queue {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * struct ssdfs_name_descriptor - name descriptor
+ * @hash: name hash
+ * @len: name length
+ * @str_buf: name string
+ */
+struct ssdfs_name_descriptor {
+	u64 hash;
+	size_t len;
+	unsigned char str_buf[SSDFS_MAX_NAME_LEN];
+};
+
+/*
+ * struct ssdfs_name_info - name info
+ * @list: names queue list
+ * @type: operation type
+ * @desc.name: name descriptor
+ * @desc.index: node's index
+ */
+struct ssdfs_name_info {
+	struct list_head list;
+	int type;
+
+	union {
+		struct ssdfs_name_descriptor name;
+		struct ssdfs_btree_index index;
+	} desc;
+};
+
+/* Possible type of operations with name */
+enum {
+	SSDFS_NAME_UNKNOWN_OP,
+	SSDFS_INIT_SHDICT_NODE,
+	SSDFS_NAME_ADD,
+	SSDFS_NAME_CHANGE,
+	SSDFS_NAME_DELETE,
+	SSDFS_NAME_OP_MAX
+};
+
+/*
+ * struct ssdfs_name_requests_queue - name requests queue
+ * @queue: name requests queue object
+ * @thread: descriptor of queue's thread
+ */
+struct ssdfs_name_requests_queue {
+	struct ssdfs_names_queue queue;
+	struct ssdfs_thread_info thread;
+};
+
+/*
+ * struct ssdfs_shared_dict_btree_info - shared dictionary btree info
+ * @state: shared dictionary btree state
+ * @lock: shared dictionary tree's lock
+ * @generic_tree: generic btree description
+ * @read_reqs: current count of read requests
+ * @requests: name requests queue
+ * @wait_queue: wait queue of shared dictionary tree's thread
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_shared_dict_btree_info {
+	atomic_t state;
+	struct rw_semaphore lock;
+	struct ssdfs_btree generic_tree;
+
+	atomic_t read_reqs;
+
+	struct ssdfs_name_requests_queue requests;
+	wait_queue_head_t wait_queue;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Shared dictionary tree states */
+enum {
+	SSDFS_SHDICT_BTREE_UNKNOWN_STATE,
+	SSDFS_SHDICT_BTREE_CREATED,
+	SSDFS_SHDICT_BTREE_UNDER_INIT,
+	SSDFS_SHDICT_BTREE_INITIALIZED,
+	SSDFS_SHDICT_BTREE_CORRUPTED,
+	SSDFS_SHDICT_BTREE_STATE_MAX
+};
+
+#define SSDFS_HTBL_DESC(ptr) \
+	((struct ssdfs_shdict_htbl_item *)(ptr))
+#define SSDFS_LTBL2_DESC(ptr) \
+	((struct ssdfs_shdict_ltbl2_item *)(ptr))
+#define SSDFS_LTBL1_DESC(ptr) \
+	((struct ssdfs_shdict_ltbl1_item *)(ptr))
+#define SSDFS_SEARCH_KEY(ptr) \
+	((union ssdfs_shdict_search_key *)(ptr))
+
+/*
+ * Shared dictionary tree API
+ */
+int ssdfs_shared_dict_btree_create(struct ssdfs_fs_info *fsi);
+int ssdfs_shared_dict_btree_init(struct ssdfs_fs_info *fsi);
+void ssdfs_shared_dict_btree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_shared_dict_btree_flush(struct ssdfs_shared_dict_btree_info *tree);
+
+int ssdfs_shared_dict_get_name(struct ssdfs_shared_dict_btree_info *tree,
+				u64 hash,
+				struct ssdfs_name_string *name);
+int ssdfs_shared_dict_save_name(struct ssdfs_shared_dict_btree_info *tree,
+				u64 hash,
+				const struct qstr *str);
+
+/*
+ * Shared dictionary tree internal API
+ */
+int ssdfs_shared_dict_tree_find(struct ssdfs_shared_dict_btree_info *tree,
+				u64 name_hash,
+				struct ssdfs_btree_search *search);
+int ssdfs_shared_dict_tree_add(struct ssdfs_shared_dict_btree_info *tree,
+				u64 name_hash,
+				const char *name, size_t len,
+				struct ssdfs_btree_search *search);
+int ssdfs_shared_dict_btree_node_left_items(struct ssdfs_btree_node *node,
+					    u32 threshold,
+					    u16 *free_space);
+int ssdfs_shared_dict_btree_node_right_items(struct ssdfs_btree_node *node,
+					     u32 threshold,
+					     u16 *free_space);
+
+int ssdfs_shared_dict_start_thread(struct ssdfs_shared_dict_btree_info *tree);
+int ssdfs_shared_dict_stop_thread(struct ssdfs_shared_dict_btree_info *tree);
+
+/*
+ * Name info's API
+ */
+void ssdfs_zero_name_info_cache_ptr(void);
+int ssdfs_init_name_info_cache(void);
+void ssdfs_destroy_name_info_cache(void);
+struct ssdfs_name_info *ssdfs_name_info_alloc(void);
+void ssdfs_name_info_free(struct ssdfs_name_info *ni);
+void ssdfs_name_info_init(int type, u64 hash,
+			  const unsigned char *str,
+			  const size_t len,
+			  struct ssdfs_name_info *ni);
+void ssdfs_node_index_init(int type, struct ssdfs_btree_index *index,
+			   struct ssdfs_name_info *ni);
+
+/*
+ * Names queue API
+ */
+void ssdfs_names_queue_init(struct ssdfs_names_queue *nq);
+bool is_ssdfs_names_queue_empty(struct ssdfs_names_queue *nq);
+bool has_queue_unprocessed_names(struct ssdfs_shared_dict_btree_info *tree);
+void ssdfs_names_queue_add_tail(struct ssdfs_names_queue *nq,
+				struct ssdfs_name_info *ni);
+void ssdfs_names_queue_add_head(struct ssdfs_names_queue *nq,
+				struct ssdfs_name_info *ni);
+int ssdfs_names_queue_remove_first(struct ssdfs_names_queue *nq,
+				   struct ssdfs_name_info **ni);
+void ssdfs_names_queue_remove_all(struct ssdfs_names_queue *nq);
+
+void ssdfs_debug_shdict_btree_object(struct ssdfs_shared_dict_btree_info *tree);
+void ssdfs_debug_btree_search_result_name(struct ssdfs_btree_search *search);
+
+/*
+ * Shared dictionary btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+					ssdfs_shared_dict_btree_desc_ops;
+extern const struct ssdfs_btree_operations
+					ssdfs_shared_dict_btree_ops;
+extern const struct ssdfs_btree_node_operations
+					ssdfs_shared_dict_btree_node_ops;
+
+#endif /* _SSDFS_SHARED_DICTIONARY_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 65/79] ssdfs: introduce snapshots b-tree
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (26 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 62/79] ssdfs: introduce shared dictionary b-tree Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 67/79] ssdfs: implement extended attributes support Viacheslav Dubeyko
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

SSDFS file system driver stores any payload in the form
of logs. Every log starts with a header, continues with
payload, and can end with a footer. Every header and footer
includes the timestamp of the log's creation. It means that
every log can be viewed as a checkpoint of the file system
volume's evolution. By default, if no snapshot has been
created, SSDFS executes an erase operation on an erase block
upon its complete invalidation (all valid data has been moved
into another erase block). However, it is possible to create
snapshot(s) with the goal of protecting the previous state
of user data or metadata in invalidated erase blocks.
A snapshot is simply a timestamp that is stored into the
snapshots b-tree. If snapshots are present in the b-tree,
SSDFS compares the starting and ending timestamps/checkpoints
of a particular erase block with the snapshot timestamps in
the b-tree. If any timestamp/checkpoint of the erase block
is protected by a snapshot, then this invalidated erase block
is excluded from the erase operation in order to keep the
snapshotted state of the user data or metadata.
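
The protection check described above can be sketched with a small
helper. This is an illustrative sketch only: the function name, the
sorted-array representation of snapshot timestamps, and the types
are assumptions, not the driver's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether an invalidated erase block may be erased: it is
 * protected (and must be kept) if any snapshot timestamp falls
 * inside the block's [start_cno, end_cno] checkpoint range.
 * Snapshot timestamps are assumed sorted ascending, as a b-tree
 * scan would produce them.
 */
static bool is_erase_block_protected(const uint64_t *snapshots,
				     size_t count,
				     uint64_t start_cno, uint64_t end_cno)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (snapshots[i] < start_cno)
			continue;	/* snapshot predates this block */
		if (snapshots[i] > end_cno)
			break;		/* sorted: no later match possible */
		return true;		/* snapshot inside block's lifetime */
	}
	return false;			/* safe to erase */
}
```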

SSDFS provides the opportunity to create snapshot rule(s).
A snapshot rule defines the frequency of snapshot creation
(for example, every minute, hour, day, etc.). Also, it is
possible to define how soon a snapshot expires (for example,
after an hour, day, week, or month). Finally, it means that
SSDFS is capable of creating snapshots and deleting the
expired ones automatically.
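
The rule-driven housekeeping above can be sketched as two simple
predicates. The function names, the seconds-based units, and the
use of 0 as "never expires" are illustrative assumptions, not the
SSDFS API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is a new periodic snapshot due under the rule's frequency? */
static bool snapshot_due(uint64_t now, uint64_t last_snapshot,
			 uint64_t frequency_secs)
{
	return now - last_snapshot >= frequency_secs;
}

/* Has an existing snapshot expired? 0 means "never expires" here. */
static bool snapshot_expired(uint64_t now, uint64_t create_time,
			     uint64_t expiration_secs)
{
	if (expiration_secs == 0)
		return false;
	return now - create_time >= expiration_secs;
}
```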

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/snapshot.h       | 283 ++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/snapshot_rules.h |  55 ++++++++
 fs/ssdfs/snapshots_tree.h | 248 +++++++++++++++++++++++++++++++++
 3 files changed, 586 insertions(+)
 create mode 100644 fs/ssdfs/snapshot.h
 create mode 100644 fs/ssdfs/snapshot_rules.h
 create mode 100644 fs/ssdfs/snapshots_tree.h

diff --git a/fs/ssdfs/snapshot.h b/fs/ssdfs/snapshot.h
new file mode 100644
index 000000000000..b10d73b98e6e
--- /dev/null
+++ b/fs/ssdfs/snapshot.h
@@ -0,0 +1,283 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/snapshot.h - snapshot's declarations.
+ *
+ * Copyright (c) 2021-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_SNAPSHOT_H
+#define _SSDFS_SNAPSHOT_H
+
+/*
+ * struct ssdfs_time_range - time range definition
+ * @minute: minute of the time range
+ * @hour: hour of the time range
+ * @day: day of the time range
+ * @month: month of the time range
+ * @year: year of the time range
+ */
+struct ssdfs_time_range {
+	u32 minute;
+	u32 hour;
+	u32 day;
+	u32 month;
+	u32 year;
+};
+
+#define SSDFS_ANY_MINUTE			U32_MAX
+#define SSDFS_ANY_HOUR				U32_MAX
+#define SSDFS_ANY_DAY				U32_MAX
+#define SSDFS_ANY_MONTH				U32_MAX
+#define SSDFS_ANY_YEAR				U32_MAX
+
+/*
+ * struct ssdfs_snapshot_details - snapshot details
+ * @create_time: snapshot's timestamp
+ * @cno: snapshot's checkpoint
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @ino: target/root object's inode ID
+ * @uuid: snapshot's UUID
+ */
+struct ssdfs_snapshot_details {
+	u64 create_time;
+	u64 cno;
+	u32 mode;
+	u32 expiration;
+	u64 ino;
+	u8 uuid[SSDFS_UUID_SIZE];
+};
+
+/*
+ * struct ssdfs_snapshot_rule_details - snapshot rule details
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @type: snapshot type (PERIODIC|ONE-TIME)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @frequency: taking snapshot frequency (FSYNC|HOUR|DAY|WEEK)
+ * @snapshots_threshold: max number of simultaneously available snapshots
+ * @snapshots_number: currently created number of snapshots
+ * @last_snapshot_cno: checkpoint of last snapshot
+ * @ino: target/root object's inode ID
+ * @uuid: snapshot's UUID
+ */
+struct ssdfs_snapshot_rule_details {
+	u32 mode;
+	u32 type;
+	u32 expiration;
+	u32 frequency;
+	u32 snapshots_threshold;
+	u32 snapshots_number;
+	u64 last_snapshot_cno;
+	u64 ino;
+	u8 uuid[SSDFS_UUID_SIZE];
+};
+
+/*
+ * struct ssdfs_snapshot_info - snapshot details
+ * @name: snapshot name
+ * @uuid: snapshot UUID
+ * @mode: snapshot mode (READ-ONLY|READ-WRITE)
+ * @type: snapshot type (PERIODIC|ONE-TIME)
+ * @expiration: snapshot expiration time (WEEK|MONTH|YEAR|NEVER)
+ * @frequency: taking snapshot frequency (FSYNC|HOUR|DAY|WEEK)
+ * @snapshots_threshold: max number of simultaneously available snapshots
+ * @time_range: time range to select/modify/delete snapshots
+ * @buf: buffer to share the snapshot details
+ * @buf_size: size of buffer in bytes
+ */
+struct ssdfs_snapshot_info {
+	char name[SSDFS_MAX_NAME_LEN];
+	u8 uuid[SSDFS_UUID_SIZE];
+
+	int mode;
+	int type;
+	int expiration;
+	int frequency;
+	u32 snapshots_threshold;
+	struct ssdfs_time_range time_range;
+
+	char __user *buf;
+	u64 buf_size;
+};
+
+/* Requested operation */
+enum {
+	SSDFS_UNKNOWN_OPERATION,
+	SSDFS_CREATE_SNAPSHOT,
+	SSDFS_LIST_SNAPSHOTS,
+	SSDFS_MODIFY_SNAPSHOT,
+	SSDFS_REMOVE_SNAPSHOT,
+	SSDFS_REMOVE_RANGE,
+	SSDFS_SHOW_SNAPSHOT_DETAILS,
+	SSDFS_LIST_SNAPSHOT_RULES,
+	SSDFS_OPERATION_TYPE_MAX
+};
+
+/*
+ * struct ssdfs_snapshot_request - snapshot request
+ * @list: snapshot requests queue list
+ * @operation: requested operation
+ * @ino: inode ID of object under snapshot
+ * @info: snapshot request's info
+ */
+struct ssdfs_snapshot_request {
+	struct list_head list;
+	int operation;
+	u64 ino;
+	struct ssdfs_snapshot_info info;
+};
+
+/*
+ * struct ssdfs_snapshot_rule_item - snapshot rule item
+ * @list: snapshot rules list
+ * @rule: snapshot rule's info
+ */
+struct ssdfs_snapshot_rule_item {
+	struct list_head list;
+	struct ssdfs_snapshot_rule_info rule;
+};
+
+/*
+ * struct ssdfs_timestamp_range - range of timestamps
+ * @start: starting timestamp
+ * @end: ending timestamp
+ */
+struct ssdfs_timestamp_range {
+	u64 start;
+	u64 end;
+};
+
+/*
+ * struct ssdfs_snapshot_id - snapshot ID
+ * @timestamp: snapshot timestamp
+ * @uuid: snapshot UUID (could be NULL)
+ * @name: snapshot name (could be NULL)
+ */
+struct ssdfs_snapshot_id {
+	u64 timestamp;
+	u8 *uuid;
+	char *name;
+};
+
+#define SSDFS_UNKNOWN_TIMESTAMP		(0)
+
+#define SSDFS_SNAP_DETAILS(ptr) \
+	((struct ssdfs_snapshot_details *)(ptr))
+#define SSDFS_SNAP_RULE_DETAILS(ptr) \
+	((struct ssdfs_snapshot_rule_details *)(ptr))
+
+struct ssdfs_snapshot_subsystem;
+struct ssdfs_fs_info;
+
+/*
+ * Inline functions
+ */
+
+static inline
+bool is_snapshot_rule_requested(struct ssdfs_snapshot_request *snr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!snr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return snr->info.type == SSDFS_PERIODIC_SNAPSHOT;
+}
+
+static inline
+bool is_ssdfs_snapshot_mode_correct(int mode)
+{
+	switch (mode) {
+	case SSDFS_READ_ONLY_SNAPSHOT:
+	case SSDFS_READ_WRITE_SNAPSHOT:
+		return true;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+bool is_ssdfs_snapshot_type_correct(int type)
+{
+	switch (type) {
+	case SSDFS_ONE_TIME_SNAPSHOT:
+	case SSDFS_PERIODIC_SNAPSHOT:
+		return true;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+bool is_ssdfs_snapshot_expiration_correct(int expiration)
+{
+	switch (expiration) {
+	case SSDFS_EXPIRATION_IN_WEEK:
+	case SSDFS_EXPIRATION_IN_MONTH:
+	case SSDFS_EXPIRATION_IN_YEAR:
+	case SSDFS_NEVER_EXPIRED:
+		return true;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return false;
+}
+
+static inline
+bool is_ssdfs_snapshot_frequency_correct(int frequency)
+{
+	switch (frequency) {
+	case SSDFS_SYNCFS_FREQUENCY:
+	case SSDFS_HOUR_FREQUENCY:
+	case SSDFS_DAY_FREQUENCY:
+	case SSDFS_WEEK_FREQUENCY:
+	case SSDFS_MONTH_FREQUENCY:
+		return true;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	return false;
+}
+
+/*
+ * is_uuids_identical() - check the UUIDs identity
+ * @uuid1: first UUID instance
+ * @uuid2: second UUID instance
+ */
+static inline
+bool is_uuids_identical(const u8 *uuid1, const u8 *uuid2)
+{
+	return memcmp(uuid1, uuid2, SSDFS_UUID_SIZE) == 0;
+}
+
+/*
+ * Snapshots subsystem's API
+ */
+int ssdfs_snapshot_subsystem_init(struct ssdfs_fs_info *fsi);
+int ssdfs_snapshot_subsystem_destroy(struct ssdfs_fs_info *fsi);
+
+int ssdfs_convert_time2timestamp_range(struct ssdfs_fs_info *fsi,
+					struct ssdfs_time_range *range1,
+					struct ssdfs_timestamp_range *range2);
+
+#endif /* _SSDFS_SNAPSHOT_H */
diff --git a/fs/ssdfs/snapshot_rules.h b/fs/ssdfs/snapshot_rules.h
new file mode 100644
index 000000000000..0ce0c214ec96
--- /dev/null
+++ b/fs/ssdfs/snapshot_rules.h
@@ -0,0 +1,55 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/snapshot_rules.h - snapshot rule declarations.
+ *
+ * Copyright (c) 2021-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_SNAPSHOT_RULES_H
+#define _SSDFS_SNAPSHOT_RULES_H
+
+/*
+ * struct ssdfs_snapshot_rules_list - snapshot rules list descriptor
+ * @lock: snapshot rules list's lock
+ * @list: snapshot rules list
+ */
+struct ssdfs_snapshot_rules_list {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * Snapshot rules list API
+ */
+void ssdfs_snapshot_rules_list_init(struct ssdfs_snapshot_rules_list *rl);
+bool is_ssdfs_snapshot_rules_list_empty(struct ssdfs_snapshot_rules_list *rl);
+void ssdfs_snapshot_rules_list_add_tail(struct ssdfs_snapshot_rules_list *rl,
+					struct ssdfs_snapshot_rule_item *ri);
+void ssdfs_snapshot_rules_list_add_head(struct ssdfs_snapshot_rules_list *rl,
+					struct ssdfs_snapshot_rule_item *ri);
+void ssdfs_snapshot_rules_list_remove_all(struct ssdfs_snapshot_rules_list *rl);
+
+/*
+ * Snapshot rule's API
+ */
+struct ssdfs_snapshot_rule_item *ssdfs_snapshot_rule_alloc(void);
+void ssdfs_snapshot_rule_free(struct ssdfs_snapshot_rule_item *ri);
+
+struct folio *ssdfs_snapshot_rules_add_batch_folio(struct folio_batch *batch,
+						   unsigned int order);
+void ssdfs_snapshot_rules_folio_batch_release(struct folio_batch *batch);
+
+int ssdfs_process_snapshot_rules(struct ssdfs_fs_info *fsi);
+int ssdfs_modify_snapshot_rule(struct ssdfs_fs_info *fsi,
+				struct ssdfs_snapshot_request *snr);
+int ssdfs_remove_snapshot_rule(struct ssdfs_snapshot_subsystem *snapshots,
+				struct ssdfs_snapshot_request *snr);
+
+#endif /* _SSDFS_SNAPSHOT_RULES_H */
diff --git a/fs/ssdfs/snapshots_tree.h b/fs/ssdfs/snapshots_tree.h
new file mode 100644
index 000000000000..f3ce051f0eb9
--- /dev/null
+++ b/fs/ssdfs/snapshots_tree.h
@@ -0,0 +1,248 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/snapshots_tree.h - snapshots btree declarations.
+ *
+ * Copyright (c) 2021-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_SNAPSHOTS_TREE_H
+#define _SSDFS_SNAPSHOTS_TREE_H
+
+/*
+ * struct ssdfs_snapshots_btree_queue - snapshot requests queue
+ * @queue: snapshot requests queue object
+ * @thread: descriptor of queue's thread
+ */
+struct ssdfs_snapshots_btree_queue {
+	struct ssdfs_snapshot_reqs_queue queue;
+	struct ssdfs_thread_info thread;
+};
+
+/*
+ * struct ssdfs_snapshots_btree_info - snapshots btree info
+ * @state: snapshots btree state
+ * @lock: snapshots btree lock
+ * @generic_tree: generic btree description
+ * @snapshots_count: count of the snapshots in the whole tree
+ * @deleted_snapshots: current number of snapshot delete operations
+ * @requests: snapshot requests queue
+ * @wait_queue: wait queue of snapshots tree's thread
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_snapshots_btree_info {
+	atomic_t state;
+	struct rw_semaphore lock;
+	struct ssdfs_btree generic_tree;
+
+	atomic64_t snapshots_count;
+	atomic64_t deleted_snapshots;
+
+	struct ssdfs_snapshots_btree_queue requests;
+	wait_queue_head_t wait_queue;
+
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Snapshots tree states */
+enum {
+	SSDFS_SNAPSHOTS_BTREE_UNKNOWN_STATE,
+	SSDFS_SNAPSHOTS_BTREE_CREATED,
+	SSDFS_SNAPSHOTS_BTREE_INITIALIZED,
+	SSDFS_SNAPSHOTS_BTREE_DIRTY,
+	SSDFS_SNAPSHOTS_BTREE_CORRUPTED,
+	SSDFS_SNAPSHOTS_BTREE_STATE_MAX
+};
+
+/*
+ * Inline functions
+ */
+
+static inline
+int check_minute(int minute)
+{
+	if (minute < 0 || minute > 59) {
+		SSDFS_ERR("invalid minute value %d\n",
+			  minute);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+int check_hour(int hour)
+{
+	if (hour < 0 || hour > 23) {
+		SSDFS_ERR("invalid hour value %d\n",
+			  hour);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+int check_day(int day)
+{
+	if (day <= 0 || day > 31) {
+		SSDFS_ERR("invalid day value %d\n",
+			  day);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+int check_month(int month)
+{
+	if (month <= 0 || month > 12) {
+		SSDFS_ERR("invalid month value %d\n",
+			  month);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+int check_year(int year)
+{
+	if (year < 1970) {
+		SSDFS_ERR("invalid year value %d\n",
+			  year);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline
+void SHOW_SNAPSHOT_INFO(struct ssdfs_snapshot_request *snr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("SNAPSHOT INFO: ");
+	SSDFS_DBG("name %s, ", snr->info.name);
+	SSDFS_DBG("UUID %pUb, ", snr->info.uuid);
+	SSDFS_DBG("mode %#x, type %#x, expiration %#x, "
+		  "frequency %#x, snapshots_threshold %u, "
+		  "TIME_RANGE (day %u, month %u, year %u)\n",
+		  snr->info.mode, snr->info.type, snr->info.expiration,
+		  snr->info.frequency, snr->info.snapshots_threshold,
+		  snr->info.time_range.day,
+		  snr->info.time_range.month,
+		  snr->info.time_range.year);
+#endif /* CONFIG_SSDFS_DEBUG */
+}
+
+static inline
+bool is_item_snapshot(void *kaddr)
+{
+	struct ssdfs_snapshot *snapshot;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	snapshot = (struct ssdfs_snapshot *)kaddr;
+
+	return le16_to_cpu(snapshot->magic) == SSDFS_SNAPSHOT_RECORD_MAGIC;
+}
+
+static inline
+bool is_item_peb2time_record(void *kaddr)
+{
+	struct ssdfs_peb2time_set *peb2time;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!kaddr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	peb2time = (struct ssdfs_peb2time_set *)kaddr;
+
+	return le16_to_cpu(peb2time->magic) == SSDFS_PEB2TIME_RECORD_MAGIC;
+}
+
+static inline
+bool is_peb2time_record_requested(struct ssdfs_btree_search *search)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return search->request.flags & SSDFS_BTREE_SEARCH_HAS_PEB2TIME_PAIR;
+}
+
+/*
+ * Snapshots tree API
+ */
+int ssdfs_snapshots_btree_create(struct ssdfs_fs_info *fsi);
+void ssdfs_snapshots_btree_destroy(struct ssdfs_fs_info *fsi);
+int ssdfs_snapshots_btree_flush(struct ssdfs_fs_info *fsi);
+
+int ssdfs_snapshots_btree_find(struct ssdfs_snapshots_btree_info *tree,
+				struct ssdfs_snapshot_id *id,
+				struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_find_range(struct ssdfs_snapshots_btree_info *tree,
+				     struct ssdfs_timestamp_range *range,
+				     struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_check_range(struct ssdfs_snapshots_btree_info *tree,
+				      struct ssdfs_timestamp_range *range,
+				      struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_add(struct ssdfs_snapshots_btree_info *tree,
+			     struct ssdfs_snapshot_request *snr,
+			     struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_add_peb2time(struct ssdfs_snapshots_btree_info *tree,
+					struct ssdfs_peb_timestamps *peb2time,
+					struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_change(struct ssdfs_snapshots_btree_info *tree,
+				 struct ssdfs_snapshot_request *snr,
+				 struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_delete(struct ssdfs_snapshots_btree_info *tree,
+				 struct ssdfs_snapshot_request *snr,
+				 struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_delete_peb2time(struct ssdfs_snapshots_btree_info *,
+					  struct ssdfs_peb_timestamps *peb2time,
+					  struct ssdfs_btree_search *search);
+int ssdfs_snapshots_btree_delete_all(struct ssdfs_snapshots_btree_info *tree);
+
+/*
+ * Internal snapshots tree API
+ */
+int ssdfs_start_snapshots_btree_thread(struct ssdfs_fs_info *fsi);
+int ssdfs_stop_snapshots_btree_thread(struct ssdfs_fs_info *fsi);
+int ssdfs_snapshots_tree_find_leaf_node(struct ssdfs_snapshots_btree_info *tree,
+					struct ssdfs_timestamp_range *range,
+					struct ssdfs_btree_search *search);
+int ssdfs_snapshots_tree_get_start_hash(struct ssdfs_snapshots_btree_info *tree,
+					u64 *start_hash);
+int ssdfs_snapshots_tree_node_hash_range(struct ssdfs_snapshots_btree_info *tree,
+					 struct ssdfs_btree_search *search,
+					 u64 *start_hash, u64 *end_hash,
+					 u16 *items_count);
+int ssdfs_snapshots_tree_extract_range(struct ssdfs_snapshots_btree_info *tree,
+				       u16 start_index, u16 count,
+				       struct ssdfs_btree_search *search);
+int ssdfs_snapshots_tree_check_search_result(struct ssdfs_btree_search *search);
+int ssdfs_snapshots_tree_get_next_hash(struct ssdfs_snapshots_btree_info *tree,
+					struct ssdfs_btree_search *search,
+					u64 *next_hash);
+
+void ssdfs_debug_snapshots_btree_object(struct ssdfs_snapshots_btree_info *tree);
+
+/*
+ * Snapshots btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+						ssdfs_snapshots_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_snapshots_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_snapshots_btree_node_ops;
+
+#endif /* _SSDFS_SNAPSHOTS_TREE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 67/79] ssdfs: implement extended attributes support
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (27 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 65/79] ssdfs: introduce snapshots b-tree Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 71/79] ssdfs: implement IOCTL operations Viacheslav Dubeyko
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

This patch implements extended attributes support.
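The ACL code in this patch fetches an xattr with the usual two-pass
size-query convention. The mock below illustrates that convention in
userspace; the attribute store and function name are stand-ins for
illustration only, not SSDFS code:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* A single hard-coded attribute standing in for the xattr store. */
static const char *demo_name = "user.comment";
static const char *demo_value = "flash-friendly";

/*
 * xattr-style getter: with a NULL/zero-sized buffer it returns the
 * value's size, so the caller can allocate exactly that much and
 * fetch the value in a second call.
 */
static ssize_t demo_getxattr(const char *name, char *buf, size_t size)
{
	size_t len;

	if (strcmp(name, demo_name) != 0)
		return -ENODATA; /* attribute absent */

	len = strlen(demo_value);
	if (!buf || size == 0)
		return len; /* size query */
	if (size < len)
		return -ERANGE; /* caller's buffer too small */

	memcpy(buf, demo_value, len);
	return len;
}
```

This mirrors the first-query-then-copy sequence used when reading
POSIX ACL blobs out of the xattr storage.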

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/acl.c            |  260 ++++++
 fs/ssdfs/acl.h            |   54 ++
 fs/ssdfs/xattr.c          | 1700 +++++++++++++++++++++++++++++++++++++
 fs/ssdfs/xattr.h          |   88 ++
 fs/ssdfs/xattr_security.c |  159 ++++
 fs/ssdfs/xattr_tree.h     |  143 ++++
 fs/ssdfs/xattr_trusted.c  |   93 ++
 fs/ssdfs/xattr_user.c     |   93 ++
 8 files changed, 2590 insertions(+)
 create mode 100644 fs/ssdfs/acl.c
 create mode 100644 fs/ssdfs/acl.h
 create mode 100644 fs/ssdfs/xattr.c
 create mode 100644 fs/ssdfs/xattr.h
 create mode 100644 fs/ssdfs/xattr_security.c
 create mode 100644 fs/ssdfs/xattr_tree.h
 create mode 100644 fs/ssdfs/xattr_trusted.c
 create mode 100644 fs/ssdfs/xattr_user.c

diff --git a/fs/ssdfs/acl.c b/fs/ssdfs/acl.c
new file mode 100644
index 000000000000..04a075f4424d
--- /dev/null
+++ b/fs/ssdfs/acl.c
@@ -0,0 +1,260 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/acl.c - ACLs support implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "xattr.h"
+#include "acl.h"
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_acl_folio_leaks;
+atomic64_t ssdfs_acl_memory_leaks;
+atomic64_t ssdfs_acl_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_acl_cache_leaks_increment(void *kaddr)
+ * void ssdfs_acl_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_acl_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_acl_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_acl_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_acl_kfree(void *kaddr)
+ * struct folio *ssdfs_acl_alloc_folio(gfp_t gfp_mask, unsigned int order)
+ * struct folio *ssdfs_acl_add_batch_folio(struct folio_batch *batch,
+ *                                         unsigned int order)
+ * void ssdfs_acl_free_folio(struct folio *folio)
+ * void ssdfs_acl_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(acl)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(acl)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_acl_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_acl_folio_leaks, 0);
+	atomic64_set(&ssdfs_acl_memory_leaks, 0);
+	atomic64_set(&ssdfs_acl_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_acl_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_acl_folio_leaks) != 0) {
+		SSDFS_ERR("ACL: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_acl_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_acl_memory_leaks) != 0) {
+		SSDFS_ERR("ACL: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_acl_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_acl_cache_leaks) != 0) {
+		SSDFS_ERR("ACL: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_acl_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+struct posix_acl *ssdfs_get_acl(struct inode *inode, int type, bool rcu)
+{
+	struct posix_acl *acl;
+	char *xattr_name;
+	int name_index;
+	char *value = NULL;
+	ssize_t size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, type %#x\n",
+		  (unsigned long)inode->i_ino, type);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (rcu)
+		return ERR_PTR(-ECHILD);
+
+	switch (type) {
+	case ACL_TYPE_ACCESS:
+		name_index = SSDFS_POSIX_ACL_ACCESS_XATTR_ID;
+		xattr_name = XATTR_NAME_POSIX_ACL_ACCESS;
+		break;
+	case ACL_TYPE_DEFAULT:
+		name_index = SSDFS_POSIX_ACL_DEFAULT_XATTR_ID;
+		xattr_name = XATTR_NAME_POSIX_ACL_DEFAULT;
+		break;
+	default:
+		SSDFS_ERR("unknown type %#x\n", type);
+		return ERR_PTR(-EINVAL);
+	}
+
+	size = __ssdfs_getxattr(inode, name_index, xattr_name, NULL, 0);
+
+	if (size > 0) {
+		value = ssdfs_acl_kzalloc(size, GFP_KERNEL);
+		if (unlikely(!value)) {
+			SSDFS_ERR("unable to allocate memory\n");
+			return ERR_PTR(-ENOMEM);
+		}
+		size = __ssdfs_getxattr(inode, name_index, xattr_name,
+					value, size);
+	}
+
+	if (size > 0)
+		acl = posix_acl_from_xattr(&init_user_ns, value, size);
+	else if (size == -ENODATA)
+		acl = NULL;
+	else
+		acl = ERR_PTR(size);
+
+	ssdfs_acl_kfree(value);
+	return acl;
+}
+
+static
+int __ssdfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+{
+	int name_index;
+	char *xattr_name;
+	size_t size = 0;
+	char *value = NULL;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, type %#x, acl %p\n",
+		  (unsigned long)inode->i_ino, type, acl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (type) {
+	case ACL_TYPE_ACCESS:
+		name_index = SSDFS_POSIX_ACL_ACCESS_XATTR_ID;
+		xattr_name = XATTR_NAME_POSIX_ACL_ACCESS;
+		break;
+
+	case ACL_TYPE_DEFAULT:
+		name_index = SSDFS_POSIX_ACL_DEFAULT_XATTR_ID;
+		xattr_name = XATTR_NAME_POSIX_ACL_DEFAULT;
+		if (!S_ISDIR(inode->i_mode))
+			return acl ? -EACCES : 0;
+		break;
+
+	default:
+		SSDFS_ERR("unknown type %#x\n", type);
+		return -EINVAL;
+	}
+
+	if (acl) {
+		size = posix_acl_xattr_size(acl->a_count);
+		value = ssdfs_acl_kzalloc(size, GFP_KERNEL);
+		if (!value) {
+			SSDFS_ERR("unable to allocate memory\n");
+			return -ENOMEM;
+		}
+		err = posix_acl_to_xattr(&init_user_ns, acl, value, size);
+		if (err < 0) {
+			SSDFS_ERR("unable to convert acl to xattr\n");
+			goto end_set_acl;
+		}
+	}
+
+	err = __ssdfs_setxattr(inode, name_index, xattr_name, value, size, 0);
+
+end_set_acl:
+	ssdfs_acl_kfree(value);
+
+	if (!err)
+		set_cached_acl(inode, type, acl);
+
+	return err;
+}
+
+int ssdfs_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
+		  struct posix_acl *acl, int type)
+{
+	int update_mode = 0;
+	struct inode *inode = d_inode(dentry);
+	umode_t mode = inode->i_mode;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, type %#x, acl %p\n",
+		  (unsigned long)inode->i_ino, type, acl);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (type == ACL_TYPE_ACCESS && acl) {
+		err = posix_acl_update_mode(idmap, inode, &mode, &acl);
+		if (err)
+			goto end_set_acl;
+
+		if (mode != inode->i_mode)
+			update_mode = 1;
+	}
+
+	err = __ssdfs_set_acl(inode, acl, type);
+	if (!err && update_mode) {
+		inode->i_mode = mode;
+
+		inode_set_ctime_to_ts(inode, current_time(inode));
+		mark_inode_dirty(inode);
+	}
+
+end_set_acl:
+	return err;
+}
+
+int ssdfs_init_acl(struct inode *inode, struct inode *dir)
+{
+	struct posix_acl *default_acl, *acl;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir_ino %lu, ino %lu\n",
+		  (unsigned long)dir->i_ino, (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
+	if (err)
+		return err;
+
+	if (default_acl) {
+		err = __ssdfs_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
+		posix_acl_release(default_acl);
+	}
+
+	if (acl) {
+		if (!err)
+			err = __ssdfs_set_acl(inode, acl, ACL_TYPE_ACCESS);
+		posix_acl_release(acl);
+	}
+	return err;
+}
diff --git a/fs/ssdfs/acl.h b/fs/ssdfs/acl.h
new file mode 100644
index 000000000000..8e3f3b5131ae
--- /dev/null
+++ b/fs/ssdfs/acl.h
@@ -0,0 +1,54 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/acl.h - ACLs support declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_ACL_H
+#define _SSDFS_ACL_H
+
+#include <linux/posix_acl_xattr.h>
+
+#ifdef CONFIG_SSDFS_POSIX_ACL
+
+#define set_posix_acl_flag(sb) \
+	((sb)->s_flags |= SB_POSIXACL)
+
+/* acl.c */
+struct posix_acl *ssdfs_get_acl(struct inode *, int, bool);
+int ssdfs_set_acl(struct mnt_idmap *idmap, struct dentry *,
+		  struct posix_acl *, int);
+int ssdfs_init_acl(struct inode *, struct inode *);
+
+#else  /* CONFIG_SSDFS_POSIX_ACL */
+
+#define set_posix_acl_flag(sb) \
+	((sb)->s_flags &= ~SB_POSIXACL)
+
+#define ssdfs_get_acl NULL
+#define ssdfs_set_acl NULL
+
+static inline int ssdfs_init_acl(struct inode *inode, struct inode *dir)
+{
+	return 0;
+}
+
+#endif  /* CONFIG_SSDFS_POSIX_ACL */
+
+#endif /* _SSDFS_ACL_H */
diff --git a/fs/ssdfs/xattr.c b/fs/ssdfs/xattr.c
new file mode 100644
index 000000000000..3e2b38bc2985
--- /dev/null
+++ b/fs/ssdfs/xattr.c
@@ -0,0 +1,1700 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr.c - extended attributes support implementation.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+#include <linux/sched/signal.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "folio_array.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "xattr_tree.h"
+#include "shared_dictionary.h"
+#include "dentries_tree.h"
+#include "xattr.h"
+
+const struct xattr_handler *ssdfs_xattr_handlers[] = {
+	&ssdfs_xattr_user_handler,
+	&ssdfs_xattr_trusted_handler,
+#ifdef CONFIG_SSDFS_SECURITY
+	&ssdfs_xattr_security_handler,
+#endif
+	NULL
+};
+
+static
+int ssdfs_xattrs_tree_get_start_hash(struct ssdfs_xattrs_btree_info *tree,
+				     u64 *start_hash)
+{
+	struct ssdfs_btree_index *index;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !start_hash);
+
+	SSDFS_DBG("tree %p, start_hash %p\n",
+		  tree, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start_hash = U64_MAX;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_XATTR_BTREE_CREATED:
+	case SSDFS_XATTR_BTREE_INITIALIZED:
+	case SSDFS_XATTR_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid xattrs tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	down_read(&tree->lock);
+
+	if (!tree->root) {
+		err = -ERANGE;
+		SSDFS_ERR("root node pointer is NULL\n");
+		goto finish_get_start_hash;
+	}
+
+	index = &tree->root->indexes[SSDFS_ROOT_NODE_LEFT_LEAF_NODE];
+	*start_hash = le64_to_cpu(index->hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx\n", *start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_get_start_hash:
+	up_read(&tree->lock);
+
+	return err;
+}
+
+static
+int ssdfs_xattrs_tree_get_next_hash(struct ssdfs_xattrs_btree_info *tree,
+				    struct ssdfs_btree_search *search,
+				    u64 *next_hash)
+{
+	u64 old_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search || !next_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	old_hash = le64_to_cpu(search->node.found_index.index.hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search %p, next_hash %p, old (node %u, hash %llx)\n",
+		  search, next_hash, search->node.id, old_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_XATTR:
+	case SSDFS_INLINE_XATTR_ARRAY:
+		SSDFS_DBG("inline xattrs array is unsupported\n");
+		return -ENOENT;
+
+	case SSDFS_PRIVATE_XATTR_BTREE:
+		/* expected tree type */
+		break;
+
+	default:
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	down_read(&tree->lock);
+	err = ssdfs_btree_get_next_hash(tree->generic_tree, search, next_hash);
+	up_read(&tree->lock);
+
+	return err;
+}
+
+static
+int ssdfs_xattrs_tree_node_hash_range(struct ssdfs_xattrs_btree_info *tree,
+					struct ssdfs_btree_search *search,
+					u64 *start_hash, u64 *end_hash,
+					u16 *items_count)
+{
+	struct ssdfs_xattr_entry *cur_xattr;
+	u16 inline_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search || !start_hash || !end_hash || !items_count);
+
+	SSDFS_DBG("search %p, start_hash %p, "
+		  "end_hash %p, items_count %p\n",
+		  tree, start_hash, end_hash, items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start_hash = *end_hash = U64_MAX;
+	*items_count = 0;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_XATTR_BTREE_CREATED:
+	case SSDFS_XATTR_BTREE_INITIALIZED:
+	case SSDFS_XATTR_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid xattrs tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	};
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_XATTR:
+	case SSDFS_INLINE_XATTR_ARRAY:
+		down_read(&tree->lock);
+
+		if (!tree->inline_xattrs) {
+			err = -ERANGE;
+			SSDFS_ERR("inline tree's pointer is empty\n");
+			goto finish_process_inline_tree;
+		}
+
+		inline_count = tree->inline_count;
+
+		if (inline_count >= U16_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("unexpected xattrs count %u\n",
+				  inline_count);
+			goto finish_process_inline_tree;
+		}
+
+		*items_count = inline_count;
+
+		if (*items_count == 0)
+			goto finish_process_inline_tree;
+
+		cur_xattr = &tree->inline_xattrs[0];
+		*start_hash = le64_to_cpu(cur_xattr->name_hash);
+
+		if (inline_count > tree->inline_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("xattrs_count %u > max_value %u\n",
+				  inline_count,
+				  tree->inline_capacity);
+			goto finish_process_inline_tree;
+		}
+
+		cur_xattr = &tree->inline_xattrs[inline_count - 1];
+		*end_hash = le64_to_cpu(cur_xattr->name_hash);
+
+finish_process_inline_tree:
+		up_read(&tree->lock);
+		break;
+
+	case SSDFS_PRIVATE_XATTR_BTREE:
+		err = ssdfs_btree_node_get_hash_range(search,
+						      start_hash,
+						      end_hash,
+						      items_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get hash range: err %d\n",
+				  err);
+			goto finish_extract_hash_range;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+finish_extract_hash_range:
+	return err;
+}
+
+static
+int ssdfs_xattrs_tree_check_search_result(struct ssdfs_btree_search *search)
+{
+	size_t xattr_size = sizeof(struct ssdfs_xattr_entry);
+	u16 items_count;
+	size_t buf_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	switch (search->result.raw_buf.state) {
+	case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+	case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+		if (!search->result.raw_buf.place.ptr) {
+			SSDFS_ERR("buffer pointer is NULL\n");
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("unexpected buffer's state %#x\n",
+			  search->result.raw_buf.state);
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.raw_buf.items_count >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_count = (u16)search->result.raw_buf.items_count;
+
+	if (items_count == 0) {
+		SSDFS_ERR("empty items buffer: items_count %u\n",
+			  items_count);
+		return -ENOENT;
+	} else if (items_count != search->result.count) {
+		SSDFS_ERR("items_count %u != search->result.count %u\n",
+			  items_count, search->result.count);
+		return -ERANGE;
+	}
+
+	buf_size = xattr_size * items_count;
+
+	if (buf_size != search->result.raw_buf.size) {
+		SSDFS_ERR("buf_size %zu != search->result.raw_buf.size %zu\n",
+			  buf_size,
+			  search->result.raw_buf.size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static
+bool is_invalid_xattr(struct ssdfs_xattr_entry *xattr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!xattr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (le64_to_cpu(xattr->name_hash) >= U64_MAX) {
+		SSDFS_ERR("invalid hash_code\n");
+		return true;
+	}
+
+	if (xattr->name_len > SSDFS_MAX_NAME_LEN) {
+		SSDFS_ERR("invalid name_len %u\n",
+			  xattr->name_len);
+		return true;
+	}
+
+	if (xattr->name_type <= SSDFS_XATTR_NAME_UNKNOWN_TYPE ||
+	    xattr->name_type >= SSDFS_XATTR_NAME_TYPE_MAX) {
+		SSDFS_ERR("invalid name_type %#x\n",
+			  xattr->name_type);
+		return true;
+	}
+
+	if (xattr->name_flags & ~SSDFS_XATTR_NAME_FLAGS_MASK) {
+		SSDFS_ERR("invalid set of flags %#x\n",
+			  xattr->name_flags);
+		return true;
+	}
+
+	if (xattr->blob_type <= SSDFS_XATTR_BLOB_UNKNOWN_TYPE ||
+	    xattr->blob_type >= SSDFS_XATTR_BLOB_TYPE_MAX) {
+		SSDFS_ERR("invalid blob_type %#x\n",
+			  xattr->blob_type);
+		return true;
+	}
+
+	if (xattr->blob_flags & ~SSDFS_XATTR_BLOB_FLAGS_MASK) {
+		SSDFS_ERR("invalid set of flags %#x\n",
+			  xattr->blob_flags);
+		return true;
+	}
+
+	return false;
+}
+
+static
+ssize_t ssdfs_copy_name2buffer(struct ssdfs_shared_dict_btree_info *dict,
+				struct ssdfs_xattr_entry *xattr,
+				struct ssdfs_btree_search *search,
+				ssize_t offset,
+				char *buffer, size_t size)
+{
+	u64 hash;
+	size_t prefix_len, name_len;
+	ssize_t copied = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!xattr || !buffer);
+
+	SSDFS_DBG("xattr %p, offset %zd, "
+		  "buffer %p, size %zu\n",
+		  xattr, offset, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	hash = le64_to_cpu(xattr->name_hash);
+
+	if ((offset + xattr->name_len) >= size) {
+		SSDFS_ERR("offset %zd, name_len %u, size %zu\n",
+			  offset, xattr->name_len, size);
+		return -ERANGE;
+	}
+
+	if (xattr->name_flags & SSDFS_XATTR_HAS_EXTERNAL_STRING) {
+		err = ssdfs_shared_dict_get_name(dict, hash,
+						 &search->name.string);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract the name: "
+				  "hash %llx, err %d\n",
+				  hash, err);
+			return err;
+		}
+
+		switch (xattr->name_type) {
+		case SSDFS_XATTR_REGULAR_NAME:
+			/* do nothing here */
+			break;
+
+		case SSDFS_XATTR_USER_REGULAR_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_USER_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_USER_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_TRUSTED_REGULAR_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_TRUSTED_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_TRUSTED_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_SYSTEM_REGULAR_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_SYSTEM_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_SYSTEM_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_SECURITY_REGULAR_NAME:
+			prefix_len =
+			    strlen(SSDFS_NS_PREFIX[SSDFS_SECURITY_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_SECURITY_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		default:
+			SSDFS_ERR("unexpected name type %#x\n",
+				  xattr->name_type);
+			return -EIO;
+		}
+
+		err = ssdfs_memcpy(buffer, offset, size,
+				   search->name.string.str,
+				   0, SSDFS_MAX_NAME_LEN,
+				   search->name.string.len);
+		BUG_ON(unlikely(err != 0));
+
+		offset += search->name.string.len;
+		copied += search->name.string.len;
+
+		if (offset >= size) {
+			SSDFS_ERR("invalid offset: "
+				  "offset %zd, size %zu\n",
+				  offset, size);
+			return -ERANGE;
+		}
+
+		memset(buffer + offset, 0, size - offset);
+	} else {
+		switch (xattr->name_type) {
+		case SSDFS_XATTR_INLINE_NAME:
+			/* do nothing here */
+			break;
+
+		case SSDFS_XATTR_USER_INLINE_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_USER_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_USER_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_TRUSTED_INLINE_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_TRUSTED_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_TRUSTED_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_SYSTEM_INLINE_NAME:
+			prefix_len =
+				strlen(SSDFS_NS_PREFIX[SSDFS_SYSTEM_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_SYSTEM_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		case SSDFS_XATTR_SECURITY_INLINE_NAME:
+			prefix_len =
+			    strlen(SSDFS_NS_PREFIX[SSDFS_SECURITY_NS_INDEX]);
+			err = ssdfs_memcpy(buffer, offset, size,
+				     SSDFS_NS_PREFIX[SSDFS_SECURITY_NS_INDEX],
+				     0, prefix_len,
+				     prefix_len);
+			BUG_ON(unlikely(err != 0));
+			offset += prefix_len;
+			copied += prefix_len;
+			break;
+
+		default:
+			SSDFS_ERR("unexpected name type %#x\n",
+				  xattr->name_type);
+			return -EIO;
+		}
+
+		name_len = xattr->name_len;
+
+		err = ssdfs_memcpy(buffer, offset, size,
+				   xattr->inline_string,
+				   0, SSDFS_XATTR_INLINE_NAME_MAX_LEN,
+				   name_len);
+		BUG_ON(unlikely(err != 0));
+
+		offset += name_len;
+		copied += name_len;
+
+		if (offset >= size) {
+			SSDFS_ERR("invalid offset: "
+				  "offset %zd, size %zu\n",
+				  offset, size);
+			return -ERANGE;
+		}
+
+		memset(buffer + offset, 0, size - offset);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("XATTR NAME DUMP\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     buffer, size);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return copied;
+}
+
+static inline
+size_t ssdfs_calculate_name_length(struct ssdfs_xattr_entry *xattr)
+{
+	size_t prefix_len = 0;
+	size_t name_len = 0;
+
+	switch (xattr->name_type) {
+	case SSDFS_XATTR_INLINE_NAME:
+	case SSDFS_XATTR_REGULAR_NAME:
+		/* do nothing here */
+		break;
+
+	case SSDFS_XATTR_USER_INLINE_NAME:
+	case SSDFS_XATTR_USER_REGULAR_NAME:
+		prefix_len = strlen(SSDFS_NS_PREFIX[SSDFS_USER_NS_INDEX]);
+		break;
+
+	case SSDFS_XATTR_TRUSTED_INLINE_NAME:
+	case SSDFS_XATTR_TRUSTED_REGULAR_NAME:
+		prefix_len = strlen(SSDFS_NS_PREFIX[SSDFS_TRUSTED_NS_INDEX]);
+		break;
+
+	case SSDFS_XATTR_SYSTEM_INLINE_NAME:
+	case SSDFS_XATTR_SYSTEM_REGULAR_NAME:
+		prefix_len = strlen(SSDFS_NS_PREFIX[SSDFS_SYSTEM_NS_INDEX]);
+		break;
+
+	case SSDFS_XATTR_SECURITY_INLINE_NAME:
+	case SSDFS_XATTR_SECURITY_REGULAR_NAME:
+		prefix_len = strlen(SSDFS_NS_PREFIX[SSDFS_SECURITY_NS_INDEX]);
+		break;
+
+	default:
+		/* do nothing */
+		break;
+	}
+
+	name_len = prefix_len + xattr->name_len;
+
+	return name_len;
+}
+
+static
+ssize_t ssdfs_listxattr_inline_tree(struct inode *inode,
+				    struct ssdfs_btree_search *search,
+				    char *buffer, size_t size)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_shared_dict_btree_info *dict;
+	struct ssdfs_xattr_entry *xattr;
+	u8 *kaddr;
+	size_t xattr_size = sizeof(struct ssdfs_xattr_entry);
+	u16 items_count;
+	ssize_t res, copied = 0;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, buffer %p, size %zu\n",
+		  inode->i_ino, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dict = fsi->shdictree;
+	if (!dict) {
+		SSDFS_ERR("shared dictionary is absent\n");
+		return -ERANGE;
+	}
+
+	down_read(&ii->lock);
+
+	if (!ii->xattrs_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("unexpected xattrs tree absence\n");
+		goto finish_tree_processing;
+	}
+
+	err = ssdfs_xattrs_tree_extract_range(ii->xattrs_tree,
+					      0,
+					      SSDFS_DEFAULT_INLINE_XATTR_COUNT,
+					      search);
+	if (err == -ENOENT) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to extract inline xattr: "
+			  "ino %lu\n",
+			  inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto finish_tree_processing;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to extract inline xattr: "
+			  "ino %lu, err %d\n",
+			  inode->i_ino, err);
+		goto finish_tree_processing;
+	}
+
+finish_tree_processing:
+	up_read(&ii->lock);
+
+	if (err == -ENOENT) {
+		err = 0;
+		goto clean_up;
+	} else if (unlikely(err))
+		goto clean_up;
+
+	err = ssdfs_xattrs_tree_check_search_result(search);
+	if (unlikely(err)) {
+		SSDFS_ERR("corrupted search result: "
+			  "err %d\n", err);
+		goto clean_up;
+	}
+
+	items_count = search->result.count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search->result.raw_buf.place.ptr);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < items_count; i++) {
+		kaddr = (u8 *)search->result.raw_buf.place.ptr;
+		xattr = (struct ssdfs_xattr_entry *)(kaddr + (i * xattr_size));
+
+		if (is_invalid_xattr(xattr)) {
+			err = -EIO;
+			SSDFS_ERR("found corrupted xattr\n");
+			goto clean_up;
+		}
+
+		if (buffer) {
+			res = ssdfs_copy_name2buffer(dict, xattr,
+						     search, copied,
+						     buffer, size);
+			if (res < 0) {
+				err = res;
+				SSDFS_ERR("failed to copy name: "
+					  "err %d\n", err);
+				goto clean_up;
+			} else {
+				copied += res + 1;
+			}
+		} else {
+			copied += ssdfs_calculate_name_length(xattr) + 1;
+		}
+	}
+
+clean_up:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("copied %zd\n", copied);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err < 0 ? err : copied;
+}
+
+static
+ssize_t ssdfs_listxattr_generic_tree(struct inode *inode,
+				     struct ssdfs_btree_search *search,
+				     char *buffer, size_t size)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_shared_dict_btree_info *dict;
+	struct ssdfs_xattr_entry *xattr;
+	u8 *kaddr;
+	size_t xattr_size = sizeof(struct ssdfs_xattr_entry);
+	u64 start_hash, end_hash;
+	u16 items_count;
+	ssize_t res, copied = 0;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, buffer %p, size %zu\n",
+		  inode->i_ino, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	dict = fsi->shdictree;
+	if (!dict) {
+		SSDFS_ERR("shared dictionary is absent\n");
+		return -ERANGE;
+	}
+
+	down_read(&ii->lock);
+
+	if (!ii->xattrs_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("unexpected xattrs tree absence\n");
+		goto finish_get_start_hash;
+	}
+
+	err = ssdfs_xattrs_tree_get_start_hash(ii->xattrs_tree,
+						&start_hash);
+	if (err == -ENOENT)
+		goto finish_get_start_hash;
+	else if (unlikely(err)) {
+		SSDFS_ERR("fail to get start root hash: err %d\n", err);
+		goto finish_get_start_hash;
+	} else if (start_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid hash value\n");
+		goto finish_get_start_hash;
+	}
+
+finish_get_start_hash:
+	up_read(&ii->lock);
+
+	if (err == -ENOENT) {
+		err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to extract start hash: "
+			  "ino %lu\n",
+			  inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto clean_up;
+	} else if (unlikely(err))
+		goto clean_up;
+
+	do {
+		ssdfs_btree_search_init(search);
+
+		/* allow ssdfs_listxattr_generic_tree() to be interrupted */
+		if (fatal_signal_pending(current)) {
+			err = -ERESTARTSYS;
+			goto clean_up;
+		}
+		cond_resched();
+
+		down_read(&ii->lock);
+
+		err = ssdfs_xattrs_tree_find_leaf_node(ii->xattrs_tree,
+							start_hash,
+							search);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find a leaf node: "
+				  "hash %llx, err %d\n",
+				  start_hash, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_tree_processing;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a leaf node: "
+				  "hash %llx, err %d\n",
+				  start_hash, err);
+			goto finish_tree_processing;
+		}
+
+		err = ssdfs_xattrs_tree_node_hash_range(ii->xattrs_tree,
+							search,
+							&start_hash,
+							&end_hash,
+							&items_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get node's hash range: "
+				  "err %d\n", err);
+			goto finish_tree_processing;
+		}
+
+		if (items_count == 0) {
+			err = -ENOENT;
+			SSDFS_DBG("empty leaf node\n");
+			goto finish_tree_processing;
+		}
+
+		if (start_hash > end_hash) {
+			err = -ENOENT;
+			goto finish_tree_processing;
+		}
+
+		err = ssdfs_xattrs_tree_extract_range(ii->xattrs_tree,
+							0, items_count,
+							search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract the range: "
+				  "items_count %u, err %d\n",
+				  items_count, err);
+			goto finish_tree_processing;
+		}
+
+finish_tree_processing:
+		up_read(&ii->lock);
+
+		if (err == -ENOENT) {
+			err = 0;
+			goto clean_up;
+		} else if (unlikely(err))
+			goto clean_up;
+
+		err = ssdfs_xattrs_tree_check_search_result(search);
+		if (unlikely(err)) {
+			SSDFS_ERR("corrupted search result: "
+				  "err %d\n", err);
+			goto clean_up;
+		}
+
+		items_count = search->result.count;
+
+		for (i = 0; i < items_count; i++) {
+			u64 hash;
+
+			kaddr = (u8 *)search->result.raw_buf.place.ptr;
+			xattr =
+			    (struct ssdfs_xattr_entry *)(kaddr +
+							    (i * xattr_size));
+			hash = le64_to_cpu(xattr->name_hash);
+
+			if (is_invalid_xattr(xattr)) {
+				err = -EIO;
+				SSDFS_ERR("found corrupted xattr\n");
+				goto clean_up;
+			}
+
+			if (buffer) {
+				res = ssdfs_copy_name2buffer(dict, xattr,
+							     search, copied,
+							     buffer, size);
+				if (res < 0) {
+					err = res;
+					SSDFS_ERR("failed to copy name: "
+						  "err %d\n", err);
+					goto clean_up;
+				} else {
+					copied += res + 1;
+				}
+			} else {
+				copied +=
+					ssdfs_calculate_name_length(xattr) + 1;
+			}
+
+			start_hash = hash;
+		}
+
+		if (start_hash != end_hash) {
+			err = -ERANGE;
+			SSDFS_ERR("cur_hash %llx != end_hash %llx\n",
+				  start_hash, end_hash);
+			goto clean_up;
+		}
+
+		start_hash = end_hash + 1;
+
+		down_read(&ii->lock);
+		err = ssdfs_xattrs_tree_get_next_hash(ii->xattrs_tree,
+						      search,
+						      &start_hash);
+		up_read(&ii->lock);
+
+		ssdfs_btree_search_forget_parent_node(search);
+		ssdfs_btree_search_forget_child_node(search);
+
+		if (err == -ENOENT) {
+			err = 0;
+			SSDFS_DBG("no more xattrs in the tree\n");
+			goto clean_up;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to get next hash: err %d\n",
+				  err);
+			goto clean_up;
+		}
+	} while (start_hash < U64_MAX);
+
+clean_up:
+	return err < 0 ? err : copied;
+}
+
+/*
+ * Copy the list of attribute names into the buffer
+ * provided. If @buffer is NULL, compute the size of
+ * the buffer required instead.
+ *
+ * Returns a negative error number on failure, or the number of bytes
+ * used / required on success.
+ */
+ssize_t ssdfs_listxattr(struct dentry *dentry, char *buffer, size_t size)
+{
+	struct inode *inode = d_inode(dentry);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_btree_search *search;
+	int private_flags;
+	ssize_t copied = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, buffer %p, size %zu\n",
+		  inode->i_ino, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	switch (private_flags) {
+	case SSDFS_INODE_HAS_INLINE_XATTR:
+	case SSDFS_INODE_HAS_XATTR_BTREE:
+		/* xattrs tree exists */
+		break;
+
+	default:
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("xattrs tree is absent: "
+			  "ino %lu\n",
+			  inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	if (!ii->xattrs_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("unexpected xattrs tree absence\n");
+		goto clean_up;
+	}
+
+	switch (atomic_read(&ii->xattrs_tree->type)) {
+	case SSDFS_INLINE_XATTR:
+	case SSDFS_INLINE_XATTR_ARRAY:
+		ssdfs_btree_search_init(search);
+		copied = ssdfs_listxattr_inline_tree(inode, search,
+						     buffer, size);
+		if (unlikely(copied < 0)) {
+			err = copied;
+			SSDFS_ERR("fail to extract the inline range: "
+				  "err %d\n", err);
+			goto clean_up;
+		}
+		break;
+
+	case SSDFS_PRIVATE_XATTR_BTREE:
+		copied = ssdfs_listxattr_generic_tree(inode, search,
+						      buffer, size);
+		if (unlikely(copied < 0)) {
+			err = copied;
+			SSDFS_ERR("fail to extract the range: "
+				  "err %d\n", err);
+			goto clean_up;
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid xattrs tree type %#x\n",
+			  atomic_read(&ii->xattrs_tree->type));
+		goto clean_up;
+	}
+
+clean_up:
+	ssdfs_btree_search_free(search);
+
+	return err < 0 ? err : copied;
+}
+
+/*
+ * Read an external blob from the volume into the provided buffer.
+ */
+static
+int ssdfs_xattr_read_external_blob(struct ssdfs_fs_info *fsi,
+				   struct inode *inode,
+				   struct ssdfs_xattr_entry *xattr,
+				   void *value, size_t size)
+{
+	struct ssdfs_segment_request *req;
+	struct ssdfs_peb_container *pebc;
+	struct ssdfs_blk2off_table *table;
+	struct ssdfs_offset_position pos;
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_search_state seg_search;
+	struct ssdfs_request_content_block *block;
+	struct ssdfs_content_block *blk_state;
+	struct folio *folio;
+	u16 blob_size;
+	u64 seg_id;
+	u32 logical_blk;
+	u32 len;
+	u32 batch_size;
+	u64 logical_offset;
+	u32 data_bytes;
+	u32 copied_bytes = 0;
+	struct completion *end;
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !inode || !xattr || !value);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	seg_id = le64_to_cpu(xattr->blob.descriptor.extent.seg_id);
+	logical_blk = le32_to_cpu(xattr->blob.descriptor.extent.logical_blk);
+	len = le32_to_cpu(xattr->blob.descriptor.extent.len);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("seg_id %llu, logical_blk %u, len %u\n",
+		  seg_id, logical_blk, len);
+
+	BUG_ON(seg_id == U64_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					seg_id, U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = !si ? -ENOMEM : PTR_ERR(si);
+		if (err == -EINTR) {
+			/*
+			 * Ignore this error.
+			 */
+		} else {
+			SSDFS_ERR("fail to grab segment object: "
+				  "seg %llu, err %d\n",
+				  seg_id, err);
+		}
+		goto fail_get_segment;
+	}
+
+	if (!is_ssdfs_segment_ready_for_requests(si)) {
+		err = ssdfs_wait_segment_init_end(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("segment initialization failed: "
+				  "seg %llu, err %d\n",
+				  si->seg_id, err);
+			goto finish_prepare_request;
+		}
+	}
+
+	blob_size = le16_to_cpu(xattr->blob_len);
+
+	if (blob_size > size) {
+		err = -EINVAL;
+		SSDFS_ERR("invalid request: blob_size %u > size %zu\n",
+			  blob_size, size);
+		goto finish_prepare_request;
+	}
+
+	/* round up: a partial page still occupies a whole folio */
+	batch_size = (blob_size + fsi->pagesize - 1) >> fsi->log_pagesize;
+
+	if (batch_size == 0)
+		batch_size = 1;
+
+	if (batch_size > SSDFS_EXTENT_LEN_MAX) {
+		err = -ERANGE;
+		SSDFS_WARN("invalid memory folios count: "
+			   "blob_size %u, batch_size %u\n",
+			   blob_size, batch_size);
+		goto finish_prepare_request;
+	}
+
+	req = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req)) {
+		err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+		SSDFS_ERR("fail to allocate segment request: err %d\n",
+			  err);
+		goto finish_prepare_request;
+	}
+
+	ssdfs_request_init(req, fsi->pagesize);
+	ssdfs_get_request(req);
+
+	logical_offset = 0;
+	data_bytes = blob_size;
+	ssdfs_request_prepare_logical_extent(inode->i_ino,
+					     (u64)logical_offset,
+					     (u32)data_bytes,
+					     0, 0, req);
+
+	for (i = 0; i < batch_size; i++) {
+		err = ssdfs_request_add_allocated_folio_locked(i, req);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to add folio into request: "
+				  "err %d\n",
+				  err);
+			goto fail_read_blob;
+		}
+	}
+
+	ssdfs_request_define_segment(seg_id, req);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(logical_blk >= U16_MAX);
+	BUG_ON(len >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	ssdfs_request_define_volume_extent((u16)logical_blk, (u16)len, req);
+
+	ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+					    SSDFS_READ_PAGES_READAHEAD,
+					    SSDFS_REQ_SYNC,
+					    req);
+
+	table = si->blk2off_table;
+
+	err = ssdfs_blk2off_table_get_offset_position(table, logical_blk, &pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert: "
+			  "seg_id %llu, logical_blk %u, len %u, err %d\n",
+			  seg_id, logical_blk, len, err);
+		goto fail_read_blob;
+	}
+
+	pebc = &si->peb_array[pos.peb_index];
+
+	err = ssdfs_peb_readahead_pages(pebc, req, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("PEB init failed: "
+				  "err %d\n", err);
+			goto fail_read_blob;
+		}
+
+		err = ssdfs_peb_readahead_pages(pebc, req, &end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read page: err %d\n",
+			  err);
+		goto fail_read_blob;
+	}
+
+	for (i = 0; i < req->result.processed_blks; i++)
+		ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(req->result.content.count == 0);
+
+	for (i = 0; i < req->result.content.count; i++) {
+		block = &req->result.content.blocks[i];
+		blk_state = &block->new_state;
+
+		BUG_ON(folio_batch_count(&blk_state->batch) == 0);
+
+		for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+			void *kaddr;
+			u32 processed_bytes = 0;
+			u32 page_index = 0;
+
+			folio = blk_state->batch.folios[j];
+
+			WARN_ON(!folio_test_locked(folio));
+
+			do {
+				kaddr = kmap_local_folio(folio,
+							 processed_bytes);
+				SSDFS_DBG("PAGE DUMP: blk_index %d, "
+					  "folio_index %d, page_index %u\n",
+					  i, j, page_index);
+				print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+						     kaddr,
+						     PAGE_SIZE);
+				SSDFS_DBG("\n");
+				kunmap_local(kaddr);
+
+				processed_bytes += PAGE_SIZE;
+				page_index++;
+			} while (processed_bytes < folio_size(folio));
+		}
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < req->result.content.count; i++) {
+		block = &req->result.content.blocks[i];
+		blk_state = &block->new_state;
+
+		for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+			u32 cur_len;
+
+			folio = blk_state->batch.folios[j];
+
+			if (copied_bytes >= blob_size)
+				goto finish_copy_operation;
+
+			cur_len = min_t(u32, (u32)folio_size(folio),
+					blob_size - copied_bytes);
+
+			err = __ssdfs_memcpy_from_folio(value,
+							copied_bytes, size,
+							folio,
+							0, folio_size(folio),
+							cur_len);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to copy: "
+					  "copied_bytes %u, cur_len %u\n",
+					  copied_bytes, cur_len);
+				goto fail_read_blob;
+			}
+
+			copied_bytes += cur_len;
+		}
+	}
+
+finish_copy_operation:
+	ssdfs_request_unlock_and_remove_folios(req);
+
+	ssdfs_put_request(req);
+	ssdfs_request_free(req, si);
+
+	ssdfs_segment_put_object(si);
+
+	return 0;
+
+fail_read_blob:
+	ssdfs_request_unlock_and_remove_folios(req);
+	ssdfs_put_request(req);
+	ssdfs_request_free(req, si);
+
+finish_prepare_request:
+	ssdfs_segment_put_object(si);
+
+fail_get_segment:
+	return err;
+}
+
+/*
+ * Copy the value of an extended attribute into the buffer
+ * provided. If @value is NULL, compute the size of
+ * the buffer required instead.
+ *
+ * Returns a negative error number on failure, or the number of bytes
+ * used / required on success.
+ */
+ssize_t __ssdfs_getxattr(struct inode *inode, int name_index, const char *name,
+			 void *value, size_t size)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_btree_search *search;
+	struct ssdfs_xattr_entry *xattr;
+	u8 *kaddr;
+	size_t name_len;
+	u16 blob_len;
+	u8 blob_type;
+	u8 blob_flags;
+	int private_flags;
+	ssize_t err = 0;
+
+	if (name == NULL) {
+		SSDFS_ERR("name pointer is NULL\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("name_index %d, name %s, value %p, size %zu\n",
+		  name_index, name, value, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	name_len = strlen(name);
+	if (name_len > SSDFS_MAX_NAME_LEN)
+		return -ERANGE;
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	switch (private_flags) {
+	case SSDFS_INODE_HAS_INLINE_XATTR:
+	case SSDFS_INODE_HAS_XATTR_BTREE:
+		down_read(&ii->lock);
+
+		if (!ii->xattrs_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("xattrs tree is absent\n");
+			goto finish_search_xattr;
+		}
+
+		search = ssdfs_btree_search_alloc();
+		if (!search) {
+			err = -ENOMEM;
+			SSDFS_ERR("fail to allocate btree search object\n");
+			goto finish_search_xattr;
+		}
+
+		ssdfs_btree_search_init(search);
+
+		err = ssdfs_xattrs_tree_find(ii->xattrs_tree,
+					     name, name_len,
+					     search);
+
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("inode %lu hasn't xattr %s\n",
+				  (unsigned long)inode->i_ino,
+				  name);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto xattr_is_not_available;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find the xattr: "
+				  "inode %lu, name %s\n",
+				  (unsigned long)inode->i_ino,
+				  name);
+			goto xattr_is_not_available;
+		}
+
+		if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid result's state %#x\n",
+				  search->result.state);
+			goto xattr_is_not_available;
+		}
+
+		switch (search->result.raw_buf.state) {
+		case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+		case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+			/* expected state */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid buffer state %#x\n",
+				  search->result.raw_buf.state);
+			goto xattr_is_not_available;
+		}
+
+		if (!search->result.raw_buf.place.ptr) {
+			err = -ERANGE;
+			SSDFS_ERR("buffer is absent\n");
+			goto xattr_is_not_available;
+		}
+
+		if (search->result.raw_buf.size == 0) {
+			err = -ERANGE;
+			SSDFS_ERR("result.buf_size %zu\n",
+				  search->result.raw_buf.size);
+			goto xattr_is_not_available;
+		}
+
+		kaddr = (u8 *)search->result.raw_buf.place.ptr;
+		xattr = (struct ssdfs_xattr_entry *)kaddr;
+
+		blob_len = le16_to_cpu(xattr->blob_len);
+		blob_type = xattr->blob_type;
+		blob_flags = xattr->blob_flags;
+
+		switch (blob_type) {
+		case SSDFS_XATTR_INLINE_BLOB:
+			if (blob_len > SSDFS_XATTR_INLINE_BLOB_MAX_LEN) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid blob_len %u\n",
+					  blob_len);
+				goto xattr_is_not_available;
+			}
+			break;
+
+		case SSDFS_XATTR_REGULAR_BLOB:
+			if (!(blob_flags & SSDFS_XATTR_HAS_EXTERNAL_BLOB)) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid set of flags %#x\n",
+					  blob_flags);
+				goto xattr_is_not_available;
+			}
+
+			if (blob_len > SSDFS_XATTR_EXTERNAL_BLOB_MAX_LEN) {
+				err = -ERANGE;
+				SSDFS_ERR("invalid blob_len %u\n",
+					  blob_len);
+				goto xattr_is_not_available;
+			}
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("unexpected blob type %#x\n",
+				  blob_type);
+			goto xattr_is_not_available;
+		}
+
+		if (value) {
+			switch (blob_type) {
+			case SSDFS_XATTR_INLINE_BLOB:
+				/* return value of attribute */
+				err = ssdfs_memcpy(value, 0, size,
+					     xattr->blob.inline_value.bytes,
+					     0, SSDFS_XATTR_INLINE_BLOB_MAX_LEN,
+					     blob_len);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to copy inline blob: "
+						  "err %zd\n", err);
+					goto xattr_is_not_available;
+				}
+				break;
+
+			case SSDFS_XATTR_REGULAR_BLOB:
+				err = ssdfs_xattr_read_external_blob(fsi,
+								     inode,
+								     xattr,
+								     value,
+								     size);
+				if (err == -EINTR) {
+					/*
+					 * Ignore this error.
+					 */
+					goto xattr_is_not_available;
+				} else if (unlikely(err)) {
+					SSDFS_ERR("fail to read external blob: "
+						  "err %zd\n", err);
+					goto xattr_is_not_available;
+				}
+				break;
+
+			default:
+				BUG();
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("BLOB DUMP:\n");
+			print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+					     value, size);
+			SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		err = blob_len;
+
+xattr_is_not_available:
+		ssdfs_btree_search_free(search);
+
+finish_search_xattr:
+		up_read(&ii->lock);
+		break;
+
+	default:
+		err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("xattrs tree is absent: "
+			  "ino %lu\n",
+			  (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		break;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished: err %zd\n", err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * Create, replace or remove an extended attribute for this inode. A NULL
+ * @value removes an existing extended attribute; a non-NULL @value either
+ * replaces an existing extended attribute or creates a new one. The flags
+ * XATTR_REPLACE and XATTR_CREATE specify that the extended attribute must
+ * exist or must not exist prior to the call, respectively.
+ *
+ * Returns 0, or a negative error number on failure.
+ */
+int __ssdfs_setxattr(struct inode *inode, int name_index, const char *name,
+			const void *value, size_t size, int flags)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_btree_search *search;
+	size_t name_len;
+	int private_flags;
+	u64 name_hash;
+	int err = 0;
+
+	if (name == NULL) {
+		SSDFS_ERR("name pointer is NULL\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("name_index %d, name %s, value %p, "
+		  "size %zu, flags %#x\n",
+		  name_index, name, value, size, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (value == NULL)
+		size = 0;
+
+	name_len = strlen(name);
+	if (name_len > SSDFS_MAX_NAME_LEN)
+		return -ERANGE;
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	switch (private_flags) {
+	case SSDFS_INODE_HAS_INLINE_XATTR:
+	case SSDFS_INODE_HAS_XATTR_BTREE:
+		down_read(&ii->lock);
+
+		if (!ii->xattrs_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("xattrs tree is absent\n");
+			goto finish_setxattr;
+		}
+		break;
+
+	default:
+		down_write(&ii->lock);
+
+		if (ii->xattrs_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("xattrs tree exists unexpectedly\n");
+			goto finish_create_xattrs_tree;
+		} else {
+			err = ssdfs_xattrs_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the xattrs tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto finish_create_xattrs_tree;
+			}
+
+			atomic_or(SSDFS_INODE_HAS_INLINE_XATTR,
+				  &ii->private_flags);
+		}
+
+finish_create_xattrs_tree:
+		downgrade_write(&ii->lock);
+
+		if (unlikely(err))
+			goto finish_setxattr;
+		break;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto finish_setxattr;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	name_hash = __ssdfs_generate_name_hash(name, name_len,
+					       SSDFS_XATTR_INLINE_NAME_MAX_LEN);
+	if (name_hash >= U64_MAX) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid name hash\n");
+		goto clean_up;
+	}
+
+	if (value == NULL) {
+		/* remove value */
+		err = ssdfs_xattrs_tree_delete(ii->xattrs_tree,
+						name_hash,
+						name, name_len,
+						search);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to remove xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto clean_up;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to remove xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+			goto clean_up;
+		}
+	} else if (flags & XATTR_CREATE) {
+		err = ssdfs_xattrs_tree_add(ii->xattrs_tree,
+					    name_index,
+					    name, name_len,
+					    value, size,
+					    ii,
+					    search);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to create xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto clean_up;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to create xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+			goto clean_up;
+		}
+	} else if (flags & XATTR_REPLACE) {
+		err = ssdfs_xattrs_tree_change(ii->xattrs_tree,
+						name_index,
+						name_hash,
+						name, name_len,
+						value, size,
+						search);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to replace xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto clean_up;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to replace xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+			goto clean_up;
+		}
+	} else {
+		err = ssdfs_xattrs_tree_delete(ii->xattrs_tree,
+						name_hash,
+						name, name_len,
+						search);
+		if (err == -ENODATA) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("no requested xattr in the tree: "
+				  "ino %lu, name %s\n",
+				  inode->i_ino, name);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to remove xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+			goto clean_up;
+		}
+
+		ssdfs_btree_search_init(search);
+
+		err = ssdfs_xattrs_tree_add(ii->xattrs_tree,
+					    name_index,
+					    name, name_len,
+					    value, size,
+					    ii,
+					    search);
+		if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to create xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto clean_up;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to create xattr: "
+				  "ino %lu, name %s, err %d\n",
+				  inode->i_ino, name, err);
+			goto clean_up;
+		}
+	}
+
+	inode_set_ctime_to_ts(inode, current_time(inode));
+	mark_inode_dirty(inode);
+
+clean_up:
+	ssdfs_btree_search_free(search);
+
+finish_setxattr:
+	up_read(&ii->lock);
+
+	return err;
+}
diff --git a/fs/ssdfs/xattr.h b/fs/ssdfs/xattr.h
new file mode 100644
index 000000000000..aeebfa42667a
--- /dev/null
+++ b/fs/ssdfs/xattr.h
@@ -0,0 +1,88 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr.h - extended attributes support declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_XATTR_H
+#define _SSDFS_XATTR_H
+
+#include <linux/xattr.h>
+
+/* Name indexes */
+#define SSDFS_USER_XATTR_ID			1
+#define SSDFS_POSIX_ACL_ACCESS_XATTR_ID		2
+#define SSDFS_POSIX_ACL_DEFAULT_XATTR_ID	3
+#define SSDFS_TRUSTED_XATTR_ID			4
+#define SSDFS_SECURITY_XATTR_ID			5
+#define SSDFS_SYSTEM_XATTR_ID			6
+#define SSDFS_RICHACL_XATTR_ID			7
+#define SSDFS_XATTR_MAX_ID			255
+
+extern const struct xattr_handler ssdfs_xattr_user_handler;
+extern const struct xattr_handler ssdfs_xattr_trusted_handler;
+extern const struct xattr_handler ssdfs_xattr_security_handler;
+
+extern const struct xattr_handler *ssdfs_xattr_handlers[];
+
+ssize_t __ssdfs_getxattr(struct inode *, int, const char *, void *, size_t);
+
+static inline
+ssize_t ssdfs_getxattr(struct inode *inode,
+			int name_index, const char *name,
+			void *value, size_t size)
+{
+	return __ssdfs_getxattr(inode, name_index, name, value, size);
+}
+
+int __ssdfs_setxattr(struct inode *, int, const char *,
+			const void *, size_t, int);
+
+static inline
+int ssdfs_setxattr(struct inode *inode,
+		    int name_index, const char *name,
+		    const void *value, size_t size, int flags)
+{
+	return __ssdfs_setxattr(inode, name_index, name,
+				value, size, flags);
+}
+
+ssize_t ssdfs_listxattr(struct dentry *, char *, size_t);
+
+#ifdef CONFIG_SSDFS_SECURITY
+int ssdfs_init_security(struct inode *, struct inode *, const struct qstr *);
+int ssdfs_init_inode_security(struct inode *, struct inode *,
+				const struct qstr *);
+#else
+static inline
+int ssdfs_init_security(struct inode *inode, struct inode *dir,
+			const struct qstr *qstr)
+{
+	return 0;
+}
+
+static inline
+int ssdfs_init_inode_security(struct inode *inode, struct inode *dir,
+				const struct qstr *qstr)
+{
+	return 0;
+}
+#endif /* CONFIG_SSDFS_SECURITY */
+
+#endif /* _SSDFS_XATTR_H */
diff --git a/fs/ssdfs/xattr_security.c b/fs/ssdfs/xattr_security.c
new file mode 100644
index 000000000000..b5caff8ebbec
--- /dev/null
+++ b/fs/ssdfs/xattr_security.c
@@ -0,0 +1,159 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr_security.c - handler for storing security labels as xattrs.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/security.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "xattr.h"
+#include "acl.h"
+
+static
+int ssdfs_security_getxattr(const struct xattr_handler *handler,
+			    struct dentry *unused, struct inode *inode,
+			    const char *name, void *buffer, size_t size)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, buffer %p, size %zu\n",
+		  (unsigned long)inode->i_ino,
+		  name, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_SECURITY_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_getxattr(inode, SSDFS_SECURITY_XATTR_ID, name,
+				buffer, size);
+}
+
+static
+int ssdfs_security_setxattr(const struct xattr_handler *handler,
+			    struct mnt_idmap *idmap,
+			    struct dentry *unused, struct inode *inode,
+			    const char *name, const void *value,
+			    size_t size, int flags)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, value %p, size %zu, flags %#x\n",
+		  (unsigned long)inode->i_ino,
+		  name, value, size, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_SECURITY_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_setxattr(inode, SSDFS_SECURITY_XATTR_ID, name,
+				value, size, flags);
+}
+
+static
+int ssdfs_initxattrs(struct inode *inode, const struct xattr *xattr_array,
+			void *fs_info)
+{
+	const struct xattr *xattr;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, xattr_array %p, fs_info %p\n",
+		  (unsigned long)inode->i_ino,
+		  xattr_array, fs_info);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
+		size_t name_len;
+
+		name_len = strlen(xattr->name);
+
+		if (name_len == 0)
+			continue;
+
+		if (name_len + XATTR_SECURITY_PREFIX_LEN > XATTR_NAME_MAX)
+			return -EOPNOTSUPP;
+
+		err = __ssdfs_setxattr(inode, SSDFS_SECURITY_XATTR_ID,
+					xattr->name, xattr->value,
+					xattr->value_len, 0);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+int ssdfs_init_security(struct inode *inode, struct inode *dir,
+			const struct qstr *qstr)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir_ino %lu, ino %lu\n",
+		  (unsigned long)dir->i_ino,
+		  (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return security_inode_init_security(inode, dir, qstr,
+					    &ssdfs_initxattrs, NULL);
+}
+
+int ssdfs_init_inode_security(struct inode *inode, struct inode *dir,
+				const struct qstr *qstr)
+{
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir_ino %lu, ino %lu\n",
+		  (unsigned long)dir->i_ino,
+		  (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_init_acl(inode, dir);
+	if (!err)
+		err = ssdfs_init_security(inode, dir, qstr);
+	return err;
+}
+
+const struct xattr_handler ssdfs_xattr_security_handler = {
+	.prefix	= XATTR_SECURITY_PREFIX,
+	.get	= ssdfs_security_getxattr,
+	.set	= ssdfs_security_setxattr,
+};
diff --git a/fs/ssdfs/xattr_tree.h b/fs/ssdfs/xattr_tree.h
new file mode 100644
index 000000000000..c8eaa779f240
--- /dev/null
+++ b/fs/ssdfs/xattr_tree.h
@@ -0,0 +1,143 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr_tree.h - extended attributes btree declarations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#ifndef _SSDFS_XATTR_TREE_H
+#define _SSDFS_XATTR_TREE_H
+
+/*
+ * struct ssdfs_xattrs_btree_info - xattrs btree info
+ * @type: xattrs btree type
+ * @state: xattrs btree state
+ * @lock: xattrs btree lock
+ * @generic_tree: pointer on generic btree object
+ * @inline_xattrs: pointer on inline xattrs array
+ * @inline_count: number of valid inline xattrs
+ * @inline_capacity: capacity of xattrs in the inline array
+ * @buffer.tree: piece of memory for generic btree object
+ * @buffer.xattr: piece of memory for the inline xattr
+ * @root: pointer on root node
+ * @root_buffer: buffer for root node
+ * @desc: b-tree descriptor
+ * @owner: pointer on owner inode object
+ * @fsi: pointer on shared file system object
+ */
+struct ssdfs_xattrs_btree_info {
+	atomic_t type;
+	atomic_t state;
+
+	struct rw_semaphore lock;
+	struct ssdfs_btree *generic_tree;
+	struct ssdfs_xattr_entry *inline_xattrs;
+	u16 inline_count;
+	u16 inline_capacity;
+
+	union {
+		struct ssdfs_btree tree;
+		struct ssdfs_xattr_entry xattr;
+	} buffer;
+	struct ssdfs_btree_inline_root_node *root;
+	struct ssdfs_btree_inline_root_node root_buffer;
+
+	struct ssdfs_xattr_btree_descriptor desc;
+	struct ssdfs_inode_info *owner;
+	struct ssdfs_fs_info *fsi;
+};
+
+/* Xattr tree types */
+enum {
+	SSDFS_XATTR_BTREE_UNKNOWN_TYPE,
+	SSDFS_INLINE_XATTR,
+	SSDFS_INLINE_XATTR_ARRAY,
+	SSDFS_PRIVATE_XATTR_BTREE,
+	SSDFS_XATTR_BTREE_TYPE_MAX
+};
+
+/* Xattr tree states */
+enum {
+	SSDFS_XATTR_BTREE_UNKNOWN_STATE,
+	SSDFS_XATTR_BTREE_CREATED,
+	SSDFS_XATTR_BTREE_INITIALIZED,
+	SSDFS_XATTR_BTREE_DIRTY,
+	SSDFS_XATTR_BTREE_CORRUPTED,
+	SSDFS_XATTR_BTREE_STATE_MAX
+};
+
+/*
+ * Xattr tree API
+ */
+int ssdfs_xattrs_tree_create(struct ssdfs_fs_info *fsi,
+			    struct ssdfs_inode_info *ii);
+int ssdfs_xattrs_tree_init(struct ssdfs_fs_info *fsi,
+			  struct ssdfs_inode_info *ii);
+void ssdfs_xattrs_tree_destroy(struct ssdfs_inode_info *ii);
+int ssdfs_xattrs_tree_flush(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_inode_info *ii);
+
+int ssdfs_xattrs_tree_find(struct ssdfs_xattrs_btree_info *tree,
+			  const char *name, size_t len,
+			  struct ssdfs_btree_search *search);
+int ssdfs_xattrs_tree_add(struct ssdfs_xattrs_btree_info *tree,
+			 int name_index,
+			 const char *name, size_t name_len,
+			 const void *value, size_t size,
+			 struct ssdfs_inode_info *ii,
+			 struct ssdfs_btree_search *search);
+int ssdfs_xattrs_tree_change(struct ssdfs_xattrs_btree_info *tree,
+			    int name_index,
+			    u64 name_hash,
+			    const char *name, size_t name_len,
+			    const void *value, size_t size,
+			    struct ssdfs_btree_search *search);
+int ssdfs_xattrs_tree_delete(struct ssdfs_xattrs_btree_info *tree,
+			     u64 name_hash,
+			     const char *name, size_t name_len,
+			     struct ssdfs_btree_search *search);
+int ssdfs_xattrs_tree_delete_all(struct ssdfs_xattrs_btree_info *tree);
+
+/*
+ * Xattr tree internal API
+ */
+int __ssdfs_xattrs_btree_node_get_xattr(struct ssdfs_fs_info *fsi,
+					struct ssdfs_btree_node_content *content,
+					u32 area_offset,
+					u32 area_size,
+					u32 node_size,
+					u16 item_index,
+					struct ssdfs_xattr_entry *xattr);
+int ssdfs_xattrs_tree_find_leaf_node(struct ssdfs_xattrs_btree_info *tree,
+					u64 name_hash,
+					struct ssdfs_btree_search *search);
+int ssdfs_xattrs_tree_extract_range(struct ssdfs_xattrs_btree_info *tree,
+				    u16 start_index, u16 count,
+				    struct ssdfs_btree_search *search);
+
+void ssdfs_debug_xattrs_btree_object(struct ssdfs_xattrs_btree_info *tree);
+
+/*
+ * Xattr btree specialized operations
+ */
+extern const struct ssdfs_btree_descriptor_operations
+						ssdfs_xattrs_btree_desc_ops;
+extern const struct ssdfs_btree_operations ssdfs_xattrs_btree_ops;
+extern const struct ssdfs_btree_node_operations ssdfs_xattrs_btree_node_ops;
+
+#endif /* _SSDFS_XATTR_TREE_H */
diff --git a/fs/ssdfs/xattr_trusted.c b/fs/ssdfs/xattr_trusted.c
new file mode 100644
index 000000000000..8e1b0cc19bf2
--- /dev/null
+++ b/fs/ssdfs/xattr_trusted.c
@@ -0,0 +1,93 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr_trusted.c - handler for trusted extended attributes.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "xattr.h"
+
+static
+int ssdfs_trusted_getxattr(const struct xattr_handler *handler,
+			   struct dentry *unused, struct inode *inode,
+			   const char *name, void *buffer, size_t size)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, buffer %p, size %zu\n",
+		  (unsigned long)inode->i_ino,
+		  name, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_TRUSTED_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_getxattr(inode, SSDFS_TRUSTED_XATTR_ID, name,
+				buffer, size);
+}
+
+static
+int ssdfs_trusted_setxattr(const struct xattr_handler *handler,
+			   struct mnt_idmap *idmap,
+			   struct dentry *unused, struct inode *inode,
+			   const char *name, const void *value,
+			   size_t size, int flags)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, value %p, size %zu, flags %#x\n",
+		  (unsigned long)inode->i_ino,
+		  name, value, size, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_TRUSTED_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_setxattr(inode, SSDFS_TRUSTED_XATTR_ID, name,
+				value, size, flags);
+}
+
+const struct xattr_handler ssdfs_xattr_trusted_handler = {
+	.prefix	= XATTR_TRUSTED_PREFIX,
+	.get	= ssdfs_trusted_getxattr,
+	.set	= ssdfs_trusted_setxattr,
+};
diff --git a/fs/ssdfs/xattr_user.c b/fs/ssdfs/xattr_user.c
new file mode 100644
index 000000000000..c643f5816dbb
--- /dev/null
+++ b/fs/ssdfs/xattr_user.c
@@ -0,0 +1,93 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/xattr_user.c - handler for extended user attributes.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "xattr.h"
+
+static
+int ssdfs_user_getxattr(const struct xattr_handler *handler,
+			struct dentry *unused, struct inode *inode,
+			const char *name, void *buffer, size_t size)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, buffer %p, size %zu\n",
+		  (unsigned long)inode->i_ino,
+		  name, buffer, size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_USER_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_getxattr(inode, SSDFS_USER_XATTR_ID, name,
+				buffer, size);
+}
+
+static
+int ssdfs_user_setxattr(const struct xattr_handler *handler,
+			struct mnt_idmap *idmap,
+			struct dentry *unused, struct inode *inode,
+			const char *name, const void *value,
+			size_t size, int flags)
+{
+	size_t len;
+
+	if (name == NULL || strcmp(name, "") == 0) {
+		SSDFS_ERR("invalid name\n");
+		return -EINVAL;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, name %s, value %p, size %zu, flags %#x\n",
+		  (unsigned long)inode->i_ino,
+		  name, value, size, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	len = strlen(name);
+
+	if ((len + XATTR_USER_PREFIX_LEN) > XATTR_NAME_MAX)
+		return -EOPNOTSUPP;
+
+	return ssdfs_setxattr(inode, SSDFS_USER_XATTR_ID, name,
+				value, size, flags);
+}
+
+const struct xattr_handler ssdfs_xattr_user_handler = {
+	.prefix	= XATTR_USER_PREFIX,
+	.get	= ssdfs_user_getxattr,
+	.set	= ssdfs_user_setxattr,
+};
-- 
2.34.1



* [PATCH v2 71/79] ssdfs: implement IOCTL operations
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (28 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 67/79] ssdfs: implement extended attributes support Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 75/79] ssdfs: implement inode operations support Viacheslav Dubeyko
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

This patch implements the IOCTL operations: getting and setting inode
flags, triggering the testing infrastructure, snapshot management
(create, list, modify, remove, remove range, show details, list rules),
and getting/setting the tunefs volume configuration.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/ioctl.c | 453 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/ssdfs/ioctl.h |  58 ++++++
 2 files changed, 511 insertions(+)
 create mode 100644 fs/ssdfs/ioctl.c
 create mode 100644 fs/ssdfs/ioctl.h

diff --git a/fs/ssdfs/ioctl.c b/fs/ssdfs/ioctl.c
new file mode 100644
index 000000000000..c02f81d16d90
--- /dev/null
+++ b/fs/ssdfs/ioctl.c
@@ -0,0 +1,453 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ioctl.c - IOCTL operations.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/pagevec.h>
+#include <linux/mount.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "testing.h"
+#include "ioctl.h"
+
+static int ssdfs_ioctl_getflags(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	unsigned int flags;
+
+	flags = ii->flags & SSDFS_FL_USER_VISIBLE;
+	return put_user(flags, (int __user *) arg);
+}
+
+static int ssdfs_ioctl_setflags(struct file *file, void __user *arg)
+{
+	struct mnt_idmap *idmap = file_mnt_idmap(file);
+	struct inode *inode = file_inode(file);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	unsigned int flags, oldflags;
+	int err = 0;
+
+	err = mnt_want_write_file(file);
+	if (err)
+		return err;
+
+	if (!inode_owner_or_capable(idmap, inode)) {
+		mnt_drop_write_file(file);
+		return -EACCES;
+	}
+
+	if (get_user(flags, (int __user *)arg)) {
+		mnt_drop_write_file(file);
+		return -EFAULT;
+	}
+
+	flags = ssdfs_mask_flags(inode->i_mode, flags);
+
+	inode_lock(inode);
+	down_write(&ii->lock);
+
+	oldflags = ii->flags;
+
+	/*
+	 * The IMMUTABLE and APPEND_ONLY flags can only be changed by the
+	 * relevant capability.
+	 */
+	if ((flags ^ oldflags) & (SSDFS_APPEND_FL | SSDFS_IMMUTABLE_FL)) {
+		if (!capable(CAP_LINUX_IMMUTABLE)) {
+			err = -EPERM;
+			goto out_unlock_inode;
+		}
+	}
+
+	flags = flags & SSDFS_FL_USER_MODIFIABLE;
+	flags |= oldflags & ~SSDFS_FL_USER_MODIFIABLE;
+	ii->flags = flags;
+
+	ssdfs_set_inode_flags(inode);
+	inode_set_ctime_to_ts(inode, current_time(inode));
+	mark_inode_dirty(inode);
+
+out_unlock_inode:
+	up_write(&ii->lock);
+	inode_unlock(inode);
+	mnt_drop_write_file(file);
+	return err;
+}
+
+static int ssdfs_ioctl_do_testing(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_testing_environment env;
+
+	if (copy_from_user(&env, arg, sizeof(env)))
+		return -EFAULT;
+
+	return ssdfs_do_testing(fsi, &env);
+}
+
+static int ssdfs_ioctl_create_snapshot(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_CREATE_SNAPSHOT;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		ssdfs_snapshot_request_free(snr);
+		goto finish_create_snapshot;
+	}
+
+	ssdfs_snapshot_reqs_queue_add_tail(&fsi->snapshots.reqs_queue, snr);
+
+finish_create_snapshot:
+	return err;
+}
+
+static int ssdfs_ioctl_list_snapshots(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_LIST_SNAPSHOTS;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_list_snapshots;
+	}
+
+	err = ssdfs_execute_list_snapshots_request(&fsi->snapshots, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get the snapshots list: "
+			  "err %d\n", err);
+		goto finish_list_snapshots;
+	}
+
+	if (copy_to_user((struct ssdfs_snapshot_info __user *)arg,
+			 &snr->info, info_size)) {
+		err = -EFAULT;
+		goto finish_list_snapshots;
+	}
+
+finish_list_snapshots:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_modify_snapshot(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_MODIFY_SNAPSHOT;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_modify_snapshot;
+	}
+
+	err = ssdfs_execute_modify_snapshot_request(fsi, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to modify snapshot: "
+			  "err %d\n", err);
+		goto finish_modify_snapshot;
+	}
+
+finish_modify_snapshot:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_remove_snapshot(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_REMOVE_SNAPSHOT;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_remove_snapshot;
+	}
+
+	err = ssdfs_execute_remove_snapshot_request(&fsi->snapshots, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to delete snapshot: "
+			  "err %d\n", err);
+		goto finish_remove_snapshot;
+	}
+
+finish_remove_snapshot:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_remove_range(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_REMOVE_RANGE;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_remove_range;
+	}
+
+	err = ssdfs_execute_remove_range_request(&fsi->snapshots, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to remove range of snapshots: "
+			  "err %d\n", err);
+		goto finish_remove_range;
+	}
+
+finish_remove_range:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_show_snapshot_details(struct file *file,
+					     void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_SHOW_SNAPSHOT_DETAILS;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_show_snapshot_details;
+	}
+
+	err = ssdfs_execute_show_details_request(&fsi->snapshots, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to show snapshot's details: "
+			  "err %d\n", err);
+		goto finish_show_snapshot_details;
+	}
+
+	if (copy_to_user((struct ssdfs_snapshot_info __user *)arg,
+			 &snr->info, info_size)) {
+		err = -EFAULT;
+		goto finish_show_snapshot_details;
+	}
+
+finish_show_snapshot_details:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_list_snapshot_rules(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_snapshot_request *snr = NULL;
+	size_t info_size = sizeof(struct ssdfs_snapshot_info);
+	int err = 0;
+
+	snr = ssdfs_snapshot_request_alloc();
+	if (!snr) {
+		SSDFS_ERR("fail to allocate snapshot request\n");
+		return -ENOMEM;
+	}
+
+	snr->operation = SSDFS_LIST_SNAPSHOT_RULES;
+	snr->ino = inode->i_ino;
+
+	if (copy_from_user(&snr->info, arg, info_size)) {
+		err = -EFAULT;
+		goto finish_list_snapshot_rules;
+	}
+
+	err = ssdfs_execute_list_snapshot_rules_request(fsi, snr);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to get the snapshot rules list: "
+			  "err %d\n", err);
+		goto finish_list_snapshot_rules;
+	}
+
+	if (copy_to_user((struct ssdfs_snapshot_info __user *)arg,
+			 &snr->info, info_size)) {
+		err = -EFAULT;
+		goto finish_list_snapshot_rules;
+	}
+
+finish_list_snapshot_rules:
+	ssdfs_snapshot_request_free(snr);
+
+	return err;
+}
+
+static int ssdfs_ioctl_tunefs_get_fs_config(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_tunefs_options config;
+	size_t options_size = sizeof(struct ssdfs_tunefs_options);
+
+	memset(&config, 0, options_size);
+
+	ssdfs_tunefs_get_current_volume_config(fsi, &config.old_config);
+	ssdfs_tunefs_get_new_config_request(fsi, &config.new_config);
+
+	if (copy_to_user((struct ssdfs_tunefs_options __user *)arg,
+			 &config, options_size)) {
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int ssdfs_ioctl_tunefs_set_fs_config(struct file *file, void __user *arg)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_tunefs_options config;
+	size_t options_size = sizeof(struct ssdfs_tunefs_options);
+	int err = 0;
+
+	if (copy_from_user(&config, arg, options_size)) {
+		return -EFAULT;
+	}
+
+	ssdfs_tunefs_get_current_volume_config(fsi, &config.old_config);
+
+	err = ssdfs_tunefs_check_requested_volume_config(fsi, &config);
+	if (!err)
+		ssdfs_tunefs_save_new_config_request(fsi, &config);
+
+	if (copy_to_user((struct ssdfs_tunefs_options __user *)arg,
+			 &config, options_size)) {
+		return -EFAULT;
+	}
+
+	return err;
+}
+
+/*
+ * The ssdfs_ioctl() is called by the ioctl(2) system call.
+ */
+long ssdfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+
+	switch (cmd) {
+	case FS_IOC_GETFLAGS:
+		return ssdfs_ioctl_getflags(file, argp);
+	case FS_IOC_SETFLAGS:
+		return ssdfs_ioctl_setflags(file, argp);
+	case SSDFS_IOC_DO_TESTING:
+		return ssdfs_ioctl_do_testing(file, argp);
+	case SSDFS_IOC_CREATE_SNAPSHOT:
+		return ssdfs_ioctl_create_snapshot(file, argp);
+	case SSDFS_IOC_LIST_SNAPSHOTS:
+		return ssdfs_ioctl_list_snapshots(file, argp);
+	case SSDFS_IOC_MODIFY_SNAPSHOT:
+		return ssdfs_ioctl_modify_snapshot(file, argp);
+	case SSDFS_IOC_REMOVE_SNAPSHOT:
+		return ssdfs_ioctl_remove_snapshot(file, argp);
+	case SSDFS_IOC_REMOVE_RANGE:
+		return ssdfs_ioctl_remove_range(file, argp);
+	case SSDFS_IOC_SHOW_DETAILS:
+		return ssdfs_ioctl_show_snapshot_details(file, argp);
+	case SSDFS_IOC_LIST_SNAPSHOT_RULES:
+		return ssdfs_ioctl_list_snapshot_rules(file, argp);
+	case SSDFS_IOC_TUNEFS_GET_CONFIG:
+		return ssdfs_ioctl_tunefs_get_fs_config(file, argp);
+	case SSDFS_IOC_TUNEFS_SET_CONFIG:
+		return ssdfs_ioctl_tunefs_set_fs_config(file, argp);
+	}
+
+	return -ENOTTY;
+}
diff --git a/fs/ssdfs/ioctl.h b/fs/ssdfs/ioctl.h
new file mode 100644
index 000000000000..405e85a5f528
--- /dev/null
+++ b/fs/ssdfs/ioctl.h
@@ -0,0 +1,58 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/ioctl.h - IOCTL related declaration.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#ifndef _SSDFS_IOCTL_H
+#define _SSDFS_IOCTL_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#include "testing.h"
+#include "snapshot.h"
+
+#define SSDFS_IOCTL_MAGIC 0xdf
+
+/*
+ * SSDFS_IOC_DO_TESTING - run internal testing
+ */
+#define SSDFS_IOC_DO_TESTING _IOW(SSDFS_IOCTL_MAGIC, 1, \
+				  struct ssdfs_testing_environment)
+
+/*
+ * Snapshot related IOCTLs
+ */
+#define SSDFS_IOC_CREATE_SNAPSHOT	_IOW(SSDFS_IOCTL_MAGIC, 2, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_LIST_SNAPSHOTS	_IOWR(SSDFS_IOCTL_MAGIC, 3, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_MODIFY_SNAPSHOT	_IOW(SSDFS_IOCTL_MAGIC, 4, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_REMOVE_SNAPSHOT	_IOW(SSDFS_IOCTL_MAGIC, 5, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_REMOVE_RANGE		_IOW(SSDFS_IOCTL_MAGIC, 6, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_SHOW_DETAILS		_IOWR(SSDFS_IOCTL_MAGIC, 7, \
+					     struct ssdfs_snapshot_info)
+#define SSDFS_IOC_LIST_SNAPSHOT_RULES	_IOWR(SSDFS_IOCTL_MAGIC, 8, \
+					     struct ssdfs_snapshot_info)
+
+/*
+ * The tunefs related IOCTLs
+ */
+#define SSDFS_IOC_TUNEFS_GET_CONFIG	_IOR(SSDFS_IOCTL_MAGIC, 9, \
+					     struct ssdfs_tunefs_options)
+#define SSDFS_IOC_TUNEFS_SET_CONFIG	_IOWR(SSDFS_IOCTL_MAGIC, 10, \
+					     struct ssdfs_tunefs_options)
+
+#endif /* _SSDFS_IOCTL_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 75/79] ssdfs: implement inode operations support
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (29 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 71/79] ssdfs: implement IOCTL operations Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:17 ` [PATCH v2 76/79] ssdfs: implement directory " Viacheslav Dubeyko
  2026-03-16  2:18 ` [PATCH v2 77/79] ssdfs: implement file " Viacheslav Dubeyko
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

Patch implements operations:
(1) ssdfs_read_inode - read raw inode
(2) ssdfs_iget - get existing inode
(3) ssdfs_new_inode - create new inode
(4) ssdfs_getattr - get attributes
(5) ssdfs_setattr - set attributes
(6) ssdfs_truncate - truncate file
(7) ssdfs_setsize - set file size
(8) ssdfs_evict_inode - evict inode
(9) ssdfs_write_inode - store dirty inode

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/inode.c | 1262 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1262 insertions(+)
 create mode 100644 fs/ssdfs/inode.c

diff --git a/fs/ssdfs/inode.c b/fs/ssdfs/inode.c
new file mode 100644
index 000000000000..959e733b6e18
--- /dev/null
+++ b/fs/ssdfs/inode.c
@@ -0,0 +1,1262 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/inode.c - inode handling routines.
+ *
+ * Copyright (c) 2014-2019 HGST, a Western Digital Company.
+ *              http://www.hgst.com/
+ * Copyright (c) 2014-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ *
+ * (C) Copyright 2014-2019, HGST, Inc., All rights reserved.
+ *
+ * Created by HGST, San Jose Research Center, Storage Architecture Group
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ *
+ * Acknowledgement: Cyril Guyot
+ *                  Zvonimir Bandic
+ */
+
+#include <linux/mtd/mtd.h>
+#include <linux/mm.h>
+#include <linux/statfs.h>
+#include <linux/pagevec.h>
+#include <linux/dcache.h>
+#include <linux/random.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "request_queue.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "extents_tree.h"
+#include "inodes_tree.h"
+#include "dentries_tree.h"
+#include "xattr_tree.h"
+#include "acl.h"
+#include "xattr.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_inode_folio_leaks;
+atomic64_t ssdfs_inode_memory_leaks;
+atomic64_t ssdfs_inode_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_inode_cache_leaks_increment(void *kaddr)
+ * void ssdfs_inode_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_inode_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_inode_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_inode_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_inode_kfree(void *kaddr)
+ * struct folio *ssdfs_inode_alloc_folio(gfp_t gfp_mask,
+ *                                       unsigned int order)
+ * struct folio *ssdfs_inode_add_batch_folio(struct folio_batch *batch,
+ *                                           unsigned int order)
+ * void ssdfs_inode_free_folio(struct folio *folio)
+ * void ssdfs_inode_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(inode)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(inode)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_inode_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_inode_folio_leaks, 0);
+	atomic64_set(&ssdfs_inode_memory_leaks, 0);
+	atomic64_set(&ssdfs_inode_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_inode_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_inode_folio_leaks) != 0) {
+		SSDFS_ERR("INODE: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_inode_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_inode_memory_leaks) != 0) {
+		SSDFS_ERR("INODE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_inode_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_inode_cache_leaks) != 0) {
+		SSDFS_ERR("INODE: "
			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_inode_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+bool is_raw_inode_checksum_correct(struct ssdfs_fs_info *fsi,
+				   void *buf, size_t size)
+{
+	struct ssdfs_inode *raw_inode;
+	size_t raw_inode_size;
+	__le32 old_checksum;
+	bool is_valid = false;
+
+	spin_lock(&fsi->inodes_tree->lock);
+	raw_inode_size = fsi->inodes_tree->raw_inode_size;
+	spin_unlock(&fsi->inodes_tree->lock);
+
+	if (raw_inode_size != size) {
+		SSDFS_WARN("raw_inode_size %zu != size %zu\n",
+			   raw_inode_size, size);
+		return false;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("RAW INODE DUMP:\n");
+	print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+			     buf, size);
+	SSDFS_DBG("\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	raw_inode = (struct ssdfs_inode *)buf;
+
+	old_checksum = raw_inode->checksum;
+	raw_inode->checksum = 0;
+	raw_inode->checksum = ssdfs_crc32_le(buf, size);
+
+	is_valid = old_checksum == raw_inode->checksum;
+
+	if (!is_valid) {
+		SSDFS_WARN("invalid inode's checksum: "
+			   "stored %#x != calculated %#x\n",
+			   le32_to_cpu(old_checksum),
+			   le32_to_cpu(raw_inode->checksum));
+		raw_inode->checksum = old_checksum;
+	}
+
+	return is_valid;
+}
+
+void ssdfs_set_inode_flags(struct inode *inode)
+{
+	unsigned int flags = SSDFS_I(inode)->flags;
+	unsigned int new_fl = 0;
+
+	if (flags & FS_SYNC_FL)
+		new_fl |= S_SYNC;
+	if (flags & FS_APPEND_FL)
+		new_fl |= S_APPEND;
+	if (flags & FS_IMMUTABLE_FL)
+		new_fl |= S_IMMUTABLE;
+	if (flags & FS_NOATIME_FL)
+		new_fl |= S_NOATIME;
+	if (flags & FS_DIRSYNC_FL)
+		new_fl |= S_DIRSYNC;
+	inode_set_flags(inode, new_fl, S_SYNC | S_APPEND | S_IMMUTABLE |
+			S_NOATIME | S_DIRSYNC);
+}
+
+static int ssdfs_inode_setops(struct inode *inode)
+{
+	if (S_ISREG(inode->i_mode)) {
+		inode->i_op = &ssdfs_file_inode_operations;
+		inode->i_fop = &ssdfs_file_operations;
+		inode->i_mapping->a_ops = &ssdfs_aops;
+	} else if (S_ISDIR(inode->i_mode)) {
+		inode->i_op = &ssdfs_dir_inode_operations;
+		inode->i_fop = &ssdfs_dir_operations;
+		inode->i_mapping->a_ops = &ssdfs_aops;
+	} else if (S_ISLNK(inode->i_mode)) {
+		inode->i_op = &ssdfs_symlink_inode_operations;
+		inode->i_mapping->a_ops = &ssdfs_aops;
+		inode_nohighmem(inode);
+	} else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
+		   S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
+		inode->i_op = &ssdfs_special_inode_operations;
+		init_special_inode(inode, inode->i_mode, inode->i_rdev);
+	} else {
+		SSDFS_ERR("bogus i_mode %o for ino %lu\n",
+			  inode->i_mode, (unsigned long)inode->i_ino);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int ssdfs_read_inode(struct inode *inode)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_btree_search *search;
+	struct ssdfs_inode *raw_inode;
+	size_t raw_inode_size;
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	u16 private_flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	err = ssdfs_inodes_btree_find(fsi->inodes_tree, inode->i_ino, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find the raw inode: "
+			  "ino %lu, err %d\n",
+			  inode->i_ino, err);
+		goto finish_read_inode;
+	}
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		/* expected state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid result state %#x\n",
+			  search->result.state);
+		goto finish_read_inode;
+	}
+
+	switch (search->result.raw_buf.state) {
+	case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+	case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+		/* expected state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid buffer state %#x\n",
+			  search->result.raw_buf.state);
+		goto finish_read_inode;
+	}
+
+	if (!search->result.raw_buf.place.ptr) {
+		err = -ERANGE;
+		SSDFS_ERR("empty result buffer pointer\n");
+		goto finish_read_inode;
+	}
+
+	if (search->result.raw_buf.items_count == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("empty raw buffer: items_count %u\n",
+			  search->result.raw_buf.items_count);
+		goto finish_read_inode;
+	}
+
+	raw_inode = (struct ssdfs_inode *)search->result.raw_buf.place.ptr;
+	raw_inode_size =
+	    search->result.raw_buf.size / search->result.raw_buf.items_count;
+
+	if (!is_raw_inode_checksum_correct(fsi, raw_inode, raw_inode_size)) {
+		err = -EIO;
+		SSDFS_ERR("invalid inode's checksum: ino %lu\n",
+			  inode->i_ino);
+		goto finish_read_inode;
+	}
+
+	if (le16_to_cpu(raw_inode->magic) != SSDFS_INODE_MAGIC) {
+		err = -EIO;
+		SSDFS_ERR("invalid inode magic %#x\n",
+			  le16_to_cpu(raw_inode->magic));
+		goto finish_read_inode;
+	}
+
+	if (le64_to_cpu(raw_inode->ino) != inode->i_ino) {
+		err = -EIO;
+		SSDFS_ERR("raw_inode->ino %llu != i_ino %lu\n",
+			  le64_to_cpu(raw_inode->ino),
+			  inode->i_ino);
+		goto finish_read_inode;
+	}
+
+	inode->i_mode = le16_to_cpu(raw_inode->mode);
+	ii->flags = le32_to_cpu(raw_inode->flags);
+	i_uid_write(inode, le32_to_cpu(raw_inode->uid));
+	i_gid_write(inode, le32_to_cpu(raw_inode->gid));
+	set_nlink(inode, le32_to_cpu(raw_inode->refcount));
+
+	inode_set_atime(inode, le64_to_cpu(raw_inode->atime),
+			le32_to_cpu(raw_inode->atime_nsec));
+	inode_set_ctime(inode, le64_to_cpu(raw_inode->ctime),
+			le32_to_cpu(raw_inode->ctime_nsec));
+	inode_set_mtime(inode, le64_to_cpu(raw_inode->mtime),
+			le32_to_cpu(raw_inode->mtime_nsec));
+
+	ii->birthtime.tv_sec = le64_to_cpu(raw_inode->birthtime);
+	ii->birthtime.tv_nsec = le32_to_cpu(raw_inode->birthtime_nsec);
+	ii->raw_inode_size = fsi->raw_inode_size;
+
+	inode->i_generation = (u32)le64_to_cpu(raw_inode->generation);
+
+	inode->i_size = le64_to_cpu(raw_inode->size);
+	inode->i_blkbits = fsi->log_pagesize;
+	inode->i_blocks = le64_to_cpu(raw_inode->blocks);
+
+	private_flags = le16_to_cpu(raw_inode->private_flags);
+	atomic_set(&ii->private_flags, private_flags);
+	if (private_flags & ~SSDFS_INODE_PRIVATE_FLAGS_MASK) {
+		err = -EIO;
+		SSDFS_ERR("invalid set of private_flags %#x\n",
+			  private_flags);
+		goto finish_read_inode;
+	}
+
+	if (fsi->pagesize > SSDFS_8KB)
+		mapping_set_large_folios(inode->i_mapping);
+
+	err = ssdfs_inode_setops(inode);
+	if (unlikely(err))
+		goto finish_read_inode;
+
+	down_write(&ii->lock);
+
+	ii->parent_ino = le64_to_cpu(raw_inode->parent_ino);
+	ssdfs_set_inode_flags(inode);
+	ii->name_hash = le64_to_cpu(raw_inode->hash_code);
+	ii->name_len = le16_to_cpu(raw_inode->name_len);
+
+	ssdfs_memcpy(&ii->raw_inode,
+		     0, sizeof(struct ssdfs_inode),
+		     raw_inode,
+		     0, sizeof(struct ssdfs_inode),
+		     sizeof(struct ssdfs_inode));
+
+	if (S_ISREG(inode->i_mode)) {
+		if (private_flags & ~SSDFS_IFREG_PRIVATE_FLAG_MASK) {
+			err = -EIO;
+			SSDFS_ERR("regular file: invalid private flags %#x\n",
+				  private_flags);
+			goto unlock_mutable_fields;
+		}
+
+		if (is_ssdfs_file_inline(ii)) {
+			err = ssdfs_allocate_inline_file_buffer(inode);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to allocate inline buffer\n");
+				goto unlock_mutable_fields;
+			}
+
+			/*
+			 * TODO: pre-fetch file's content in buffer
+			 *       (if inode size > 256 bytes)
+			 */
+		} else if (private_flags & SSDFS_INODE_HAS_INLINE_EXTENTS ||
+			   private_flags & SSDFS_INODE_HAS_EXTENTS_BTREE) {
+			err = ssdfs_extents_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the extents tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+
+			err = ssdfs_extents_tree_init(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to init the extents tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+		}
+
+		if (private_flags & SSDFS_INODE_HAS_INLINE_XATTR ||
+		    private_flags & SSDFS_INODE_HAS_XATTR_BTREE) {
+			err = ssdfs_xattrs_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the xattrs tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+
+			err = ssdfs_xattrs_tree_init(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to init the xattrs tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+		}
+	} else if (S_ISDIR(inode->i_mode)) {
+		if (private_flags & ~SSDFS_IFDIR_PRIVATE_FLAG_MASK) {
+			err = -EIO;
+			SSDFS_ERR("directory: invalid private flags %#x\n",
+				  private_flags);
+			goto unlock_mutable_fields;
+		}
+
+		if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+		    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+			err = ssdfs_dentries_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the dentries tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+
+			err = ssdfs_dentries_tree_init(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to init the dentries tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+		}
+
+		if (private_flags & SSDFS_INODE_HAS_INLINE_XATTR ||
+		    private_flags & SSDFS_INODE_HAS_XATTR_BTREE) {
+			err = ssdfs_xattrs_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the xattrs tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+
+			err = ssdfs_xattrs_tree_init(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to init the xattrs tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto unlock_mutable_fields;
+			}
+		}
+	} else if (S_ISLNK(inode->i_mode) ||
+		   S_ISCHR(inode->i_mode) ||
+		   S_ISBLK(inode->i_mode) ||
+		   S_ISFIFO(inode->i_mode) ||
+		   S_ISSOCK(inode->i_mode)) {
+		/* do nothing */
+	} else {
+		err = -EINVAL;
+		SSDFS_ERR("bogus i_mode %o for ino %lu\n",
+			  inode->i_mode, (unsigned long)inode->i_ino);
+		goto unlock_mutable_fields;
+	}
+
+unlock_mutable_fields:
+	up_write(&ii->lock);
+
+finish_read_inode:
+	ssdfs_btree_search_free(search);
+	return err;
+}
+
+struct inode *ssdfs_iget(struct super_block *sb, ino_t ino)
+{
+	struct inode *inode;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = iget_locked(sb, ino);
+	if (unlikely(!inode)) {
+		err = -ENOMEM;
+		SSDFS_ERR("unable to obtain or allocate inode %lu, err %d\n",
+			  (unsigned long)ino, err);
+		return ERR_PTR(err);
+	}
+
+	if (!(inode_state_read_once(inode) & I_NEW)) {
+		trace_ssdfs_iget(inode);
+		return inode;
+	}
+
+	err = ssdfs_read_inode(inode);
+	if (unlikely(err)) {
+		SSDFS_ERR("unable to read inode %lu, err %d\n",
+			  (unsigned long)ino, err);
+		goto bad_inode;
+	}
+
+	unlock_new_inode(inode);
+	trace_ssdfs_iget(inode);
+	return inode;
+
+bad_inode:
+	iget_failed(inode);
+	trace_ssdfs_iget_exit(inode, err);
+	return ERR_PTR(err);
+}
+
+static void ssdfs_init_raw_inode(struct ssdfs_inode_info *ii)
+{
+	struct ssdfs_inode *ri = &ii->raw_inode;
+
+	ri->magic = cpu_to_le16(SSDFS_INODE_MAGIC);
+	ri->mode = cpu_to_le16(ii->vfs_inode.i_mode);
+	ri->flags = cpu_to_le32(ii->flags);
+	ri->uid = cpu_to_le32(i_uid_read(&ii->vfs_inode));
+	ri->gid = cpu_to_le32(i_gid_read(&ii->vfs_inode));
+	ri->atime = cpu_to_le64(inode_get_atime_sec(&ii->vfs_inode));
+	ri->ctime = cpu_to_le64(inode_get_ctime_sec(&ii->vfs_inode));
+	ri->mtime = cpu_to_le64(inode_get_mtime_sec(&ii->vfs_inode));
+	ri->atime_nsec = cpu_to_le32(inode_get_atime_nsec(&ii->vfs_inode));
+	ri->ctime_nsec = cpu_to_le32(inode_get_ctime_nsec(&ii->vfs_inode));
+	ri->mtime_nsec = cpu_to_le32(inode_get_mtime_nsec(&ii->vfs_inode));
+	ri->birthtime = cpu_to_le64(ii->birthtime.tv_sec);
+	ri->birthtime_nsec = cpu_to_le32(ii->birthtime.tv_nsec);
+	ri->generation = cpu_to_le64(ii->vfs_inode.i_generation);
+	ri->size = cpu_to_le64(i_size_read(&ii->vfs_inode));
+	ri->blocks = cpu_to_le64(ii->vfs_inode.i_blocks);
+	ri->parent_ino = cpu_to_le64(ii->parent_ino);
+	ri->refcount = cpu_to_le32(ii->vfs_inode.i_nlink);
+	ri->checksum = 0;
+	ri->ino = cpu_to_le64(ii->vfs_inode.i_ino);
+	ri->hash_code = cpu_to_le64(ii->name_hash);
+	ri->name_len = cpu_to_le16(ii->name_len);
+}
+
+static void ssdfs_init_inode(struct mnt_idmap *idmap,
+			     struct inode *dir,
+			     struct inode *inode,
+			     umode_t mode,
+			     ino_t ino,
+			     const struct qstr *qstr)
+{
+	struct super_block *sb = dir->i_sb;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+
+	inode->i_ino = ino;
+	ii->parent_ino = dir->i_ino;
+	ii->birthtime = current_time(inode);
+	ii->raw_inode_size = fsi->raw_inode_size;
+	inode_set_atime_to_ts(inode, ii->birthtime);
+	inode_set_mtime_to_ts(inode, ii->birthtime);
+	inode_set_ctime_to_ts(inode, ii->birthtime);
+	inode_init_owner(idmap, inode, dir, mode);
+	ii->flags = ssdfs_mask_flags(mode,
+			 SSDFS_I(dir)->flags & SSDFS_FL_INHERITED);
+	ssdfs_set_inode_flags(inode);
+	inode->i_generation = get_random_u32();
+	inode->i_blkbits = fsi->log_pagesize;
+	i_size_write(inode, 0);
+	inode->i_blocks = 0;
+	set_nlink(inode, 1);
+
+	down_write(&ii->lock);
+	ii->name_hash = ssdfs_generate_name_hash(qstr);
+	ii->name_len = (u16)qstr->len;
+	ssdfs_init_raw_inode(ii);
+	up_write(&ii->lock);
+
+	if (fsi->pagesize > SSDFS_8KB)
+		mapping_set_large_folios(inode->i_mapping);
+}
+
+struct inode *ssdfs_new_inode(struct mnt_idmap *idmap,
+			      struct inode *dir, umode_t mode,
+			      const struct qstr *qstr)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb);
+	struct super_block *sb = dir->i_sb;
+	struct inode *inode;
+	struct ssdfs_btree_search *search;
+	struct ssdfs_inodes_btree_info *itree;
+	ino_t ino;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir_ino %lu, mode %o\n",
+		  (unsigned long)dir->i_ino, mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	itree = fsi->inodes_tree;
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto failed_new_inode;
+	}
+
+	ssdfs_btree_search_init(search);
+	err = ssdfs_inodes_btree_allocate(itree, &ino, search);
+	ssdfs_btree_search_free(search);
+
+	if (err == -ENOSPC) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("unable to allocate an inode: "
+			  "dir_ino %lu, err %d\n",
+			  (unsigned long)dir->i_ino, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		goto failed_new_inode;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to allocate an inode: "
+			  "dir_ino %lu, err %d\n",
+			  (unsigned long)dir->i_ino, err);
+		goto failed_new_inode;
+	}
+
+	inode = new_inode(sb);
+	if (unlikely(!inode)) {
+		err = -ENOMEM;
+		SSDFS_ERR("unable to allocate inode: err %d\n", err);
+		goto failed_new_inode;
+	}
+
+	ssdfs_init_inode(idmap, dir, inode, mode, ino, qstr);
+
+	err = ssdfs_inode_setops(inode);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to set inode's operations: "
+			  "err %d\n", err);
+		goto bad_inode;
+	}
+
+	if (insert_inode_locked(inode) < 0) {
+		err = -EIO;
+		SSDFS_ERR("inode number already in use: "
+			  "ino %lu\n",
+			  (unsigned long) ino);
+		goto bad_inode;
+	}
+
+	err = ssdfs_init_acl(inode, dir);
+	if (err) {
+		SSDFS_ERR("fail to init ACL: "
+			  "err %d\n", err);
+		goto fail_drop;
+	}
+
+	err = ssdfs_init_security(inode, dir, qstr);
+	if (err) {
+		SSDFS_ERR("fail to init security xattr: "
+			  "err %d\n", err);
+		goto fail_drop;
+	}
+
+	mark_inode_dirty(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("new inode %lu is created\n",
+		  ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	trace_ssdfs_inode_new(inode);
+	return inode;
+
+fail_drop:
+	trace_ssdfs_inode_new_exit(inode, err);
+	clear_nlink(inode);
+	unlock_new_inode(inode);
+	iput(inode);
+	return ERR_PTR(err);
+
+bad_inode:
+	trace_ssdfs_inode_new_exit(inode, err);
+	make_bad_inode(inode);
+	iput(inode);
+
+failed_new_inode:
+	return ERR_PTR(err);
+}
+
+int ssdfs_getattr(struct mnt_idmap *idmap,
+		  const struct path *path, struct kstat *stat,
+		  u32 request_mask, unsigned int query_flags)
+{
+	struct inode *inode = d_inode(path->dentry);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	unsigned int flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	flags = ii->flags & SSDFS_FL_USER_VISIBLE;
+	if (flags & SSDFS_APPEND_FL)
+		stat->attributes |= STATX_ATTR_APPEND;
+	if (flags & SSDFS_COMPR_FL)
+		stat->attributes |= STATX_ATTR_COMPRESSED;
+	if (flags & SSDFS_IMMUTABLE_FL)
+		stat->attributes |= STATX_ATTR_IMMUTABLE;
+	if (flags & SSDFS_NODUMP_FL)
+		stat->attributes |= STATX_ATTR_NODUMP;
+
+	stat->attributes_mask |= (STATX_ATTR_APPEND |
+				  STATX_ATTR_COMPRESSED |
+				  STATX_ATTR_ENCRYPTED |
+				  STATX_ATTR_IMMUTABLE |
+				  STATX_ATTR_NODUMP);
+
+	generic_fillattr(idmap, request_mask, inode, stat);
+	return 0;
+}
+
+static int ssdfs_truncate(struct inode *inode)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_extents_btree_info *tree = NULL;
+	struct timespec64 cur_time;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	      S_ISLNK(inode->i_mode)))
+		return -EINVAL;
+
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return -EPERM;
+
+	if (is_ssdfs_file_inline(ii)) {
+		loff_t newsize = i_size_read(inode);
+		size_t inline_capacity =
+				ssdfs_inode_inline_file_capacity(inode);
+
+		if (newsize > inline_capacity) {
+			SSDFS_ERR("newsize %llu > inline_capacity %zu\n",
+				  (u64)newsize, inline_capacity);
+			return -E2BIG;
+		} else if (newsize == inline_capacity) {
+			/* do nothing */
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("newsize %llu == inline_capacity %zu\n",
+				  (u64)newsize, inline_capacity);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (newsize > 0) {
+			loff_t size = inline_capacity - newsize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!ii->inline_file);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			memset((u8 *)ii->inline_file + newsize, 0, size);
+		}
+	} else {
+		tree = SSDFS_EXTREE(ii);
+		if (!tree) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("extents tree is absent: ino %lu\n",
+				  ii->vfs_inode.i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return -ENOENT;
+		}
+
+		switch (atomic_read(&tree->state)) {
+		case SSDFS_EXTENTS_BTREE_DIRTY:
+			down_write(&ii->lock);
+			err = ssdfs_extents_tree_flush(fsi, ii);
+			up_write(&ii->lock);
+
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to flush extents tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				return err;
+			}
+			break;
+
+		default:
+			/* do nothing */
+			break;
+		}
+
+		err = ssdfs_extents_tree_truncate(inode);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to truncate extents tree: "
+				  "err %d\n",
+				  err);
+			return err;
+		}
+	}
+
+	cur_time = current_time(inode);
+	inode_set_mtime_to_ts(inode, cur_time);
+	inode_set_ctime_to_ts(inode, cur_time);
+
+	mark_inode_dirty(inode);
+
+	return 0;
+}
+
+static int ssdfs_setsize(struct inode *inode, struct iattr *attr)
+{
+	loff_t oldsize = i_size_read(inode);
+	loff_t newsize = attr->ia_size;
+	struct timespec64 cur_time;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	    S_ISLNK(inode->i_mode)))
+		return -EINVAL;
+
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return -EPERM;
+
+	inode_dio_wait(inode);
+
+	if (newsize > oldsize) {
+		i_size_write(inode, newsize);
+		pagecache_isize_extended(inode, oldsize, newsize);
+	} else {
+		truncate_setsize(inode, newsize);
+
+		err = ssdfs_truncate(inode);
+		if (err)
+			return err;
+	}
+
+	cur_time = current_time(inode);
+	inode_set_mtime_to_ts(inode, cur_time);
+	inode_set_ctime_to_ts(inode, cur_time);
+
+	mark_inode_dirty(inode);
+	return 0;
+}
+
+int ssdfs_setattr(struct mnt_idmap *idmap,
+		  struct dentry *dentry, struct iattr *attr)
+{
+	struct inode *inode = dentry->d_inode;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = setattr_prepare(idmap, dentry, attr);
+	if (err)
+		return err;
+
+	if (S_ISREG(inode->i_mode) &&
+	    attr->ia_valid & ATTR_SIZE &&
+	    attr->ia_size != inode->i_size) {
+		err = ssdfs_setsize(inode, attr);
+		if (err)
+			return err;
+	}
+
+	if (attr->ia_valid) {
+		setattr_copy(idmap, inode, attr);
+		mark_inode_dirty(inode);
+
+		if (attr->ia_valid & ATTR_MODE) {
+			err = posix_acl_chmod(idmap,
+					      dentry, inode->i_mode);
+		}
+	}
+
+	return err;
+}
+
+/*
+ * This method does all fs work to be done when in-core inode
+ * is about to be gone, for whatever reason.
+ */
+void ssdfs_evict_inode(struct inode *inode)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_xattrs_btree_info *xattrs_tree;
+	ino_t ino = inode->i_ino;
+	bool want_delete = false;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu mode %o count %d nlink %u\n",
+		  ino, inode->i_mode,
+		  atomic_read(&inode->i_count),
+		  inode->i_nlink);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	xattrs_tree = SSDFS_XATTREE(SSDFS_I(inode));
+
+	if (!inode->i_nlink) {
+		err = filemap_flush(inode->i_mapping);
+		if (err) {
+			SSDFS_WARN("inode %lu flush error: %d\n",
+				   ino, err);
+		}
+	}
+
+	err = filemap_fdatawait(inode->i_mapping);
+	if (err) {
+		SSDFS_WARN("inode %lu fdatawait error: %d\n",
+			   ino, err);
+		ssdfs_clear_dirty_folios(inode->i_mapping);
+	}
+
+	if (!inode->i_nlink && !is_bad_inode(inode))
+		want_delete = true;
+	else
+		want_delete = false;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu mode %o count %d nlink %u, "
+		  "is_bad_inode %#x, want_delete %#x\n",
+		  ino, inode->i_mode,
+		  atomic_read(&inode->i_count),
+		  inode->i_nlink,
+		  is_bad_inode(inode),
+		  want_delete);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	trace_ssdfs_inode_evict(inode);
+
+	truncate_inode_pages_final(&inode->i_data);
+
+	if (want_delete) {
+		sb_start_intwrite(inode->i_sb);
+
+		i_size_write(inode, 0);
+
+		err = ssdfs_truncate(inode);
+		if (err == -ENOENT) {
+			err = 0;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("extents tree is absent: ino %lu\n",
+				  ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+		} else if (err) {
+			SSDFS_WARN("fail to truncate inode: "
+				   "ino %lu, err %d\n",
+				   ino, err);
+		}
+
+		if (xattrs_tree) {
+			err = ssdfs_xattrs_tree_delete_all(xattrs_tree);
+			if (err) {
+				SSDFS_WARN("fail to truncate xattrs tree: "
+					   "ino %lu, err %d\n",
+					   ino, err);
+			}
+		}
+	}
+
+	clear_inode(inode);
+
+	if (want_delete) {
+		err = ssdfs_inodes_btree_delete(fsi->inodes_tree, ino);
+		if (err) {
+			SSDFS_WARN("fail to deallocate raw inode: "
+				   "ino %lu, err %d\n",
+				   ino, err);
+		}
+
+		sb_end_intwrite(inode->i_sb);
+	}
+}
+
+/*
+ * This method is called when the VFS needs to write an
+ * inode to disc
+ */
+int ssdfs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_inode *ri = &ii->raw_inode;
+	struct ssdfs_inodes_btree_info *itree;
+	struct ssdfs_btree_search *search;
+	struct ssdfs_dentries_btree_descriptor dentries_btree;
+	bool has_save_dentries_desc = true;
+	size_t dentries_desc_size =
+		sizeof(struct ssdfs_dentries_btree_descriptor);
+	struct ssdfs_extents_btree_descriptor extents_btree;
+	bool has_save_extents_desc = true;
+	size_t extents_desc_size =
+		sizeof(struct ssdfs_extents_btree_descriptor);
+	struct ssdfs_xattr_btree_descriptor xattr_btree;
+	bool has_save_xattrs_desc = true;
+	size_t xattr_desc_size =
+		sizeof(struct ssdfs_xattr_btree_descriptor);
+	int private_flags;
+	size_t raw_inode_size;
+	ino_t ino;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	down_read(&fsi->volume_sem);
+	raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size);
+	ssdfs_memcpy(&dentries_btree, 0, dentries_desc_size,
+		     &fsi->vh->dentries_btree, 0, dentries_desc_size,
+		     dentries_desc_size);
+	ssdfs_memcpy(&extents_btree, 0, extents_desc_size,
+		     &fsi->vh->extents_btree, 0, extents_desc_size,
+		     extents_desc_size);
+	ssdfs_memcpy(&xattr_btree, 0, xattr_desc_size,
+		     &fsi->vh->xattr_btree, 0, xattr_desc_size,
+		     xattr_desc_size);
+	up_read(&fsi->volume_sem);
+
+	if (raw_inode_size != sizeof(struct ssdfs_inode)) {
+		SSDFS_WARN("raw_inode_size %zu != size %zu\n",
+			   raw_inode_size,
+			   sizeof(struct ssdfs_inode));
+		return -ERANGE;
+	}
+
+	itree = fsi->inodes_tree;
+	ino = inode->i_ino;
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		SSDFS_ERR("fail to allocate btree search object\n");
+		return -ENOMEM;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	err = ssdfs_inodes_btree_find(itree, ino, search);
+	if (unlikely(err)) {
+#ifdef CONFIG_SSDFS_TESTING
+		SSDFS_DBG("fail to find inode: "
+			  "ino %lu, err %d\n",
+			  ino, err);
+		err = 0;
+#else
+		SSDFS_ERR("fail to find inode: "
+			  "ino %lu, err %d\n",
+			  ino, err);
+#endif /* CONFIG_SSDFS_TESTING */
+		goto free_search_object;
+	}
+
+	down_write(&ii->lock);
+
+	ssdfs_init_raw_inode(ii);
+
+	if (S_ISREG(inode->i_mode) && ii->extents_tree) {
+		err = ssdfs_extents_tree_flush(fsi, ii);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to flush extents tree: "
+				  "ino %lu, err %d\n",
+				  inode->i_ino, err);
+			goto finish_write_inode;
+		}
+
+		if (memcmp(&extents_btree, &ii->extents_tree->desc,
+						extents_desc_size) != 0) {
+			ssdfs_memcpy(&extents_btree,
+				     0, extents_desc_size,
+				     &ii->extents_tree->desc,
+				     0, extents_desc_size,
+				     extents_desc_size);
+			has_save_extents_desc = true;
+		} else
+			has_save_extents_desc = false;
+	} else if (S_ISDIR(inode->i_mode) && ii->dentries_tree) {
+		err = ssdfs_dentries_tree_flush(fsi, ii);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to flush dentries tree: "
+				  "ino %lu, err %d\n",
+				  inode->i_ino, err);
+			goto finish_write_inode;
+		}
+
+		if (memcmp(&dentries_btree, &ii->dentries_tree->desc,
+						dentries_desc_size) != 0) {
+			ssdfs_memcpy(&dentries_btree,
+				     0, dentries_desc_size,
+				     &ii->dentries_tree->desc,
+				     0, dentries_desc_size,
+				     dentries_desc_size);
+			has_save_dentries_desc = true;
+		} else
+			has_save_dentries_desc = false;
+	}
+
+	if (ii->xattrs_tree) {
+		err = ssdfs_xattrs_tree_flush(fsi, ii);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to flush xattrs tree: "
+				  "ino %lu, err %d\n",
+				  inode->i_ino, err);
+			goto finish_write_inode;
+		}
+
+		if (memcmp(&xattr_btree, &ii->xattrs_tree->desc,
+						xattr_desc_size) != 0) {
+			ssdfs_memcpy(&xattr_btree,
+				     0, xattr_desc_size,
+				     &ii->xattrs_tree->desc,
+				     0, xattr_desc_size,
+				     xattr_desc_size);
+			has_save_xattrs_desc = true;
+		} else
+			has_save_xattrs_desc = false;
+	}
+
+	private_flags = atomic_read(&ii->private_flags);
+	if (private_flags & ~SSDFS_INODE_PRIVATE_FLAGS_MASK) {
+		err = -ERANGE;
+		SSDFS_WARN("invalid set of private_flags %#x\n",
+			   private_flags);
+		goto finish_write_inode;
+	} else
+		ri->private_flags = cpu_to_le16((u16)private_flags);
+
+	ri->checksum = ssdfs_crc32_le(ri, raw_inode_size);
+
+	switch (search->result.raw_buf.state) {
+	case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+	case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+		/* expected state */
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid result's buffer state: "
+			  "%#x\n",
+			  search->result.raw_buf.state);
+		goto finish_write_inode;
+	}
+
+	if (!search->result.raw_buf.place.ptr) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid buffer\n");
+		goto finish_write_inode;
+	}
+
+	if (search->result.raw_buf.size < raw_inode_size) {
+		err = -ERANGE;
+		SSDFS_ERR("buf_size %zu < raw_inode_size %zu\n",
+			  search->result.raw_buf.size,
+			  raw_inode_size);
+		goto finish_write_inode;
+	}
+
+	if (search->result.raw_buf.items_count != 1) {
+		SSDFS_WARN("unexpected value: "
+			   "items_in_buffer %u\n",
+			   search->result.raw_buf.items_count);
+	}
+
+	ssdfs_memcpy(search->result.raw_buf.place.ptr,
+		     0, search->result.raw_buf.size,
+		     ri,
+		     0, raw_inode_size,
+		     raw_inode_size);
+
+finish_write_inode:
+	up_write(&ii->lock);
+
+	if (unlikely(err))
+		goto free_search_object;
+
+	if (has_save_dentries_desc || has_save_extents_desc ||
+						has_save_xattrs_desc) {
+		down_write(&fsi->volume_sem);
+		ssdfs_memcpy(&fsi->vh->dentries_btree,
+			     0, dentries_desc_size,
+			     &dentries_btree,
+			     0, dentries_desc_size,
+			     dentries_desc_size);
+		ssdfs_memcpy(&fsi->vh->extents_btree,
+			     0, extents_desc_size,
+			     &extents_btree,
+			     0, extents_desc_size,
+			     extents_desc_size);
+		ssdfs_memcpy(&fsi->vh->xattr_btree,
+			     0, xattr_desc_size,
+			     &xattr_btree,
+			     0, xattr_desc_size,
+			     xattr_desc_size);
+		up_write(&fsi->volume_sem);
+	}
+
+	err = ssdfs_inodes_btree_change(itree, ino, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to change inode: "
+			  "ino %lu, err %d\n",
+			  ino, err);
+		goto free_search_object;
+	}
+
+free_search_object:
+	ssdfs_btree_search_free(search);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished: ino %lu, err %d\n",
+		  (unsigned long)inode->i_ino, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * This method is called when the VFS needs
+ * to get filesystem statistics
+ */
+int ssdfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	struct super_block *sb = dentry->d_sb;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(sb);
+#ifdef CONFIG_SSDFS_BLOCK_DEVICE
+	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+#endif
+	u64 nsegs;
+	u32 pages_per_seg;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu\n", (unsigned long)dentry->d_inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	buf->f_type = SSDFS_SUPER_MAGIC;
+	buf->f_bsize = sb->s_blocksize;
+	buf->f_frsize = buf->f_bsize;
+
+	mutex_lock(&fsi->resize_mutex);
+	nsegs = fsi->nsegs;
+	mutex_unlock(&fsi->resize_mutex);
+
+	pages_per_seg = fsi->pages_per_seg;
+	buf->f_blocks = nsegs * pages_per_seg;
+
+	spin_lock(&fsi->volume_state_lock);
+	buf->f_bfree = fsi->free_pages;
+	spin_unlock(&fsi->volume_state_lock);
+
+	buf->f_bavail = buf->f_bfree;
+
+	spin_lock(&fsi->inodes_tree->lock);
+	buf->f_files = fsi->inodes_tree->allocated_inodes;
+	buf->f_ffree = fsi->inodes_tree->free_inodes;
+	spin_unlock(&fsi->inodes_tree->lock);
+
+	buf->f_namelen = SSDFS_MAX_NAME_LEN;
+
+#ifdef CONFIG_SSDFS_MTD_DEVICE
+	buf->f_fsid.val[0] = SSDFS_SUPER_MAGIC;
+	buf->f_fsid.val[1] = fsi->mtd->index;
+#elif defined(CONFIG_SSDFS_BLOCK_DEVICE)
+	buf->f_fsid.val[0] = (u32)id;
+	buf->f_fsid.val[1] = (u32)(id >> 32);
+#else
+	BUILD_BUG();
+#endif
+
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 76/79] ssdfs: implement directory operations support
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (30 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 75/79] ssdfs: implement inode operations support Viacheslav Dubeyko
@ 2026-03-16  2:17 ` Viacheslav Dubeyko
  2026-03-16  2:18 ` [PATCH v2 77/79] ssdfs: implement file " Viacheslav Dubeyko
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

Implement directory operations: lookup, create, link, unlink,
symlink, mkdir, rmdir, mknod, and rename, backed by the inline
dentries / dentries btree of the parent inode.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/dir.c | 2197 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2197 insertions(+)
 create mode 100644 fs/ssdfs/dir.c

diff --git a/fs/ssdfs/dir.c b/fs/ssdfs/dir.c
new file mode 100644
index 000000000000..2565bfbfdf57
--- /dev/null
+++ b/fs/ssdfs/dir.c
@@ -0,0 +1,2197 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/dir.c - directory operations.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/pagevec.h>
+#include <linux/sched/signal.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "dentries_tree.h"
+#include "shared_dictionary.h"
+#include "xattr.h"
+#include "acl.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_dir_folio_leaks;
+atomic64_t ssdfs_dir_memory_leaks;
+atomic64_t ssdfs_dir_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_dir_cache_leaks_increment(void *kaddr)
+ * void ssdfs_dir_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_dir_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dir_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_dir_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_dir_kfree(void *kaddr)
+ * struct folio *ssdfs_dir_alloc_folio(gfp_t gfp_mask,
+ *                                     unsigned int order)
+ * struct folio *ssdfs_dir_add_batch_folio(struct folio_batch *batch,
+ *                                         unsigned int order)
+ * void ssdfs_dir_free_folio(struct folio *folio)
+ * void ssdfs_dir_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(dir)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(dir)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_dir_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_dir_folio_leaks, 0);
+	atomic64_set(&ssdfs_dir_memory_leaks, 0);
+	atomic64_set(&ssdfs_dir_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_dir_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_dir_folio_leaks) != 0) {
+		SSDFS_ERR("DIR: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_dir_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dir_memory_leaks) != 0) {
+		SSDFS_ERR("DIR: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_dir_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_dir_cache_leaks) != 0) {
+		SSDFS_ERR("DIR: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_dir_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+static unsigned char
+ssdfs_filetype_table[SSDFS_FT_MAX] = {
+	[SSDFS_FT_UNKNOWN]	= DT_UNKNOWN,
+	[SSDFS_FT_REG_FILE]	= DT_REG,
+	[SSDFS_FT_DIR]		= DT_DIR,
+	[SSDFS_FT_CHRDEV]	= DT_CHR,
+	[SSDFS_FT_BLKDEV]	= DT_BLK,
+	[SSDFS_FT_FIFO]		= DT_FIFO,
+	[SSDFS_FT_SOCK]		= DT_SOCK,
+	[SSDFS_FT_SYMLINK]	= DT_LNK,
+};
+
+int ssdfs_inode_by_name(struct inode *dir,
+			const struct qstr *child,
+			ino_t *ino)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(dir);
+	struct ssdfs_btree_search *search;
+	struct ssdfs_dir_entry *raw_dentry;
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	int private_flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!rwsem_is_locked(&ii->lock));
+
+	SSDFS_DBG("dir_ino %lu, target_name %s\n",
+		  (unsigned long)dir->i_ino,
+		  child->name);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*ino = 0;
+	private_flags = atomic_read(&ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		if (!ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree absent!!!\n");
+			goto finish_search_dentry;
+		}
+
+		search = ssdfs_btree_search_alloc();
+		if (!search) {
+			err = -ENOMEM;
+			SSDFS_ERR("fail to allocate btree search object\n");
+			goto finish_search_dentry;
+		}
+
+		ssdfs_btree_search_init(search);
+
+		err = ssdfs_dentries_tree_find(ii->dentries_tree,
+						child->name,
+						child->len,
+						search);
+		if (err == -ENODATA) {
+			err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("dir %lu has no child %s\n",
+				  (unsigned long)dir->i_ino,
+				  child->name);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto dentry_is_not_available;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find the dentry: "
+				  "dir %lu, child %s, err %d\n",
+				  (unsigned long)dir->i_ino,
+				  child->name, err);
+			goto dentry_is_not_available;
+		}
+
+		if (search->result.state != SSDFS_BTREE_SEARCH_VALID_ITEM) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid result's state %#x\n",
+				  search->result.state);
+			goto dentry_is_not_available;
+		}
+
+		switch (search->result.raw_buf.state) {
+		case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+		case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+			/* expected state */
+			break;
+
+		default:
+			err = -ERANGE;
+			SSDFS_ERR("invalid buffer state %#x\n",
+				  search->result.raw_buf.state);
+			goto dentry_is_not_available;
+		}
+
+		if (!search->result.raw_buf.place.ptr) {
+			err = -ERANGE;
+			SSDFS_ERR("buffer is absent\n");
+			goto dentry_is_not_available;
+		}
+
+		if (search->result.raw_buf.size < dentry_size) {
+			err = -ERANGE;
+			SSDFS_ERR("buf_size %zu < dentry_size %zu\n",
+				  search->result.raw_buf.size,
+				  dentry_size);
+			goto dentry_is_not_available;
+		}
+
+		raw_dentry =
+		    (struct ssdfs_dir_entry *)search->result.raw_buf.place.ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(le64_to_cpu(raw_dentry->ino) >= U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		*ino = (ino_t)le64_to_cpu(raw_dentry->ino);
+
+dentry_is_not_available:
+		ssdfs_btree_search_free(search);
+	} else {
+		err = -ENOENT;
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("dentries tree is absent: "
+			  "ino %lu\n",
+			  (unsigned long)dir->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+finish_search_dentry:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+/*
+ * The ssdfs_lookup() is called when the VFS needs
+ * to look up an inode in a parent directory.
+ */
+static struct dentry *ssdfs_lookup(struct inode *dir, struct dentry *target,
+				  unsigned int flags)
+{
+	struct inode *inode = NULL;
+	ino_t ino;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, flags %#x\n", (unsigned long)dir->i_ino, flags);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (target->d_name.len > SSDFS_MAX_NAME_LEN)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	down_read(&SSDFS_I(dir)->lock);
+	err = ssdfs_inode_by_name(dir, &target->d_name, &ino);
+	up_read(&SSDFS_I(dir)->lock);
+
+	if (err == -ENOENT) {
+		err = 0;
+		ino = 0;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to find the inode: "
+			  "err %d\n",
+			  err);
+		return ERR_PTR(err);
+	}
+
+	if (ino) {
+		inode = ssdfs_iget(dir->i_sb, ino);
+		if (inode == ERR_PTR(-ESTALE)) {
+			SSDFS_ERR("deleted inode referenced: %lu\n",
+				  (unsigned long)ino);
+			return ERR_PTR(-EIO);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return d_splice_alias(inode, target);
+}
+
+static int ssdfs_add_link(struct inode *dir, struct dentry *dentry,
+			  struct inode *inode)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb);
+	struct ssdfs_inode_info *dir_ii = SSDFS_I(dir);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_btree_search *search;
+	int private_flags;
+	struct timespec64 cur_time;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!rwsem_is_locked(&dir_ii->lock));
+
+	SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n",
+		  (unsigned long)inode->i_ino, inode->i_mode,
+		  inode->i_nlink, inode->i_mapping->nrpages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&dir_ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		if (!dir_ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree absent!!!\n");
+			goto finish_add_link;
+		}
+	} else {
+		if (dir_ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree exists unexpectedly!!!\n");
+			goto finish_create_dentries_tree;
+		} else {
+			err = ssdfs_dentries_tree_create(fsi, dir_ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the dentries tree: "
+					  "ino %lu, err %d\n",
+					  dir->i_ino, err);
+				goto finish_create_dentries_tree;
+			}
+
+			atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES,
+				  &dir_ii->private_flags);
+		}
+
+finish_create_dentries_tree:
+		if (unlikely(err))
+			goto finish_add_link;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto finish_add_link;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	err = ssdfs_dentries_tree_add(dir_ii->dentries_tree,
+				      &dentry->d_name,
+				      ii, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add the dentry: "
+			  "ino %lu, err %d\n",
+			  inode->i_ino, err);
+	} else {
+		cur_time = current_time(dir);
+		inode_set_mtime_to_ts(dir, cur_time);
+		inode_set_ctime_to_ts(dir, cur_time);
+		mark_inode_dirty(dir);
+	}
+
+	ssdfs_btree_search_free(search);
+
+finish_add_link:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+static int ssdfs_add_nondir(struct inode *dir, struct dentry *dentry,
+			    struct inode *inode)
+{
+	struct ssdfs_inode_info *dir_ii = SSDFS_I(dir);
+	int private_flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n",
+		  (unsigned long)inode->i_ino, inode->i_mode,
+		  inode->i_nlink, inode->i_mapping->nrpages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&dir_ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_read(&dir_ii->lock);
+	} else {
+		down_write(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_write(&dir_ii->lock);
+	}
+
+	if (err) {
+		inode_dec_link_count(inode);
+		iget_failed(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		return err;
+	}
+
+	unlock_new_inode(inode);
+	d_instantiate(dentry, inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * The ssdfs_create() is called by the open(2) and
+ * creat(2) system calls.
+ */
+int ssdfs_create(struct mnt_idmap *idmap,
+		 struct inode *dir, struct dentry *dentry,
+		 umode_t mode, bool excl)
+{
+	struct inode *inode;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, mode %o\n", (unsigned long)dir->i_ino, mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = ssdfs_new_inode(idmap, dir, mode, &dentry->d_name);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto failed_create;
+	}
+
+	mark_inode_dirty(inode);
+	err = ssdfs_add_nondir(dir, dentry, inode);
+
+failed_create:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+/*
+ * The ssdfs_mknod() is called by the mknod(2) system call
+ * to create a device (char, block) inode or a named pipe
+ * (FIFO) or socket.
+ */
+static int ssdfs_mknod(struct mnt_idmap *idmap,
+			struct inode *dir, struct dentry *dentry,
+			umode_t mode, dev_t rdev)
+{
+	struct inode *inode;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, mode %o, rdev %#x\n",
+		  (unsigned long)dir->i_ino, mode, rdev);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (dentry->d_name.len > SSDFS_MAX_NAME_LEN)
+		return -ENAMETOOLONG;
+
+	inode = ssdfs_new_inode(idmap, dir, mode, &dentry->d_name);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	init_special_inode(inode, mode, rdev);
+
+	mark_inode_dirty(inode);
+	err = ssdfs_add_nondir(dir, dentry, inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * Create symlink.
+ * The ssdfs_symlink() is called by the symlink(2) system call.
+ */
+static int ssdfs_symlink(struct mnt_idmap *idmap,
+			 struct inode *dir, struct dentry *dentry,
+			 const char *target)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(dir->i_sb);
+	struct inode *inode;
+	size_t target_len = strlen(target) + 1;
+	size_t raw_inode_size;
+	size_t inline_len;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, target_len %zu\n",
+		  (unsigned long)dir->i_ino, target_len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (target_len > dir->i_sb->s_blocksize)
+		return -ENAMETOOLONG;
+
+	down_read(&fsi->volume_sem);
+	raw_inode_size = le16_to_cpu(fsi->vs->inodes_btree.desc.item_size);
+	up_read(&fsi->volume_sem);
+
+	inline_len = offsetof(struct ssdfs_inode, internal);
+
+	if (raw_inode_size <= inline_len) {
+		SSDFS_ERR("invalid raw inode size %zu\n",
+			  raw_inode_size);
+		return -EFAULT;
+	}
+
+	inline_len = raw_inode_size - inline_len;
+
+	inode = ssdfs_new_inode(idmap, dir, S_IFLNK | S_IRWXUGO, &dentry->d_name);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	if (target_len > inline_len) {
+		/* slow symlink */
+		inode_nohighmem(inode);
+
+		err = page_symlink(inode, target, target_len);
+		if (err)
+			goto out_fail;
+	} else {
+		/* fast symlink */
+		down_write(&SSDFS_I(inode)->lock);
+		inode->i_link = (char *)SSDFS_I(inode)->raw_inode.internal;
+		memcpy(inode->i_link, target, target_len);
+		inode->i_size = target_len - 1;
+		atomic_or(SSDFS_INODE_HAS_INLINE_FILE,
+			  &SSDFS_I(inode)->private_flags);
+		up_write(&SSDFS_I(inode)->lock);
+	}
+
+	mark_inode_dirty(inode);
+	err = ssdfs_add_nondir(dir, dentry, inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+
+out_fail:
+	inode_dec_link_count(inode);
+	iget_failed(inode);
+	return err;
+}
+
+/*
+ * Create hardlink.
+ * The ssdfs_link() is called by the link(2) system call.
+ */
+static int ssdfs_link(struct dentry *old_dentry, struct inode *dir,
+			struct dentry *dentry)
+{
+	struct inode *inode = d_inode(old_dentry);
+	struct ssdfs_inode_info *dir_ii = SSDFS_I(dir);
+	int private_flags;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, inode %lu\n",
+		  (unsigned long)dir->i_ino, (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (inode->i_nlink >= SSDFS_LINK_MAX)
+		return -EMLINK;
+
+	if (!S_ISREG(inode->i_mode))
+		return -EPERM;
+
+	inode_set_ctime_to_ts(inode, current_time(inode));
+	inode_inc_link_count(inode);
+	ihold(inode);
+
+	private_flags = atomic_read(&dir_ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_read(&dir_ii->lock);
+	} else {
+		down_write(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_write(&dir_ii->lock);
+	}
+
+	if (err) {
+		inode_dec_link_count(inode);
+		iput(inode);
+		return err;
+	}
+
+	d_instantiate(dentry, inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+}
+
+/*
+ * Create an empty directory by adding the "." and ".." entries.
+ */
+static int ssdfs_make_empty(struct inode *inode, struct inode *parent)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_inode_info *parent_ii = SSDFS_I(parent);
+	struct ssdfs_btree_search *search;
+	int private_flags;
+	struct qstr dot = QSTR_INIT(".", 1);
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("Created ino %lu with mode %o, nlink %d, nrpages %ld\n",
+		  (unsigned long)inode->i_ino, inode->i_mode,
+		  inode->i_nlink, inode->i_mapping->nrpages);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&ii->lock);
+
+		if (!ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree absent!!!\n");
+			goto finish_make_empty_dir;
+		}
+	} else {
+		down_write(&ii->lock);
+
+		if (ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree exists unexpectedly!!!\n");
+			goto finish_create_dentries_tree;
+		} else {
+			err = ssdfs_dentries_tree_create(fsi, ii);
+			if (unlikely(err)) {
+				SSDFS_ERR("fail to create the dentries tree: "
+					  "ino %lu, err %d\n",
+					  inode->i_ino, err);
+				goto finish_create_dentries_tree;
+			}
+
+			atomic_or(SSDFS_INODE_HAS_INLINE_DENTRIES,
+				  &ii->private_flags);
+		}
+
+finish_create_dentries_tree:
+		downgrade_write(&ii->lock);
+
+		if (unlikely(err))
+			goto finish_make_empty_dir;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto finish_make_empty_dir;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	err = ssdfs_dentries_tree_add(ii->dentries_tree,
+				      &dot, ii, search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add dentry: "
+			  "ino %lu, err %d\n",
+			  inode->i_ino, err);
+		goto free_search_object;
+	}
+
+	err = ssdfs_dentries_tree_add(ii->dentries_tree,
+				      &dotdot, parent_ii,
+				      search);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to add dentry: "
+			  "ino %lu, err %d\n",
+			  parent->i_ino, err);
+		goto free_search_object;
+	}
+
+free_search_object:
+	ssdfs_btree_search_free(search);
+
+finish_make_empty_dir:
+	up_read(&ii->lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static int __ssdfs_mkdir(struct mnt_idmap *idmap,
+			 struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+	struct inode *inode;
+	struct ssdfs_inode_info *dir_ii = SSDFS_I(dir);
+	int private_flags;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, mode %o\n",
+		  (unsigned long)dir->i_ino, mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (dentry->d_name.len > SSDFS_MAX_NAME_LEN)
+		return -ENAMETOOLONG;
+
+	inode_inc_link_count(dir);
+
+	inode = ssdfs_new_inode(idmap, dir, S_IFDIR | mode, &dentry->d_name);
+	err = PTR_ERR(inode);
+	if (IS_ERR(inode))
+		goto out_dir;
+
+	inode_inc_link_count(inode);
+
+	err = ssdfs_make_empty(inode, dir);
+	if (err)
+		goto out_fail;
+
+	private_flags = atomic_read(&dir_ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_read(&dir_ii->lock);
+	} else {
+		down_write(&dir_ii->lock);
+		err = ssdfs_add_link(dir, dentry, inode);
+		up_write(&dir_ii->lock);
+	}
+
+	if (err)
+		goto out_fail;
+
+	d_instantiate(dentry, inode);
+	unlock_new_inode(inode);
+	return 0;
+
+out_fail:
+	inode_dec_link_count(inode);
+	inode_dec_link_count(inode);
+	unlock_new_inode(inode);
+	iput(inode);
+out_dir:
+	inode_dec_link_count(dir);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * Create subdirectory.
+ * The ssdfs_mkdir() is called by the mkdir(2) system call.
+ */
+static struct dentry *ssdfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
+				  struct dentry *dentry, umode_t mode)
+{
+	return ERR_PTR(__ssdfs_mkdir(idmap, dir, dentry, mode));
+}
+
+/*
+ * Delete inode.
+ * The ssdfs_unlink() is called by the unlink(2) system call.
+ */
+static int ssdfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(dir);
+	struct inode *inode = d_inode(dentry);
+	struct ssdfs_btree_search *search;
+	int private_flags;
+	u64 name_hash;
+	struct timespec64 cur_time;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, inode %lu\n",
+		  (unsigned long)dir->i_ino, (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	trace_ssdfs_unlink_enter(dir, dentry);
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&ii->lock);
+
+		if (!ii->dentries_tree) {
+			err = -ERANGE;
+			SSDFS_WARN("dentries tree absent!!!\n");
+			goto finish_delete_dentry;
+		}
+
+		search = ssdfs_btree_search_alloc();
+		if (!search) {
+			err = -ENOMEM;
+			SSDFS_ERR("fail to allocate btree search object\n");
+			goto finish_delete_dentry;
+		}
+
+		ssdfs_btree_search_init(search);
+
+		name_hash = ssdfs_generate_name_hash(&dentry->d_name);
+		if (name_hash >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid name hash\n");
+			goto dentry_is_not_available;
+		}
+
+		err = ssdfs_dentries_tree_delete(ii->dentries_tree,
+						 name_hash,
+						 inode->i_ino,
+						 search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to delete the dentry: "
+				  "name_hash %llx, ino %lu, err %d\n",
+				  name_hash, inode->i_ino, err);
+			goto dentry_is_not_available;
+		}
+
+dentry_is_not_available:
+		ssdfs_btree_search_free(search);
+
+finish_delete_dentry:
+		up_read(&ii->lock);
+
+		if (unlikely(err))
+			goto finish_unlink;
+	} else {
+		err = -ENOENT;
+		SSDFS_ERR("dentries tree is absent\n");
+		goto finish_unlink;
+	}
+
+	mark_inode_dirty(dir);
+	mark_inode_dirty(inode);
+
+	cur_time = current_time(dir);
+	inode_set_ctime_to_ts(inode, cur_time);
+	inode_set_mtime_to_ts(dir, cur_time);
+	inode_set_ctime_to_ts(dir, cur_time);
+
+	inode_dec_link_count(inode);
+
+finish_unlink:
+	trace_ssdfs_unlink_exit(inode, err);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static inline bool ssdfs_empty_dir(struct inode *dir)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(dir);
+	bool is_empty = false;
+	int private_flags;
+	u64 dentries_count;
+	u64 threshold = 2; /* . and .. */
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&ii->lock);
+
+		if (!ii->dentries_tree) {
+			SSDFS_WARN("dentries tree absent!!!\n");
+			is_empty = true;
+		} else {
+			dentries_count =
+			    atomic64_read(&ii->dentries_tree->dentries_count);
+
+			if (dentries_count > threshold) {
+				/* not empty folder */
+				is_empty = false;
+			} else if (dentries_count < threshold) {
+				SSDFS_WARN("unexpected dentries count %llu\n",
+					   dentries_count);
+				is_empty = true;
+			} else
+				is_empty = true;
+		}
+
+		up_read(&ii->lock);
+	} else {
+		/* dentries tree is absent */
+		is_empty = true;
+	}
+
+	return is_empty;
+}
+
+/*
+ * Delete subdirectory.
+ * The ssdfs_rmdir() is called by the rmdir(2) system call.
+ */
+static int ssdfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	int err = -ENOTEMPTY;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("dir %lu, subdir %lu\n",
+		  (unsigned long)dir->i_ino, (unsigned long)inode->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ssdfs_empty_dir(inode)) {
+		err = ssdfs_unlink(dir, dentry);
+		if (!err) {
+			inode->i_size = 0;
+			inode_dec_link_count(inode);
+			inode_dec_link_count(dir);
+		}
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+enum {
+	SSDFS_FIRST_INODE_LOCK = 0,
+	SSDFS_SECOND_INODE_LOCK,
+	SSDFS_THIRD_INODE_LOCK,
+	SSDFS_FOURTH_INODE_LOCK,
+};
+
+static void lock_4_inodes(struct inode *inode1, struct inode *inode2,
+			  struct inode *inode3, struct inode *inode4)
+{
+	down_write_nested(&SSDFS_I(inode1)->lock, SSDFS_FIRST_INODE_LOCK);
+
+	if (inode2 != inode1) {
+		down_write_nested(&SSDFS_I(inode2)->lock,
+					SSDFS_SECOND_INODE_LOCK);
+	}
+
+	if (inode3) {
+		down_write_nested(&SSDFS_I(inode3)->lock,
+					SSDFS_THIRD_INODE_LOCK);
+	}
+
+	if (inode4) {
+		down_write_nested(&SSDFS_I(inode4)->lock,
+					SSDFS_FOURTH_INODE_LOCK);
+	}
+}
+
+static void unlock_4_inodes(struct inode *inode1, struct inode *inode2,
+			    struct inode *inode3, struct inode *inode4)
+{
+	if (inode4)
+		up_write(&SSDFS_I(inode4)->lock);
+	if (inode3)
+		up_write(&SSDFS_I(inode3)->lock);
+	if (inode1 != inode2)
+		up_write(&SSDFS_I(inode2)->lock);
+	up_write(&SSDFS_I(inode1)->lock);
+}
+
+/*
+ * Regular rename.
+ */
+static int ssdfs_rename_target(struct inode *old_dir,
+				struct dentry *old_dentry,
+				struct inode *new_dir,
+				struct dentry *new_dentry,
+				unsigned int flags)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(old_dir->i_sb);
+	struct ssdfs_inode_info *old_dir_ii = SSDFS_I(old_dir);
+	struct ssdfs_inode_info *new_dir_ii = SSDFS_I(new_dir);
+	struct inode *old_inode = d_inode(old_dentry);
+	struct ssdfs_inode_info *old_ii = SSDFS_I(old_inode);
+	struct inode *new_inode = d_inode(new_dentry);
+	struct ssdfs_btree_search *search;
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	bool is_dir = S_ISDIR(old_inode->i_mode);
+	bool move = (new_dir != old_dir);
+	bool unlink = new_inode != NULL;
+	ino_t old_ino, old_parent_ino, new_ino;
+	struct timespec64 time;
+	u64 name_hash;
+	int err = -ENOENT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("old_dir %lu, old_inode %lu, "
+		  "new_dir %lu, new_inode %p\n",
+		  (unsigned long)old_dir->i_ino,
+		  (unsigned long)old_inode->i_ino,
+		  (unsigned long)new_dir->i_ino,
+		  new_inode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto out;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	lock_4_inodes(old_dir, new_dir, old_inode, new_inode);
+
+	err = ssdfs_inode_by_name(old_dir, &old_dentry->d_name, &old_ino);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find old dentry: err %d\n", err);
+		goto finish_target_rename;
+	} else if (old_ino != old_inode->i_ino) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n",
+			  old_ino, old_inode->i_ino);
+		goto finish_target_rename;
+	}
+
+	if (S_ISDIR(old_inode->i_mode)) {
+		err = ssdfs_inode_by_name(old_inode, &dotdot, &old_parent_ino);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find parent dentry: err %d\n", err);
+			goto finish_target_rename;
+		} else if (old_parent_ino != old_dir->i_ino) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid ino: "
+				  "found ino %lu != requested ino %lu\n",
+				  old_parent_ino, old_dir->i_ino);
+			goto finish_target_rename;
+		}
+	}
+
+	if (!old_dir_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("old dir has no dentries tree\n");
+		goto finish_target_rename;
+	}
+
+	if (!new_dir_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("new dir has no dentries tree\n");
+		goto finish_target_rename;
+	}
+
+	if (S_ISDIR(old_inode->i_mode) && !old_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("old inode has no dentries tree\n");
+		goto finish_target_rename;
+	}
+
+	if (flags & RENAME_WHITEOUT) {
+		/* TODO: implement RENAME_WHITEOUT support */
+		SSDFS_WARN("RENAME_WHITEOUT is not supported yet\n");
+	}
+
+	if (new_inode) {
+		err = -ENOTEMPTY;
+		if (is_dir && !ssdfs_empty_dir(new_inode))
+			goto finish_target_rename;
+
+		err = ssdfs_inode_by_name(new_dir, &new_dentry->d_name,
+					  &new_ino);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find new dentry: err %d\n", err);
+			goto finish_target_rename;
+		} else if (new_ino != new_inode->i_ino) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid ino: "
+				  "found ino %lu != requested ino %lu\n",
+				  new_ino, new_inode->i_ino);
+			goto finish_target_rename;
+		}
+
+		name_hash = ssdfs_generate_name_hash(&new_dentry->d_name);
+
+		err = ssdfs_dentries_tree_change(new_dir_ii->dentries_tree,
+						 name_hash,
+						 new_inode->i_ino,
+						 &old_dentry->d_name,
+						 old_ii,
+						 search);
+		if (unlikely(err)) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"fail to update dentry: err %d\n",
+					err);
+			goto finish_target_rename;
+		}
+	} else {
+		err = ssdfs_add_link(new_dir, new_dentry, old_inode);
+		if (unlikely(err)) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"fail to add the link: err %d\n",
+					err);
+			goto finish_target_rename;
+		}
+	}
+
+	name_hash = ssdfs_generate_name_hash(&old_dentry->d_name);
+
+	err = ssdfs_dentries_tree_delete(old_dir_ii->dentries_tree,
+					 name_hash,
+					 old_inode->i_ino,
+					 search);
+	if (unlikely(err)) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"fail to delete the dentry: "
+				"name_hash %llx, ino %lu, err %d\n",
+				name_hash, old_inode->i_ino, err);
+		goto finish_target_rename;
+	}
+
+	if (is_dir && move) {
+		/* update ".." directory entry info of old dentry */
+		name_hash = ssdfs_generate_name_hash(&dotdot);
+		err = ssdfs_dentries_tree_change(old_ii->dentries_tree,
+						 name_hash, old_dir->i_ino,
+						 &dotdot, new_dir_ii,
+						 search);
+		if (unlikely(err)) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"fail to update dentry: err %d\n",
+					err);
+			goto finish_target_rename;
+		}
+	}
+
+	old_ii->parent_ino = new_dir->i_ino;
+
+	/*
+	 * Like most other Unix systems, set the @i_ctime for inodes on a
+	 * rename.
+	 */
+	time = current_time(old_dir);
+	inode_set_ctime_to_ts(old_inode, time);
+	mark_inode_dirty(old_inode);
+
+	/* We must adjust parent link count when renaming directories */
+	if (is_dir) {
+		if (move) {
+			/*
+			 * @old_dir loses a link because we are moving
+			 * @old_inode to a different directory.
+			 */
+			inode_dec_link_count(old_dir);
+			/*
+			 * @new_dir only gains a link if we are not also
+			 * overwriting an existing directory.
+			 */
+			if (!unlink)
+				inode_inc_link_count(new_dir);
+		} else {
+			/*
+			 * @old_inode is not moving to a different directory,
+			 * but @old_dir still loses a link if we are
+			 * overwriting an existing directory.
+			 */
+			if (unlink)
+				inode_dec_link_count(old_dir);
+		}
+	}
+
+	inode_set_mtime_to_ts(old_dir, time);
+	inode_set_ctime_to_ts(old_dir, time);
+	inode_set_mtime_to_ts(new_dir, time);
+	inode_set_ctime_to_ts(new_dir, time);
+
+	/*
+	 * And finally, if we unlinked a direntry which happened to have the
+	 * same name as the moved direntry, we have to decrement @i_nlink of
+	 * the unlinked inode and change its ctime.
+	 */
+	if (unlink) {
+		/*
+		 * Directories cannot have hard-links, so if this is a
+		 * directory, just clear @i_nlink.
+		 */
+		if (is_dir) {
+			clear_nlink(new_inode);
+			mark_inode_dirty(new_inode);
+		} else
+			inode_dec_link_count(new_inode);
+		inode_set_ctime_to_ts(new_inode, time);
+	}
+
+finish_target_rename:
+	unlock_4_inodes(old_dir, new_dir, old_inode, new_inode);
+	ssdfs_btree_search_free(search);
+
+out:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+/*
+ * Cross-directory rename.
+ */
+static int ssdfs_cross_rename(struct inode *old_dir,
+				struct dentry *old_dentry,
+				struct inode *new_dir,
+				struct dentry *new_dentry)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(old_dir->i_sb);
+	struct ssdfs_inode_info *old_dir_ii = SSDFS_I(old_dir);
+	struct ssdfs_inode_info *new_dir_ii = SSDFS_I(new_dir);
+	struct inode *old_inode = d_inode(old_dentry);
+	struct ssdfs_inode_info *old_ii = SSDFS_I(old_inode);
+	struct inode *new_inode = d_inode(new_dentry);
+	struct ssdfs_inode_info *new_ii = SSDFS_I(new_inode);
+	struct ssdfs_btree_search *search;
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	ino_t old_ino, new_ino;
+	struct timespec64 time;
+	u64 name_hash;
+	int err = -ENOENT;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("old_dir %lu, old_inode %lu, new_dir %lu\n",
+		  (unsigned long)old_dir->i_ino,
+		  (unsigned long)old_inode->i_ino,
+		  (unsigned long)new_dir->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto out;
+	}
+
+	ssdfs_btree_search_init(search);
+
+	lock_4_inodes(old_dir, new_dir, old_inode, new_inode);
+
+	err = ssdfs_inode_by_name(old_dir, &old_dentry->d_name, &old_ino);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find old dentry: err %d\n", err);
+		goto finish_cross_rename;
+	} else if (old_ino != old_inode->i_ino) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n",
+			  old_ino, old_inode->i_ino);
+		goto finish_cross_rename;
+	}
+
+	err = ssdfs_inode_by_name(new_dir, &new_dentry->d_name, &new_ino);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to find new dentry: err %d\n", err);
+		goto finish_cross_rename;
+	} else if (new_ino != new_inode->i_ino) {
+		err = -ERANGE;
+		SSDFS_ERR("invalid ino: found ino %lu != requested ino %lu\n",
+			  new_ino, new_inode->i_ino);
+		goto finish_cross_rename;
+	}
+
+	if (!old_dir_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("old dir has no dentries tree\n");
+		goto finish_cross_rename;
+	}
+
+	if (!new_dir_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("new dir has no dentries tree\n");
+		goto finish_cross_rename;
+	}
+
+	if (S_ISDIR(old_inode->i_mode) && !old_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("old inode has no dentries tree\n");
+		goto finish_cross_rename;
+	}
+
+	if (S_ISDIR(new_inode->i_mode) && !new_ii->dentries_tree) {
+		err = -ERANGE;
+		SSDFS_ERR("new inode has no dentries tree\n");
+		goto finish_cross_rename;
+	}
+
+	name_hash = ssdfs_generate_name_hash(&dotdot);
+
+	/* update ".." directory entry info of old dentry */
+	if (S_ISDIR(old_inode->i_mode)) {
+		err = ssdfs_dentries_tree_change(old_ii->dentries_tree,
+						 name_hash, old_dir->i_ino,
+						 &dotdot, new_dir_ii,
+						 search);
+		if (unlikely(err)) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"fail to update dentry: err %d\n",
+					err);
+			goto finish_cross_rename;
+		}
+	}
+
+	/* update ".." directory entry info of new dentry */
+	if (S_ISDIR(new_inode->i_mode)) {
+		err = ssdfs_dentries_tree_change(new_ii->dentries_tree,
+						 name_hash, new_dir->i_ino,
+						 &dotdot, old_dir_ii,
+						 search);
+		if (unlikely(err)) {
+			ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+					"fail to update dentry: err %d\n",
+					err);
+			goto finish_cross_rename;
+		}
+	}
+
+	/* update directory entry info of old dir inode */
+	name_hash = ssdfs_generate_name_hash(&old_dentry->d_name);
+
+	err = ssdfs_dentries_tree_change(old_dir_ii->dentries_tree,
+					 name_hash, old_inode->i_ino,
+					 &new_dentry->d_name, new_ii,
+					 search);
+	if (unlikely(err)) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"fail to update dentry: err %d\n",
+				err);
+		goto finish_cross_rename;
+	}
+
+	/* update directory entry info of new dir inode */
+	name_hash = ssdfs_generate_name_hash(&new_dentry->d_name);
+
+	err = ssdfs_dentries_tree_change(new_dir_ii->dentries_tree,
+					 name_hash, new_inode->i_ino,
+					 &old_dentry->d_name, old_ii,
+					 search);
+	if (unlikely(err)) {
+		ssdfs_fs_error(fsi->sb, __FILE__, __func__, __LINE__,
+				"fail to update dentry: err %d\n",
+				err);
+		goto finish_cross_rename;
+	}
+
+	old_ii->parent_ino = new_dir->i_ino;
+	new_ii->parent_ino = old_dir->i_ino;
+
+	time = current_time(old_dir);
+	inode_set_ctime_to_ts(old_inode, time);
+	inode_set_ctime_to_ts(new_inode, time);
+	inode_set_mtime_to_ts(old_dir, time);
+	inode_set_ctime_to_ts(old_dir, time);
+	inode_set_mtime_to_ts(new_dir, time);
+	inode_set_ctime_to_ts(new_dir, time);
+
+	if (old_dir != new_dir) {
+		if (S_ISDIR(old_inode->i_mode) &&
+		    !S_ISDIR(new_inode->i_mode)) {
+			inode_inc_link_count(new_dir);
+			inode_dec_link_count(old_dir);
+		}
+		else if (!S_ISDIR(old_inode->i_mode) &&
+			 S_ISDIR(new_inode->i_mode)) {
+			inode_dec_link_count(new_dir);
+			inode_inc_link_count(old_dir);
+		}
+	}
+
+	mark_inode_dirty(old_inode);
+	mark_inode_dirty(new_inode);
+
+finish_cross_rename:
+	unlock_4_inodes(old_dir, new_dir, old_inode, new_inode);
+	ssdfs_btree_search_free(search);
+
+out:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+/*
+ * The ssdfs_rename() is called by the rename(2) system call
+ * to rename the object to have the parent and name given by
+ * the second inode and dentry.
+ */
+static int ssdfs_rename(struct mnt_idmap *idmap,
+			struct inode *old_dir, struct dentry *old_dentry,
+			struct inode *new_dir, struct dentry *new_dentry,
+			unsigned int flags)
+{
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("old_dir %lu, old_inode %lu, new_dir %lu\n",
+		  (unsigned long)old_dir->i_ino,
+		  (unsigned long)old_dentry->d_inode->i_ino,
+		  (unsigned long)new_dir->i_ino);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT)) {
+		SSDFS_ERR("invalid flags %#x\n", flags);
+		return -EINVAL;
+	}
+
+	if (flags & RENAME_EXCHANGE) {
+		return ssdfs_cross_rename(old_dir, old_dentry,
+					  new_dir, new_dentry);
+	}
+
+	return ssdfs_rename_target(old_dir, old_dentry, new_dir, new_dentry,
+				   flags);
+}
+
+static
+int ssdfs_dentries_tree_get_start_hash(struct ssdfs_dentries_btree_info *tree,
+					u64 *start_hash)
+{
+	struct ssdfs_btree_index *index;
+	struct ssdfs_dir_entry *cur_dentry;
+	u64 dentries_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !start_hash);
+
+	SSDFS_DBG("tree %p, start_hash %p\n",
+		  tree, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start_hash = U64_MAX;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	dentries_count = atomic64_read(&tree->dentries_count);
+
+	if (dentries_count < 2) {
+		SSDFS_WARN("directory is corrupted: "
+			   "dentries_count %llu\n",
+			   dentries_count);
+		return -ERANGE;
+	} else if (dentries_count == 2)
+		return -ENOENT;
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		down_read(&tree->lock);
+
+		if (!tree->inline_dentries) {
+			err = -ERANGE;
+			SSDFS_ERR("inline tree's pointer is empty\n");
+			goto finish_process_inline_tree;
+		}
+
+		cur_dentry = &tree->inline_dentries[0];
+		*start_hash = le64_to_cpu(cur_dentry->hash_code);
+
+finish_process_inline_tree:
+		up_read(&tree->lock);
+
+		if (*start_hash >= U64_MAX) {
+			/* warn about invalid hash code */
+			SSDFS_WARN("inline array: hash_code is invalid\n");
+		}
+		break;
+
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		down_read(&tree->lock);
+
+		if (!tree->root) {
+			err = -ERANGE;
+			SSDFS_ERR("root node pointer is NULL\n");
+			goto finish_get_start_hash;
+		}
+
+		index = &tree->root->indexes[SSDFS_ROOT_NODE_LEFT_LEAF_NODE];
+		*start_hash = le64_to_cpu(index->hash);
+
+finish_get_start_hash:
+		up_read(&tree->lock);
+
+		if (*start_hash >= U64_MAX) {
+			/* warn about invalid hash code */
+			SSDFS_WARN("private dentry: hash_code is invalid\n");
+		}
+		break;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		break;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_dentries_tree_get_next_hash(struct ssdfs_dentries_btree_info *tree,
+					struct ssdfs_btree_search *search,
+					u64 *next_hash)
+{
+	u64 old_hash;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!tree || !search || !next_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	old_hash = le64_to_cpu(search->node.found_index.index.hash);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("search %p, next_hash %p, old (node %u, hash %llx)\n",
+		  search, next_hash, search->node.id, old_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		SSDFS_DBG("inline dentries array is unsupported\n");
+		return -ENOENT;
+
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		/* expected tree type */
+		break;
+
+	default:
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+	down_read(&tree->lock);
+	err = ssdfs_btree_get_next_hash(tree->generic_tree, search, next_hash);
+	up_read(&tree->lock);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err;
+}
+
+static
+int ssdfs_dentries_tree_node_hash_range(struct ssdfs_dentries_btree_info *tree,
+					struct ssdfs_btree_search *search,
+					u64 *start_hash, u64 *end_hash,
+					u16 *items_count)
+{
+	struct ssdfs_dir_entry *cur_dentry;
+	u64 dentries_count;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search || !start_hash || !end_hash || !items_count);
+
+	SSDFS_DBG("search %p, start_hash %p, "
+		  "end_hash %p, items_count %p\n",
+		  search, start_hash, end_hash, items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*start_hash = *end_hash = U64_MAX;
+	*items_count = 0;
+
+	switch (atomic_read(&tree->state)) {
+	case SSDFS_DENTRIES_BTREE_CREATED:
+	case SSDFS_DENTRIES_BTREE_INITIALIZED:
+	case SSDFS_DENTRIES_BTREE_DIRTY:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("invalid dentries tree's state %#x\n",
+			  atomic_read(&tree->state));
+		return -ERANGE;
+	}
+
+	switch (atomic_read(&tree->type)) {
+	case SSDFS_INLINE_DENTRIES_ARRAY:
+		dentries_count = atomic64_read(&tree->dentries_count);
+		if (dentries_count >= U16_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("unexpected dentries count %llu\n",
+				  dentries_count);
+			goto finish_extract_hash_range;
+		}
+
+		*items_count = (u16)dentries_count;
+
+		if (*items_count == 0)
+			goto finish_extract_hash_range;
+
+		down_read(&tree->lock);
+
+		if (!tree->inline_dentries) {
+			err = -ERANGE;
+			SSDFS_ERR("inline tree's pointer is empty\n");
+			goto finish_process_inline_tree;
+		}
+
+		cur_dentry = &tree->inline_dentries[0];
+		*start_hash = le64_to_cpu(cur_dentry->hash_code);
+
+		if (dentries_count > SSDFS_INLINE_DENTRIES_COUNT) {
+			err = -ERANGE;
+			SSDFS_ERR("dentries_count %llu > max_value %u\n",
+				  dentries_count,
+				  SSDFS_INLINE_DENTRIES_COUNT);
+			goto finish_process_inline_tree;
+		}
+
+		cur_dentry = &tree->inline_dentries[dentries_count - 1];
+		*end_hash = le64_to_cpu(cur_dentry->hash_code);
+
+finish_process_inline_tree:
+		up_read(&tree->lock);
+		break;
+
+	case SSDFS_PRIVATE_DENTRIES_BTREE:
+		err = ssdfs_btree_node_get_hash_range(search,
+						      start_hash,
+						      end_hash,
+						      items_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get hash range: err %d\n",
+				  err);
+			goto finish_extract_hash_range;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("invalid tree type %#x\n",
+			  atomic_read(&tree->type));
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("start_hash %llx, end_hash %llx, items_count %u\n",
+		  *start_hash, *end_hash, *items_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+finish_extract_hash_range:
+	return err;
+}
+
+static
+int ssdfs_dentries_tree_check_search_result(struct ssdfs_btree_search *search)
+{
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	u16 items_count;
+	size_t buf_size;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!search);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (search->result.state) {
+	case SSDFS_BTREE_SEARCH_VALID_ITEM:
+		/* expected state */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected result's state %#x\n",
+			  search->result.state);
+		return -ERANGE;
+	}
+
+	switch (search->result.raw_buf.state) {
+	case SSDFS_BTREE_SEARCH_INLINE_BUFFER:
+	case SSDFS_BTREE_SEARCH_EXTERNAL_BUFFER:
+		if (!search->result.raw_buf.place.ptr) {
+			SSDFS_ERR("buffer pointer is NULL\n");
+			return -ERANGE;
+		}
+		break;
+
+	default:
+		SSDFS_ERR("unexpected buffer's state\n");
+		return -ERANGE;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(search->result.raw_buf.items_count >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	items_count = (u16)search->result.raw_buf.items_count;
+
+	if (items_count == 0) {
+		SSDFS_ERR("items_in_buffer %u\n",
+			  items_count);
+		return -ENOENT;
+	} else if (items_count != search->result.count) {
+		SSDFS_ERR("items_count %u != search->result.count %u\n",
+			  items_count, search->result.count);
+		return -ERANGE;
+	}
+
+	buf_size = dentry_size * items_count;
+
+	if (buf_size != search->result.raw_buf.size) {
+		SSDFS_ERR("buf_size %zu != search->result.raw_buf.size %zu\n",
+			  buf_size,
+			  search->result.raw_buf.size);
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static
+bool is_invalid_dentry(struct ssdfs_dir_entry *dentry)
+{
+	u8 name_len;
+	bool is_invalid = false;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!dentry);
+
+	SSDFS_DBG("dentry_type %#x, file_type %#x, "
+		  "flags %#x, name_len %u, "
+		  "hash_code %llx, ino %llu\n",
+		  dentry->dentry_type, dentry->file_type,
+		  dentry->flags, dentry->name_len,
+		  le64_to_cpu(dentry->hash_code),
+		  le64_to_cpu(dentry->ino));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	switch (dentry->dentry_type) {
+	case SSDFS_INLINE_DENTRY:
+	case SSDFS_REGULAR_DENTRY:
+		/* expected dentry type */
+		break;
+
+	default:
+		is_invalid = true;
+		SSDFS_ERR("invalid dentry type %#x\n",
+			  dentry->dentry_type);
+		goto finish_check;
+	}
+
+	if (dentry->file_type <= SSDFS_FT_UNKNOWN ||
+	    dentry->file_type >= SSDFS_FT_MAX) {
+		is_invalid = true;
+		SSDFS_ERR("invalid file type %#x\n",
+			  dentry->file_type);
+		goto finish_check;
+	}
+
+	if (dentry->flags & ~SSDFS_DENTRY_FLAGS_MASK) {
+		is_invalid = true;
+		SSDFS_ERR("invalid set of flags %#x\n",
+			  dentry->flags);
+		goto finish_check;
+	}
+
+	name_len = dentry->name_len;
+
+	if (name_len > SSDFS_MAX_NAME_LEN) {
+		is_invalid = true;
+		SSDFS_ERR("invalid name_len %u\n",
+			  name_len);
+		goto finish_check;
+	}
+
+	if (le64_to_cpu(dentry->hash_code) >= U64_MAX) {
+		is_invalid = true;
+		SSDFS_ERR("invalid hash_code\n");
+		goto finish_check;
+	}
+
+	if (le64_to_cpu(dentry->ino) >= U32_MAX) {
+		is_invalid = true;
+		SSDFS_ERR("ino %llu is too large\n",
+			  le64_to_cpu(dentry->ino));
+		goto finish_check;
+	}
+
+finish_check:
+	if (is_invalid) {
+		SSDFS_ERR("dentry_type %#x, file_type %#x, "
+			  "flags %#x, name_len %u, "
+			  "hash_code %llx, ino %llu\n",
+			  dentry->dentry_type, dentry->file_type,
+			  dentry->flags, dentry->name_len,
+			  le64_to_cpu(dentry->hash_code),
+			  le64_to_cpu(dentry->ino));
+	}
+
+	return is_invalid;
+}
+
+/*
+ * The ssdfs_readdir() is called when the VFS needs
+ * to read the directory contents.
+ */
+static int ssdfs_readdir(struct file *file, struct dir_context *ctx)
+{
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct qstr dot = QSTR_INIT(".", 1);
+	u64 dot_hash;
+	struct qstr dotdot = QSTR_INIT("..", 2);
+	u64 dotdot_hash;
+	struct ssdfs_shared_dict_btree_info *dict;
+	struct ssdfs_btree_search *search;
+	struct ssdfs_dir_entry *dentry;
+	size_t dentry_size = sizeof(struct ssdfs_dir_entry);
+	int private_flags;
+	u64 start_hash = U64_MAX;
+	u64 end_hash = U64_MAX;
+	u64 hash = U64_MAX;
+	u64 start_pos;
+	u16 items_count;
+	ino_t ino;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("file %p, ctx %p\n", file, ctx);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (ctx->pos < 0) {
+		SSDFS_DBG("ctx->pos %lld\n", ctx->pos);
+		return 0;
+	}
+
+	dict = fsi->shdictree;
+	if (!dict) {
+		SSDFS_ERR("shared dictionary is absent\n");
+		return -ERANGE;
+	}
+
+	dot_hash = ssdfs_generate_name_hash(&dot);
+	dotdot_hash = ssdfs_generate_name_hash(&dotdot);
+
+	private_flags = atomic_read(&ii->private_flags);
+
+	if (private_flags & SSDFS_INODE_HAS_INLINE_DENTRIES ||
+	    private_flags & SSDFS_INODE_HAS_DENTRIES_BTREE) {
+		down_read(&ii->lock);
+		if (!ii->dentries_tree)
+			err = -ERANGE;
+		up_read(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_WARN("dentries tree is absent\n");
+			return -ERANGE;
+		}
+	} else {
+		if (!S_ISDIR(inode->i_mode)) {
+			SSDFS_WARN("this is not a directory\n");
+			return -EINVAL;
+		}
+
+		down_read(&ii->lock);
+		if (ii->dentries_tree)
+			err = -ERANGE;
+		up_read(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_WARN("dentries tree exists unexpectedly\n");
+			return err;
+		}
+	}
+
+	start_pos = ctx->pos;
+
+	if (ctx->pos == 0) {
+		down_read(&ii->lock);
+		err = ssdfs_inode_by_name(inode, &dot, &ino);
+		up_read(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find dentry: err %d\n", err);
+			goto out;
+		}
+
+		if (!dir_emit_dot(file, ctx)) {
+			err = -ERANGE;
+			SSDFS_ERR("fail to emit dentry\n");
+			goto out;
+		}
+
+		ctx->pos = 1;
+	}
+
+	if (ctx->pos == 1) {
+		down_read(&ii->lock);
+		err = ssdfs_inode_by_name(inode, &dotdot, &ino);
+		up_read(&ii->lock);
+
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to find dentry: err %d\n", err);
+			goto out;
+		}
+
+		if (!dir_emit_dotdot(file, ctx)) {
+			err = -ERANGE;
+			SSDFS_ERR("fail to emit dentry\n");
+			goto out;
+		}
+
+		ctx->pos = 2;
+	}
+
+	if (ctx->pos >= 2) {
+		down_read(&ii->lock);
+		err = ssdfs_dentries_tree_get_start_hash(ii->dentries_tree,
+							 &start_hash);
+		up_read(&ii->lock);
+
+		if (err == -ENOENT) {
+			err = 0;
+			ctx->pos = 2;
+			goto out;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to get start hash: err %d\n", err);
+			goto out;
+		} else if (start_hash >= U64_MAX) {
+			err = -ERANGE;
+			SSDFS_ERR("invalid hash value\n");
+			goto out;
+		}
+
+		ctx->pos = 2;
+	}
+
+	search = ssdfs_btree_search_alloc();
+	if (!search) {
+		err = -ENOMEM;
+		SSDFS_ERR("fail to allocate btree search object\n");
+		goto out;
+	}
+
+	do {
+		ssdfs_btree_search_init(search);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("ctx->pos %llu, start_hash %llx\n",
+			  (u64)ctx->pos, start_hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		/* allow readdir() to be interrupted */
+		if (fatal_signal_pending(current)) {
+			err = -ERESTARTSYS;
+			goto out_free;
+		}
+		cond_resched();
+
+		down_read(&ii->lock);
+
+		err = ssdfs_dentries_tree_find_leaf_node(ii->dentries_tree,
+							 start_hash,
+							 search);
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to find a leaf node: "
+				  "hash %llx, err %d\n",
+				  start_hash, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto finish_tree_processing;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to find a leaf node: "
+				  "hash %llx, err %d\n",
+				  start_hash, err);
+			goto finish_tree_processing;
+		}
+
+		err = ssdfs_dentries_tree_node_hash_range(ii->dentries_tree,
+							  search,
+							  &start_hash,
+							  &end_hash,
+							  &items_count);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to get node's hash range: "
+				  "err %d\n", err);
+			goto finish_tree_processing;
+		}
+
+		if (items_count == 0) {
+			err = -ENOENT;
+			SSDFS_DBG("empty leaf node\n");
+			goto finish_tree_processing;
+		}
+
+		if (start_hash > end_hash) {
+			err = -ENOENT;
+			goto finish_tree_processing;
+		}
+
+		err = ssdfs_dentries_tree_extract_range(ii->dentries_tree,
+							0, items_count,
+							search);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to extract the range: "
+				  "items_count %u, err %d\n",
+				  items_count, err);
+			goto finish_tree_processing;
+		}
+
+finish_tree_processing:
+		up_read(&ii->lock);
+
+		if (err == -ENODATA) {
+			err = 0;
+			goto out_free;
+		} else if (unlikely(err))
+			goto out_free;
+
+		err = ssdfs_dentries_tree_check_search_result(search);
+		if (unlikely(err)) {
+			SSDFS_ERR("corrupted search result: "
+				  "err %d\n", err);
+			goto out_free;
+		}
+
+		items_count = search->result.count;
+
+		for (i = 0; i < items_count; i++) {
+			u8 *start_ptr = (u8 *)search->result.raw_buf.place.ptr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("start_pos %llu, ctx->pos %llu\n",
+				  start_pos, ctx->pos);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			dentry = (struct ssdfs_dir_entry *)(start_ptr +
+							(i * dentry_size));
+			hash = le64_to_cpu(dentry->hash_code);
+
+			if (ctx->pos < start_pos) {
+				if (dot_hash == hash || dotdot_hash == hash) {
+					/* skip counting */
+					continue;
+				} else {
+					ctx->pos++;
+					continue;
+				}
+			}
+
+			if (is_invalid_dentry(dentry)) {
+				err = -EIO;
+				SSDFS_ERR("found corrupted dentry\n");
+				goto out_free;
+			}
+
+			if (dot_hash == hash || dotdot_hash == hash) {
+				/*
+				 * These entries have been emitted already.
+				 * Simply skip them.
+				 */
+			} else if (dentry->flags &
+					SSDFS_DENTRY_HAS_EXTERNAL_STRING) {
+				err = ssdfs_shared_dict_get_name(dict, hash,
+							&search->name.string);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to extract the name: "
+						  "hash %llx, err %d\n",
+						  hash, err);
+					goto out_free;
+				}
+
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("ctx->pos %llu, name %s, "
+					  "name_len %zu, "
+					  "ino %llu, hash %llx\n",
+					  ctx->pos,
+					  search->name.string.str,
+					  search->name.string.len,
+					  le64_to_cpu(dentry->ino),
+					  hash);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				if (!dir_emit(ctx,
+				    search->name.string.str,
+				    search->name.string.len,
+				    (ino_t)le64_to_cpu(dentry->ino),
+				    ssdfs_filetype_table[dentry->file_type])) {
+					/* stopped by some reason */
+					err = 1;
+					goto out_free;
+				} else
+					ctx->pos++;
+			} else {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("ctx->pos %llu, name %s, "
+					  "name_len %u, "
+					  "ino %llu, hash %llx\n",
+					  ctx->pos,
+					  dentry->inline_string,
+					  dentry->name_len,
+					  le64_to_cpu(dentry->ino),
+					  hash);
+				SSDFS_DBG("dentry %p, name %p\n",
+					  dentry, dentry->inline_string);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				if (!dir_emit(ctx,
+				    dentry->inline_string,
+				    dentry->name_len,
+				    (ino_t)le64_to_cpu(dentry->ino),
+				    ssdfs_filetype_table[dentry->file_type])) {
+					/* stopped by some reason */
+					err = 1;
+					goto out_free;
+				} else
+					ctx->pos++;
+			}
+		}
+
+		if (hash != end_hash) {
+			err = -ERANGE;
+			SSDFS_ERR("hash %llx != end_hash %llx\n",
+				  hash, end_hash);
+			goto out_free;
+		}
+
+		start_hash = end_hash + 1;
+
+		down_read(&ii->lock);
+		err = ssdfs_dentries_tree_get_next_hash(ii->dentries_tree,
+							search,
+							&start_hash);
+		up_read(&ii->lock);
+
+		ssdfs_btree_search_forget_parent_node(search);
+		ssdfs_btree_search_forget_child_node(search);
+
+		if (err == -ENOENT) {
+			err = 0;
+			ctx->pos = U64_MAX;
+			SSDFS_DBG("no more items in the directory\n");
+			goto out_free;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to get next hash: err %d\n",
+				  err);
+			goto out_free;
+		}
+	} while (start_hash < U64_MAX);
+
+out_free:
+	ssdfs_btree_search_free(search);
+
+out:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+const struct inode_operations ssdfs_dir_inode_operations = {
+	.create		= ssdfs_create,
+	.lookup		= ssdfs_lookup,
+	.link		= ssdfs_link,
+	.unlink		= ssdfs_unlink,
+	.symlink	= ssdfs_symlink,
+	.mkdir		= ssdfs_mkdir,
+	.rmdir		= ssdfs_rmdir,
+	.mknod		= ssdfs_mknod,
+	.rename		= ssdfs_rename,
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+	.get_inode_acl	= ssdfs_get_acl,
+	.set_acl	= ssdfs_set_acl,
+};
+
+const struct file_operations ssdfs_dir_operations = {
+	.read		= generic_read_dir,
+	.iterate_shared	= ssdfs_readdir,
+	.unlocked_ioctl	= ssdfs_ioctl,
+	.fsync		= ssdfs_fsync,
+	.llseek		= generic_file_llseek,
+};
-- 
2.34.1



* [PATCH v2 77/79] ssdfs: implement file operations support
  2026-03-16  2:17 [PATCH v2 00/79] SSDFS: noGC ZNS/FDP-friendly LFS file system Viacheslav Dubeyko
                   ` (31 preceding siblings ...)
  2026-03-16  2:17 ` [PATCH v2 76/79] ssdfs: implement directory " Viacheslav Dubeyko
@ 2026-03-16  2:18 ` Viacheslav Dubeyko
  32 siblings, 0 replies; 34+ messages in thread
From: Viacheslav Dubeyko @ 2026-03-16  2:18 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, Viacheslav Dubeyko

Complete patchset is available here:
https://github.com/dubeyko/ssdfs-driver/tree/master/patchset/linux-kernel-6.18.0

Implement file operations support: synchronous and asynchronous
block read paths through the segment subsystem, readahead, and
handling of inline files stored directly in the raw inode.

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/ssdfs/file.c | 4341 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 4341 insertions(+)
 create mode 100644 fs/ssdfs/file.c

diff --git a/fs/ssdfs/file.c b/fs/ssdfs/file.c
new file mode 100644
index 000000000000..79179e221004
--- /dev/null
+++ b/fs/ssdfs/file.c
@@ -0,0 +1,4341 @@
+/*
+ * SPDX-License-Identifier: BSD-3-Clause-Clear
+ *
+ * SSDFS -- SSD-oriented File System.
+ *
+ * fs/ssdfs/file.c - file operations.
+ *
+ * Copyright (c) 2019-2026 Viacheslav Dubeyko <slava@dubeyko.com>
+ *              http://www.ssdfs.org/
+ * All rights reserved.
+ *
+ * Authors: Viacheslav Dubeyko <slava@dubeyko.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/writeback.h>
+#include <linux/pagevec.h>
+#include <linux/blkdev.h>
+
+#include "peb_mapping_queue.h"
+#include "peb_mapping_table_cache.h"
+#include "folio_vector.h"
+#include "ssdfs.h"
+#include "request_queue.h"
+#include "folio_array.h"
+#include "peb.h"
+#include "offset_translation_table.h"
+#include "peb_container.h"
+#include "segment_bitmap.h"
+#include "segment.h"
+#include "btree_search.h"
+#include "btree_node.h"
+#include "btree.h"
+#include "inodes_tree.h"
+#include "extents_tree.h"
+#include "xattr.h"
+#include "acl.h"
+#include "peb_mapping_table.h"
+
+#include <trace/events/ssdfs.h>
+
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+atomic64_t ssdfs_file_folio_leaks;
+atomic64_t ssdfs_file_memory_leaks;
+atomic64_t ssdfs_file_cache_leaks;
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+/*
+ * void ssdfs_file_cache_leaks_increment(void *kaddr)
+ * void ssdfs_file_cache_leaks_decrement(void *kaddr)
+ * void *ssdfs_file_kmalloc(size_t size, gfp_t flags)
+ * void *ssdfs_file_kzalloc(size_t size, gfp_t flags)
+ * void *ssdfs_file_kcalloc(size_t n, size_t size, gfp_t flags)
+ * void ssdfs_file_kfree(void *kaddr)
+ * struct folio *ssdfs_file_alloc_folio(gfp_t gfp_mask,
+ *                                      unsigned int order)
+ * struct folio *ssdfs_file_add_batch_folio(struct folio_batch *batch,
+ *                                          unsigned int order)
+ * void ssdfs_file_free_folio(struct folio *folio)
+ * void ssdfs_file_folio_batch_release(struct folio_batch *batch)
+ */
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	SSDFS_MEMORY_LEAKS_CHECKER_FNS(file)
+#else
+	SSDFS_MEMORY_ALLOCATOR_FNS(file)
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+
+void ssdfs_file_memory_leaks_init(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	atomic64_set(&ssdfs_file_folio_leaks, 0);
+	atomic64_set(&ssdfs_file_memory_leaks, 0);
+	atomic64_set(&ssdfs_file_cache_leaks, 0);
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+void ssdfs_file_check_memory_leaks(void)
+{
+#ifdef CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING
+	if (atomic64_read(&ssdfs_file_folio_leaks) != 0) {
+		SSDFS_ERR("FILE: "
+			  "memory leaks include %lld folios\n",
+			  atomic64_read(&ssdfs_file_folio_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_file_memory_leaks) != 0) {
+		SSDFS_ERR("FILE: "
+			  "memory allocator suffers from %lld leaks\n",
+			  atomic64_read(&ssdfs_file_memory_leaks));
+	}
+
+	if (atomic64_read(&ssdfs_file_cache_leaks) != 0) {
+		SSDFS_ERR("FILE: "
+			  "caches suffer from %lld leaks\n",
+			  atomic64_read(&ssdfs_file_cache_leaks));
+	}
+#endif /* CONFIG_SSDFS_MEMORY_LEAKS_ACCOUNTING */
+}
+
+enum {
+	SSDFS_BLOCK_BASED_REQUEST,
+	SSDFS_EXTENT_BASED_REQUEST,
+};
+
+enum {
+	SSDFS_CURRENT_THREAD_READ,
+	SSDFS_DELEGATE_TO_READ_THREAD,
+};
+
+static inline
+bool can_file_be_inline(struct inode *inode, loff_t new_size)
+{
+	size_t capacity = ssdfs_inode_inline_file_capacity(inode);
+
+	if (capacity == 0)
+		return false;
+
+	if (capacity < new_size)
+		return false;
+
+	return true;
+}
+
+static inline
+size_t ssdfs_inode_size_threshold(void)
+{
+	return sizeof(struct ssdfs_inode) -
+			offsetof(struct ssdfs_inode, internal);
+}
+
+int ssdfs_allocate_inline_file_buffer(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	size_t threshold = ssdfs_inode_size_threshold();
+	size_t inline_capacity;
+
+	if (ii->inline_file)
+		return 0;
+
+	inline_capacity = ssdfs_inode_inline_file_capacity(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("inline_capacity %zu, threshold %zu\n",
+		  inline_capacity, threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (inline_capacity < threshold) {
+		SSDFS_ERR("inline_capacity %zu < threshold %zu\n",
+			  inline_capacity, threshold);
+		return -ERANGE;
+	} else if (inline_capacity == threshold) {
+		ii->inline_file = ii->raw_inode.internal;
+	} else {
+		ii->inline_file =
+			ssdfs_file_kzalloc(inline_capacity, GFP_KERNEL);
+		if (!ii->inline_file) {
+			SSDFS_ERR("fail to allocate inline buffer: "
+				  "ino %lu, inline_capacity %zu\n",
+				  inode->i_ino, inline_capacity);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+void ssdfs_destroy_inline_file_buffer(struct inode *inode)
+{
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	size_t threshold = ssdfs_inode_size_threshold();
+	size_t inline_capacity;
+
+	if (!ii->inline_file)
+		return;
+
+	inline_capacity = ssdfs_inode_inline_file_capacity(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("inline_capacity %zu, threshold %zu\n",
+		  inline_capacity, threshold);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (inline_capacity <= threshold) {
+		ii->inline_file = NULL;
+	} else {
+		ssdfs_file_kfree(ii->inline_file);
+		ii->inline_file = NULL;
+	}
+}
+
+/*
+ * ssdfs_read_block_async() - read block async
+ * @fsi: pointer on shared file system object
+ * @req: request object
+ */
+static
+int ssdfs_read_block_async(struct ssdfs_fs_info *fsi,
+			   struct ssdfs_segment_request *req)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_search_state seg_search;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+	BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX);
+
+	SSDFS_DBG("fsi %p, req %p\n", fsi, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_prepare_volume_extent(fsi, req);
+	if (err == -EAGAIN) {
+		err = 0;
+		SSDFS_DBG("logical extent processed partially\n");
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare volume extent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  req->extent.ino,
+			  req->extent.logical_offset,
+			  req->extent.data_bytes,
+			  req->extent.cno,
+			  req->extent.parent_snapshot,
+			  err);
+		return err;
+	}
+
+	req->place.len = 1;
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					req->place.start.seg_id, U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id, err);
+		return err;
+	}
+
+	if (!is_ssdfs_segment_ready_for_requests(si)) {
+		err = ssdfs_wait_segment_init_end(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("segment initialization failed: "
+				  "seg %llu, ino %llu, err %d\n",
+				  si->seg_id, req->extent.ino, err);
+			ssdfs_segment_put_object(si);
+			return err;
+		}
+	}
+
+	err = ssdfs_segment_read_block_async(si, SSDFS_REQ_ASYNC, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("read request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, err);
+		ssdfs_segment_put_object(si);
+		return err;
+	}
+
+	ssdfs_segment_put_object(si);
+
+	return 0;
+}
+
+/*
+ * ssdfs_read_block_by_current_thread() - read block by current thread
+ * @fsi: pointer on shared file system object
+ * @req: request object
+ */
+static
+int ssdfs_read_block_by_current_thread(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request *req)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_peb_container *pebc;
+	struct ssdfs_blk2off_table *table;
+	struct ssdfs_offset_position pos;
+	struct ssdfs_segment_search_state seg_search;
+	u16 logical_blk;
+	struct completion *end;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !req);
+	BUG_ON((req->extent.logical_offset >> fsi->log_pagesize) >= U32_MAX);
+
+	SSDFS_DBG("fsi %p, req %p\n", fsi, req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_prepare_volume_extent(fsi, req);
+	if (err == -EAGAIN) {
+		err = 0;
+		SSDFS_DBG("logical extent processed partially\n");
+	} else if (err == -ENOENT) {
+		SSDFS_DBG("fork is absent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  req->extent.ino,
+			  req->extent.logical_offset,
+			  req->extent.data_bytes,
+			  req->extent.cno,
+			  req->extent.parent_snapshot,
+			  err);
+		return err;
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare volume extent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  req->extent.ino,
+			  req->extent.logical_offset,
+			  req->extent.data_bytes,
+			  req->extent.cno,
+			  req->extent.parent_snapshot,
+			  err);
+		return err;
+	}
+
+	req->place.len = 1;
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					req->place.start.seg_id, U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id, err);
+		return err;
+	}
+
+	if (!is_ssdfs_segment_ready_for_requests(si)) {
+		err = ssdfs_wait_segment_init_end(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("segment initialization failed: "
+				  "seg %llu, err %d\n",
+				  si->seg_id, err);
+			goto finish_read_block;
+		}
+	}
+
+	ssdfs_request_prepare_internal_data(SSDFS_PEB_READ_REQ,
+					    SSDFS_READ_PAGE,
+					    SSDFS_REQ_SYNC,
+					    req);
+	ssdfs_request_define_segment(si->seg_id, req);
+
+	table = si->blk2off_table;
+	logical_blk = req->place.start.blk_index;
+
+	err = ssdfs_blk2off_table_get_offset_position(table, logical_blk, &pos);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to convert: "
+			  "logical_blk %u, err %d\n",
+			  logical_blk, err);
+		goto finish_read_block;
+	}
+
+	pebc = &si->peb_array[pos.peb_index];
+
+	ssdfs_peb_read_request_cno(pebc);
+
+	err = ssdfs_peb_read_page(pebc, req, &end);
+	if (err == -EAGAIN) {
+		err = SSDFS_WAIT_COMPLETION(end);
+		if (unlikely(err)) {
+			SSDFS_ERR("PEB init failed: "
+				  "err %d\n", err);
+			goto forget_request_cno;
+		}
+
+		err = ssdfs_peb_read_page(pebc, req, &end);
+	}
+
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to read block: err %d\n",
+			  err);
+		goto forget_request_cno;
+	}
+
+	for (i = 0; i < req->result.processed_blks; i++)
+		ssdfs_peb_mark_request_block_uptodate(pebc, req, i);
+
+forget_request_cno:
+	ssdfs_peb_finish_read_request_cno(pebc);
+
+finish_read_block:
+	req->result.err = err;
+	complete(&req->result.wait);
+	ssdfs_segment_put_object(si);
+
+	return 0;
+}
+
+static
+int ssdfs_read_block_nolock(struct file *file, struct folio_batch *batch,
+			    int read_mode)
+{
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(file_inode(file)->i_sb);
+	struct inode *inode = file_inode(file);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct folio *folio;
+	struct ssdfs_segment_request *req = NULL;
+	ino_t ino = file_inode(file)->i_ino;
+	loff_t logical_offset;
+	loff_t data_bytes = 0;
+	loff_t file_size;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, read_mode %#x\n",
+		  ino, read_mode);
+
+	BUG_ON(folio_batch_count(batch) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio = batch->folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	logical_offset = (loff_t)folio->index << PAGE_SHIFT;
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		__ssdfs_memzero_folio(folio, 0, folio_size(folio),
+				      folio_size(folio));
+
+		data_bytes += folio_size(folio);
+	}
+
+	file_size = i_size_read(file_inode(file));
+	data_bytes = min_t(loff_t, file_size - logical_offset, data_bytes);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(data_bytes > U32_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (logical_offset >= file_size) {
+		/* Reading beyond inode */
+		for (i = 0; i < folio_batch_count(batch); i++) {
+			folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			folio_mark_uptodate(folio);
+			flush_dcache_folio(folio);
+		}
+
+		goto finish_read_block;
+	} else if (is_ssdfs_file_inline(ii)) {
+		loff_t byte_offset = 0;
+		loff_t iter_bytes;
+		size_t inline_capacity =
+				ssdfs_inode_inline_file_capacity(inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("inline_capacity %zu, file_size %llu\n",
+			  inline_capacity, file_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (file_size > inline_capacity) {
+			err = -E2BIG;
+			SSDFS_ERR("file_size %llu is greater than capacity %zu\n",
+				  (u64)file_size, inline_capacity);
+			goto fail_read_block;
+		} else if (data_bytes > inline_capacity) {
+			err = -ERANGE;
+			SSDFS_ERR("data_bytes %llu is greater than capacity %zu\n",
+				  (u64)data_bytes, inline_capacity);
+			goto fail_read_block;
+		}
+
+		for (i = 0; i < folio_batch_count(batch); i++) {
+			folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (byte_offset < inline_capacity) {
+#ifdef CONFIG_SSDFS_DEBUG
+				BUG_ON(data_bytes <= byte_offset);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				iter_bytes = min_t(loff_t, folio_size(folio),
+						   data_bytes - byte_offset);
+
+				err = __ssdfs_memcpy_to_folio(folio,
+							      0,
+							      folio_size(folio),
+							      ii->inline_file,
+							      byte_offset,
+							      inline_capacity,
+							      iter_bytes);
+				if (unlikely(err)) {
+					SSDFS_ERR("fail to copy file's content: "
+						  "err %d\n", err);
+					goto fail_read_block;
+				}
+			}
+
+			folio_mark_uptodate(folio);
+			flush_dcache_folio(folio);
+
+			byte_offset += folio_size(folio);
+		}
+
+		goto finish_read_block;
+	}
+
+	req = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req)) {
+		err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+		SSDFS_ERR("fail to allocate segment request: err %d\n",
+			  err);
+		return err;
+	}
+
+	ssdfs_request_init(req, fsi->pagesize);
+	ssdfs_get_request(req);
+
+	ssdfs_request_prepare_logical_extent(ino,
+					     (u64)logical_offset,
+					     (u32)data_bytes,
+					     0, 0, req);
+
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		err = ssdfs_request_add_folio(folio, 0, req);
+		if (err) {
+			SSDFS_ERR("fail to add folio into request: "
+				  "ino %lu, folio_index %lu, err %d\n",
+				  ino, folio->index, err);
+			goto fail_read_block;
+		}
+	}
+
+	switch (read_mode) {
+	case SSDFS_CURRENT_THREAD_READ:
+		err = ssdfs_read_block_by_current_thread(fsi, req);
+		if (err == -ENOENT) {
+			SSDFS_DBG("empty block has been prepared\n");
+
+			for (i = 0; i < folio_batch_count(batch); i++) {
+				folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+				BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				folio_mark_uptodate(folio);
+				flush_dcache_folio(folio);
+			}
+
+			ssdfs_put_request(req);
+			ssdfs_request_free(req, NULL);
+			goto finish_read_block;
+		} else if (err) {
+			SSDFS_ERR("fail to read block: err %d\n", err);
+			goto fail_read_block;
+		}
+
+		err = SSDFS_WAIT_COMPLETION(&req->result.wait);
+		if (unlikely(err)) {
+			SSDFS_ERR("read request failed: "
+				  "ino %lu, logical_offset %llu, "
+				  "size %u, err %d\n",
+				  ino, (u64)logical_offset,
+				  (u32)data_bytes, err);
+			goto fail_read_block;
+		}
+
+		if (req->result.err) {
+			SSDFS_ERR("read request failed: "
+				  "ino %lu, logical_offset %llu, "
+				  "size %u, err %d\n",
+				  ino, (u64)logical_offset,
+				  (u32)data_bytes,
+				  req->result.err);
+			goto fail_read_block;
+		}
+
+		ssdfs_put_request(req);
+		ssdfs_request_free(req, NULL);
+		break;
+
+	case SSDFS_DELEGATE_TO_READ_THREAD:
+		err = ssdfs_read_block_async(fsi, req);
+		if (err) {
+			SSDFS_ERR("fail to read block: err %d\n", err);
+			goto fail_read_block;
+		}
+		break;
+
+	default:
+		BUG();
+	}
+
+finish_read_block:
+	return 0;
+
+fail_read_block:
+	for (i = 0; i < folio_batch_count(batch); i++) {
+		folio = batch->folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		folio_clear_uptodate(folio);
+	}
+
+	if (req) {
+		ssdfs_put_request(req);
+		ssdfs_request_free(req, NULL);
+	}
+
+	return err;
+}
+
+static
+int ssdfs_read_block(struct file *file, struct folio *folio)
+{
+	struct inode *inode = file_inode(file);
+	struct address_space *mapping = file->f_mapping;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct folio_batch fbatch;
+	struct folio *cur_folio;
+	loff_t logical_offset;
+	pgoff_t index;
+	fgf_t fgp_flags = FGP_CREAT | FGP_LOCK;
+	u32 processed_bytes = 0;
+	bool need_read_block = false;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, folio_index %lu, "
+		  "folio_size %zu\n",
+		  file_inode(file)->i_ino, folio->index,
+		  folio_size(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_batch_init(&fbatch);
+
+	logical_offset = (loff_t)folio->index << PAGE_SHIFT;
+	logical_offset >>= fsi->log_pagesize;
+	logical_offset <<= fsi->log_pagesize;
+
+	index = logical_offset >> PAGE_SHIFT;
+
+	while (processed_bytes < fsi->pagesize) {
+		if (index == folio->index) {
+			cur_folio = folio;
+			ssdfs_account_locked_folio(folio);
+		} else {
+			fgp_flags = FGP_CREAT | FGP_LOCK |
+				    fgf_set_order(fsi->pagesize -
+						  processed_bytes);
+
+			cur_folio = __filemap_get_folio(mapping,
+						    index,
+						    fgp_flags,
+						    mapping_gfp_mask(mapping));
+			if (!cur_folio) {
+				SSDFS_ERR("fail to grab folio: page_index %lu\n",
+					  index);
+				return -ENOMEM;
+			} else if (IS_ERR(cur_folio)) {
+				SSDFS_ERR("fail to grab folio: "
+					  "page_index %lu, err %ld\n",
+					  index, PTR_ERR(cur_folio));
+				return PTR_ERR(cur_folio);
+			}
+
+			ssdfs_account_locked_folio(cur_folio);
+		}
+
+		folio_batch_add(&fbatch, cur_folio);
+
+		if (!folio_test_uptodate(cur_folio))
+			need_read_block = true;
+
+		index += folio_size(cur_folio) >> PAGE_SHIFT;
+		processed_bytes += folio_size(cur_folio);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	processed_bytes = 0;
+
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
+		cur_folio = fbatch.folios[i];
+
+		if (!cur_folio)
+			continue;
+
+		processed_bytes += folio_size(cur_folio);
+	}
+
+	if (processed_bytes != fsi->pagesize) {
+		SSDFS_ERR("invalid block batch: "
+			  "ino %lu, folio_index %lu, "
+			  "folio_size %zu, processed_bytes %u, "
+			  "pagesize %u\n",
+			  file_inode(file)->i_ino, folio->index,
+			  folio_size(folio), processed_bytes,
+			  fsi->pagesize);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (need_read_block) {
+		err = ssdfs_read_block_nolock(file, &fbatch,
+						SSDFS_CURRENT_THREAD_READ);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read folio: "
+				  "index %lu, err %d\n",
+				  folio->index, err);
+		}
+	}
+
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
+		cur_folio = fbatch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!cur_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		ssdfs_folio_unlock(cur_folio);
+	}
+
+	return err;
+}
+
+static
+int ssdfs_check_read_request(struct ssdfs_segment_request *req)
+{
+	wait_queue_head_t *wq = NULL;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p\n", req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+check_req_state:
+	switch (atomic_read(&req->result.state)) {
+	case SSDFS_REQ_CREATED:
+	case SSDFS_REQ_STARTED:
+		wq = &req->private.wait_queue;
+
+		res = wait_event_killable_timeout(*wq,
+					has_request_been_executed(req),
+					SSDFS_DEFAULT_TIMEOUT);
+		if (res < 0) {
+			err = res;
+			WARN_ON(1);
+		} else if (res > 1) {
+			/*
+			 * Condition changed before timeout
+			 */
+			goto check_req_state;
+		} else {
+			/* timeout is elapsed */
+			err = -ERANGE;
+			WARN_ON(1);
+		}
+		break;
+
+	case SSDFS_REQ_FINISHED:
+		/* do nothing */
+		break;
+
+	case SSDFS_REQ_FAILED:
+		err = req->result.err;
+
+		if (!err) {
+			SSDFS_ERR("error code is absent: "
+				  "req %p, err %d\n",
+				  req, err);
+			err = -ERANGE;
+		}
+
+		SSDFS_ERR("read request is failed: "
+			  "err %d\n", err);
+		goto finish_check;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid result's state %#x\n",
+			  atomic_read(&req->result.state));
+		goto finish_check;
+	}
+
+finish_check:
+	return err;
+}
+
+static
+int ssdfs_wait_read_request_end(struct ssdfs_fs_info *fsi,
+				struct ssdfs_segment_request *req)
+{
+	struct ssdfs_segment_info *si = NULL;
+	struct ssdfs_segment_search_state seg_search;
+	wait_queue_head_t *wait;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("req %p\n", req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!req)
+		return 0;
+
+	err = ssdfs_check_read_request(req);
+	if (unlikely(err)) {
+		SSDFS_ERR("read request failed: "
+			  "err %d\n", err);
+		goto free_request;
+	}
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					req->place.start.seg_id, U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id,
+			  err);
+		si = NULL;
+		goto free_request;
+	}
+
+	wait = &si->wait_queue[SSDFS_PEB_READ_THREAD];
+
+	if (atomic_read(&req->private.refs_count) != 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start waiting: refs_count %d\n",
+			   atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		res = wait_event_killable_timeout(*wait,
+			    atomic_read(&req->private.refs_count) == 0,
+			    SSDFS_DEFAULT_TIMEOUT);
+		if (res < 0) {
+			err = res;
+			WARN_ON(1);
+		} else if (res > 1) {
+			/*
+			 * Condition changed before timeout
+			 */
+		} else {
+			/* timeout is elapsed */
+			err = -ERANGE;
+			WARN_ON(1);
+		}
+	}
+
+	ssdfs_segment_put_object(si);
+
+free_request:
+	ssdfs_request_free(req, si);
+
+finish_wait:
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+	return err;
+}
+
+struct ssdfs_readahead_env {
+	struct file *file;
+	struct ssdfs_segment_request **reqs;
+	unsigned count;
+	unsigned capacity;
+
+	struct folio_batch batch;
+	struct ssdfs_logical_extent requested;
+	struct ssdfs_volume_extent place;
+	struct ssdfs_volume_extent cur_extent;
+};
+
+static
+struct ssdfs_segment_request *
+ssdfs_issue_read_request(struct ssdfs_readahead_env *env)
+{
+	struct ssdfs_fs_info *fsi;
+	struct ssdfs_segment_request *req = NULL;
+	struct ssdfs_segment_info *si = NULL;
+	struct ssdfs_segment_search_state seg_search;
+	loff_t data_bytes = 0;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+
+	SSDFS_DBG("requested (ino %llu, logical_offset %llu, "
+		  "cno %llu, parent_snapshot %llu), "
+		  "current extent (seg_id %llu, logical_blk %u, len %u)\n",
+		  env->requested.ino,
+		  env->requested.logical_offset,
+		  env->requested.cno,
+		  env->requested.parent_snapshot,
+		  env->cur_extent.start.seg_id,
+		  env->cur_extent.start.blk_index,
+		  env->cur_extent.len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fsi = SSDFS_FS_I(file_inode(env->file)->i_sb);
+
+	req = ssdfs_request_alloc();
+	if (IS_ERR_OR_NULL(req)) {
+		err = (req == NULL ? -ENOMEM : PTR_ERR(req));
+		SSDFS_ERR("fail to allocate segment request: err %d\n",
+			  err);
+		return ERR_PTR(err);
+	}
+
+	ssdfs_request_init(req, fsi->pagesize);
+	ssdfs_get_request(req);
+
+	for (i = 0; i < folio_batch_count(&env->batch); i++) {
+		struct folio *folio;
+
+		folio = env->batch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		data_bytes += folio_size(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio_index %d, folio_size %zu, "
+			  "data_bytes %llu\n",
+			  i, folio_size(folio),
+			  (u64)data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(data_bytes == 0);
+	BUG_ON(data_bytes > fsi->pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_request_prepare_logical_extent(env->requested.ino,
+					     env->requested.logical_offset,
+					     (u32)data_bytes,
+					     env->requested.cno,
+					     env->requested.parent_snapshot,
+					     req);
+
+	ssdfs_request_define_segment(env->cur_extent.start.seg_id, req);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(env->cur_extent.start.blk_index >= U16_MAX);
+	BUG_ON(env->cur_extent.len >= U16_MAX);
+#endif /* CONFIG_SSDFS_DEBUG */
+	ssdfs_request_define_volume_extent(env->cur_extent.start.blk_index,
+					   env->cur_extent.len,
+					   req);
+
+	for (i = 0; i < folio_batch_count(&env->batch); i++) {
+		err = ssdfs_request_add_folio(env->batch.folios[i], 0, req);
+		if (err) {
+			SSDFS_ERR("fail to add folio into request: "
+				  "ino %llu, err %d\n",
+				  env->requested.ino, err);
+			goto fail_issue_read_request;
+		}
+
+		env->batch.folios[i] = NULL;
+	}
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					req->place.start.seg_id, U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  req->place.start.seg_id,
+			  err);
+		si = NULL;
+		goto fail_issue_read_request;
+	}
+
+	if (!is_ssdfs_segment_ready_for_requests(si)) {
+		err = ssdfs_wait_segment_init_end(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("segment initialization failed: "
+				  "seg %llu, ino %llu, err %d\n",
+				  si->seg_id, req->extent.ino, err);
+			goto fail_issue_read_request;
+		}
+	}
+
+	err = ssdfs_segment_read_block_async(si, SSDFS_REQ_ASYNC_NO_FREE, req);
+	if (unlikely(err)) {
+		SSDFS_ERR("read request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  req->extent.ino, req->extent.logical_offset,
+			  req->extent.data_bytes, err);
+		goto fail_issue_read_request;
+	}
+
+	ssdfs_segment_put_object(si);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return req;
+
+fail_issue_read_request:
+	ssdfs_put_request(req);
+	ssdfs_request_free(req, si);
+
+	return ERR_PTR(err);
+}
+
+static
+int ssdfs_readahead_block(struct ssdfs_readahead_env *env)
+{
+	struct ssdfs_fs_info *fsi;
+	struct inode *inode;
+	struct folio *folio;
+	ino_t ino;
+	pgoff_t index;
+	loff_t logical_offset;
+	loff_t data_bytes;
+	loff_t file_size;
+	int i;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!env);
+
+	SSDFS_DBG("folios_count %u\n",
+		  folio_batch_count(&env->batch));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = file_inode(env->file);
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+
+	if (folio_batch_count(&env->batch) == 0) {
+		SSDFS_ERR("empty batch\n");
+		return -ERANGE;
+	}
+
+	folio = env->batch.folios[0];
+
+	index = folio->index;
+	logical_offset = (loff_t)index << PAGE_SHIFT;
+
+	file_size = i_size_read(inode);
+	data_bytes = file_size - logical_offset;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(data_bytes > U32_MAX);
+
+	SSDFS_DBG("folio_index %llu, logical_offset %llu, "
+		  "file_size %llu, data_bytes %llu\n",
+		  (u64)index, (u64)logical_offset,
+		  (u64)file_size, (u64)data_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	env->requested.ino = ino;
+	env->requested.logical_offset = logical_offset;
+	env->requested.data_bytes = data_bytes;
+	env->requested.cno = 0;
+	env->requested.parent_snapshot = 0;
+
+	if (env->place.len == 0) {
+		err = __ssdfs_prepare_volume_extent(fsi, inode,
+						    &env->requested,
+						    &env->place);
+		if (err == -EAGAIN) {
+			err = 0;
+			SSDFS_DBG("logical extent processed partially\n");
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare volume extent: "
+				  "ino %llu, logical_offset %llu, "
+				  "data_bytes %u, cno %llu, "
+				  "parent_snapshot %llu, err %d\n",
+				  env->requested.ino,
+				  env->requested.logical_offset,
+				  env->requested.data_bytes,
+				  env->requested.cno,
+				  env->requested.parent_snapshot,
+				  err);
+			goto fail_readahead_block;
+		}
+	}
+
+	if (env->place.len == 0) {
+		err = -ERANGE;
+		SSDFS_ERR("found empty extent\n");
+		goto fail_readahead_block;
+	}
+
+	env->cur_extent.start.seg_id = env->place.start.seg_id;
+	env->cur_extent.start.blk_index = env->place.start.blk_index;
+	env->cur_extent.len = 1;
+
+	env->place.start.blk_index++;
+	env->place.len--;
+
+	env->reqs[env->count] = ssdfs_issue_read_request(env);
+	if (IS_ERR_OR_NULL(env->reqs[env->count])) {
+		err = (env->reqs[env->count] == NULL ? -ENOMEM :
+					PTR_ERR(env->reqs[env->count]));
+		env->reqs[env->count] = NULL;
+
+		if (err == -ENODATA) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("no data for the block: "
+				  "index %d\n", env->count);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto fail_readahead_block;
+		} else {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("unable to issue request: "
+				  "index %d, err %d\n",
+				  env->count, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+			goto fail_readahead_block;
+		}
+	} else
+		env->count++;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return 0;
+
+fail_readahead_block:
+	for (i = 0; i < folio_batch_count(&env->batch); i++) {
+		folio = env->batch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		__ssdfs_memzero_folio(folio, 0, folio_size(folio),
+					folio_size(folio));
+
+		folio_clear_uptodate(folio);
+		ssdfs_folio_unlock(folio);
+		ssdfs_folio_put(folio);
+	}
+
+	return err;
+}
+
+/*
+ * ssdfs_readahead() is called by the VM to read folios associated
+ * with the address_space object. The folios are consecutive in the
+ * page cache and are locked. The implementation should decrement
+ * the folio refcount after starting I/O on each folio. Usually the
+ * folio will be unlocked by the I/O completion handler.
+ * ssdfs_readahead() is used only for read-ahead, so read errors
+ * are ignored.
+ */
+static
+void ssdfs_readahead(struct readahead_control *rac)
+{
+	struct inode *inode = file_inode(rac->file);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_readahead_env env;
+	struct folio *folio;
+	pgoff_t index;
+	loff_t logical_offset;
+	loff_t file_size;
+	unsigned i;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, nr_pages %u\n",
+		  file_inode(rac->file)->i_ino,
+		  readahead_count(rac));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_ssdfs_file_inline(ii)) {
+		/* do nothing */
+		return;
+	}
+
+	env.file = rac->file;
+	env.count = 0;
+	env.capacity = readahead_count(rac);
+
+	env.reqs = ssdfs_file_kcalloc(env.capacity,
+				  sizeof(struct ssdfs_segment_request *),
+				  GFP_KERNEL);
+	if (!env.reqs) {
+		SSDFS_ERR("fail to allocate requests array\n");
+		return;
+	}
+
+	folio_batch_init(&env.batch);
+	memset(&env.requested, 0, sizeof(struct ssdfs_logical_extent));
+	memset(&env.place, 0, sizeof(struct ssdfs_volume_extent));
+	memset(&env.cur_extent, 0, sizeof(struct ssdfs_volume_extent));
+
+	for (i = 0; i < env.capacity; i++) {
+		u32 processed_bytes = 0;
+
+		folio_batch_reinit(&env.batch);
+
+		while (processed_bytes < fsi->pagesize) {
+			folio = readahead_folio(rac);
+			if (!folio) {
+				SSDFS_DBG("no more folios\n");
+
+				if (processed_bytes > 0)
+					goto try_readahead_block;
+				else
+					goto finish_requests_processing;
+			}
+
+			prefetchw(&folio->flags);
+
+			index = folio->index;
+			logical_offset = (loff_t)index << PAGE_SHIFT;
+			file_size = i_size_read(file_inode(env.file));
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("index %lu, folio_size %zu, "
+				  "logical_offset %llu, file_size %llu\n",
+				  index, folio_size(folio),
+				  logical_offset, file_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (logical_offset >= file_size) {
+				/* Reading beyond inode */
+				err = -ENODATA;
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("Reading beyond inode: "
+					  "index %lu, folio_size %zu, "
+					  "logical_offset %llu, file_size %llu\n",
+					  index, folio_size(folio),
+					  logical_offset, file_size);
+#endif /* CONFIG_SSDFS_DEBUG */
+				__ssdfs_memzero_folio(folio, 0,
+						      folio_size(folio),
+						      folio_size(folio));
+				folio_mark_uptodate(folio);
+				flush_dcache_folio(folio);
+				folio_unlock(folio);
+
+				if (processed_bytes > 0)
+					goto try_readahead_block;
+				else
+					goto finish_requests_processing;
+			}
+
+			ssdfs_folio_get(folio);
+			ssdfs_account_locked_folio(folio);
+
+			__ssdfs_memzero_folio(folio, 0, folio_size(folio),
+					      folio_size(folio));
+
+			folio_batch_add(&env.batch, folio);
+
+			processed_bytes += folio_size(folio);
+		}
+
+try_readahead_block:
+		err = ssdfs_readahead_block(&env);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to process block: "
+				  "index %u, err %d\n",
+				  env.count, err);
+			break;
+		}
+	}
+
+finish_requests_processing:
+	for (i = 0; i < env.count; i++) {
+		res = ssdfs_wait_read_request_end(fsi, env.reqs[i]);
+		if (res) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("waiting has finished with issue: "
+				  "index %u, err %d\n",
+				  i, res);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		if (err == 0)
+			err = res;
+
+		env.reqs[i] = NULL;
+	}
+
+	if (env.reqs)
+		ssdfs_file_kfree(env.reqs);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	if (err) {
+		SSDFS_DBG("readahead fails: "
+			  "ino %lu, nr_pages %u, err %d\n",
+			  file_inode(rac->file)->i_ino,
+			  readahead_count(rac), err);
+	} else {
+		SSDFS_DBG("readahead finished: "
+			  "ino %lu, nr_pages %u, err %d\n",
+			  file_inode(rac->file)->i_ino,
+			  readahead_count(rac), err);
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return;
+}
+
+static ssize_t ssdfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct folio *folio;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct address_space *mapping = file->f_mapping;
+	pgoff_t start_index = iocb->ki_pos >> PAGE_SHIFT;
+	size_t iter_bytes = iov_iter_count(iter);
+	u32 processed_bytes;
+	size_t folios_count;
+	int pages_per_folio = fsi->pagesize >> PAGE_SHIFT;
+	fgf_t fgp_flags = FGP_CREAT | FGP_LOCK;
+	int i;
+	ssize_t res = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, pos %llu, iter_bytes %zu\n",
+		  inode->i_ino, iocb->ki_pos, iter_bytes);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!iter_bytes)
+		return 0;
+
+	if (iocb->ki_pos >= i_size_read(inode))
+		return 0;
+
+	iter_bytes = min_t(size_t,
+			   iter_bytes, i_size_read(inode) - iocb->ki_pos);
+	folios_count = (iter_bytes + fsi->pagesize - 1) >> fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("iter_bytes %zu, folios_count %zu\n",
+		  iter_bytes, folios_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!mapping_large_folio_support(mapping))
+		goto read_file_content_now;
+
+
+	for (i = 0; i < folios_count; i++) {
+		pgoff_t cur_index = start_index + (i * pages_per_folio);
+		processed_bytes = 0;
+
+		while (processed_bytes < fsi->pagesize) {
+			pgoff_t page_index = cur_index +
+						(processed_bytes >> PAGE_SHIFT);
+			fgp_flags = FGP_CREAT | FGP_LOCK |
+					fgf_set_order(fsi->pagesize -
+						      processed_bytes);
+
+			folio = __filemap_get_folio(mapping,
+						    page_index,
+						    fgp_flags,
+						    mapping_gfp_mask(mapping));
+			if (!folio) {
+				SSDFS_ERR("fail to grab folio: page_index %lu\n",
+					  page_index);
+				return -ENOMEM;
+			} else if (IS_ERR(folio)) {
+				SSDFS_ERR("fail to grab folio: "
+					  "page_index %lu, err %ld\n",
+					  page_index, PTR_ERR(folio));
+				return PTR_ERR(folio);
+			}
+
+			ssdfs_account_locked_folio(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, page_index %lu, count %d, "
+				  "folio_size %zu, page_size %u, "
+				  "fgp_flags %#x, order %u\n",
+				  folio, page_index,
+				  folio_ref_count(folio),
+				  folio_size(folio),
+				  fsi->pagesize,
+				  fgp_flags,
+				  FGF_GET_ORDER(fgp_flags));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			processed_bytes += folio_size(folio);
+
+			ssdfs_folio_unlock(folio);
+			ssdfs_folio_put(folio);
+		}
+	}
+
+read_file_content_now:
+	res = generic_file_read_iter(iocb, iter);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("finished: res %zd\n", res);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return res;
+}
+
+/*
+ * ssdfs_check_async_write_request() - check user data write request
+ * @req: segment request
+ *
+ * This method checks the state of an asynchronous write request.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+static
+int ssdfs_check_async_write_request(struct ssdfs_segment_request *req)
+{
+	wait_queue_head_t *wq = NULL;
+	int res;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p\n", req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+check_req_state:
+	switch (atomic_read(&req->result.state)) {
+	case SSDFS_REQ_CREATED:
+	case SSDFS_REQ_STARTED:
+		wq = &req->private.wait_queue;
+
+		res = wait_event_killable_timeout(*wq,
+					has_request_been_executed(req),
+					SSDFS_DEFAULT_TIMEOUT);
+		if (res < 0) {
+			err = res;
+			WARN_ON(1);
+		} else if (res >= 1) {
+			/*
+			 * wait_event_killable_timeout() returns the
+			 * remaining timeout (at least 1) when the
+			 * condition becomes true.
+			 */
+			goto check_req_state;
+		} else {
+			/* timeout elapsed, request is still pending */
+			err = -ERANGE;
+			WARN_ON(1);
+		}
+		break;
+
+	case SSDFS_REQ_FINISHED:
+		/* do nothing */
+		break;
+
+	case SSDFS_REQ_FAILED:
+		err = req->result.err;
+
+		if (!err) {
+			SSDFS_ERR("error code is absent: "
+				  "req %p, err %d\n",
+				  req, err);
+			err = -ERANGE;
+		}
+
+		SSDFS_ERR("write request is failed: "
+			  "err %d\n", err);
+		goto finish_check;
+
+	default:
+		err = -ERANGE;
+		SSDFS_ERR("invalid result's state %#x\n",
+			  atomic_read(&req->result.state));
+		goto finish_check;
+	}
+
+finish_check:
+	return err;
+}
+
+/*
+ * ssdfs_check_sync_write_request() - check user data write request
+ * @fsi: pointer on shared file system object
+ * @req: segment request
+ *
+ * This method checks the state of a synchronous write request.
+ *
+ * RETURN:
+ * [success]
+ * [failure] - error code:
+ *
+ * %-ERANGE     - internal error.
+ */
+static
+int ssdfs_check_sync_write_request(struct ssdfs_fs_info *fsi,
+				   struct ssdfs_segment_request *req)
+{
+	int i, j;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!req);
+
+	SSDFS_DBG("req %p\n", req);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = SSDFS_WAIT_COMPLETION(&req->result.wait);
+	if (unlikely(err)) {
+		SSDFS_ERR("write request failed: err %d\n",
+			  err);
+		return err;
+	}
+
+	switch (atomic_read(&req->result.state)) {
+	case SSDFS_REQ_FINISHED:
+		/* do nothing */
+		break;
+
+	case SSDFS_REQ_FAILED:
+		err = req->result.err;
+
+		if (!err) {
+			SSDFS_ERR("error code is absent: "
+				  "req %p, err %d\n",
+				  req, err);
+			err = -ERANGE;
+		}
+
+		SSDFS_ERR("write request is failed: "
+			  "err %d\n", err);
+		return err;
+
+	default:
+		SSDFS_ERR("unexpected result state %#x\n",
+			  atomic_read(&req->result.state));
+		return -ERANGE;
+	}
+
+	if (req->result.err) {
+		err = req->result.err;
+		SSDFS_ERR("write request failed: err %d\n", err);
+		return err;
+	}
+
+	for (i = 0; i < req->result.content.count; i++) {
+		struct ssdfs_request_content_block *block;
+		struct ssdfs_content_block *blk_state;
+
+		block = &req->result.content.blocks[i];
+		blk_state = &block->new_state;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(folio_batch_count(&blk_state->batch) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+			struct folio *folio = blk_state->batch.folios[j];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			clear_folio_new(folio);
+			folio_mark_uptodate(folio);
+			ssdfs_clear_dirty_folio(folio);
+
+			ssdfs_folio_unlock(folio);
+			ssdfs_folio_end_writeback(fsi, U64_MAX, 0, folio);
+			ssdfs_request_writeback_folios_dec(req);
+		}
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_wait_write_pool_requests_end(struct ssdfs_fs_info *fsi,
+					struct ssdfs_segment_request_pool *pool)
+{
+	struct ssdfs_segment_request *req;
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_search_state seg_search;
+	wait_queue_head_t *wait;
+	bool has_request_failed = false;
+	int i;
+	int res;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pool %p\n", pool);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!pool)
+		return 0;
+
+	if (pool->count == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("request pool is empty\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return 0;
+	}
+
+	switch (pool->req_class) {
+	case SSDFS_PEB_CREATE_DATA_REQ:
+	case SSDFS_PEB_UPDATE_REQ:
+		/* expected class */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected class of request %#x\n",
+			  pool->req_class);
+		return -ERANGE;
+	}
+
+	switch (pool->req_command) {
+	case SSDFS_CREATE_BLOCK:
+	case SSDFS_CREATE_EXTENT:
+	case SSDFS_UPDATE_BLOCK:
+	case SSDFS_UPDATE_EXTENT:
+		/* expected command */
+		break;
+
+	default:
+		SSDFS_ERR("unexpected command of request %#x\n",
+			  pool->req_command);
+		return -ERANGE;
+	}
+
+	switch (pool->req_type) {
+	case SSDFS_REQ_SYNC:
+		for (i = 0; i < pool->count; i++) {
+			req = pool->pointers[i];
+
+			if (!req) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("request %d is empty\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+				continue;
+			}
+
+			err = ssdfs_check_sync_write_request(fsi, req);
+			if (unlikely(err)) {
+				SSDFS_ERR("request %d is failed: err %d\n",
+					  i, err);
+				has_request_failed = true;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("seg_id %llu, i %d, pool->count %u, "
+				  "request's reference count %d\n",
+				  req->place.start.seg_id, i, pool->count,
+				  atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_put_request(req);
+			ssdfs_request_free(req, NULL);
+			pool->pointers[i] = NULL;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("request %d is freed\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		ssdfs_segment_request_pool_init(pool);
+		break;
+
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		for (i = 0; i < pool->count; i++) {
+			req = pool->pointers[i];
+
+			if (!req) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("request %d is empty\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+				continue;
+			}
+
+			err = ssdfs_check_async_write_request(req);
+			if (unlikely(err)) {
+				SSDFS_ERR("request %d is failed: err %d\n",
+					  i, err);
+				has_request_failed = true;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("seg_id %llu, i %d, pool->count %u, "
+				  "request's reference count %d\n",
+				  req->place.start.seg_id, i, pool->count,
+				  atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_put_request(req);
+
+			ssdfs_segment_search_state_init(&seg_search,
+						SSDFS_USER_DATA_SEG_TYPE,
+						req->place.start.seg_id,
+						U64_MAX);
+
+			si = ssdfs_grab_segment(fsi, &seg_search);
+			if (unlikely(IS_ERR_OR_NULL(si))) {
+				err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+				SSDFS_ERR("fail to grab segment object: "
+					  "seg %llu, err %d\n",
+					  req->place.start.seg_id,
+					  err);
+				continue;
+			}
+
+			wait = &si->wait_queue[SSDFS_PEB_READ_THREAD];
+
+			if (atomic_read(&req->private.refs_count) != 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("start waiting: refs_count %d\n",
+					   atomic_read(&req->private.refs_count));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				res = wait_event_killable_timeout(*wait,
+				    atomic_read(&req->private.refs_count) == 0,
+				    SSDFS_DEFAULT_TIMEOUT);
+				if (res < 0) {
+					err = res;
+					WARN_ON(1);
+				} else if (res >= 1) {
+					/*
+					 * Condition became true before
+					 * or at the timeout expiration
+					 */
+				} else {
+					/* timeout elapsed, refs remain */
+					err = -ERANGE;
+					WARN_ON(1);
+				}
+			}
+
+			ssdfs_segment_put_object(si);
+
+			ssdfs_request_free(req, si);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("request %d is freed\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		ssdfs_segment_request_pool_init(pool);
+		break;
+
+	case SSDFS_REQ_ASYNC:
+		ssdfs_segment_request_pool_init(pool);
+		break;
+
+	default:
+		SSDFS_ERR("unknown request type %#x\n",
+			  pool->req_type);
+		return -ERANGE;
+	}
+
+	if (has_request_failed)
+		return -ERANGE;
+
+	return 0;
+}
+
+static
+void ssdfs_clean_failed_request_pool(struct ssdfs_segment_request_pool *pool)
+{
+	struct ssdfs_segment_request *req;
+	int i;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pool %p\n", pool);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!pool)
+		return;
+
+	if (pool->count == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("request pool is empty\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+		return;
+	}
+
+	switch (pool->req_type) {
+	case SSDFS_REQ_SYNC:
+		for (i = 0; i < pool->count; i++) {
+			req = pool->pointers[i];
+
+			if (!req) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("request %d is empty\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+				continue;
+			}
+
+			ssdfs_put_request(req);
+			ssdfs_request_free(req, NULL);
+		}
+		break;
+
+	case SSDFS_REQ_ASYNC_NO_FREE:
+		for (i = 0; i < pool->count; i++) {
+			req = pool->pointers[i];
+
+			if (!req) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("request %d is empty\n", i);
+#endif /* CONFIG_SSDFS_DEBUG */
+				continue;
+			}
+
+			ssdfs_request_free(req, NULL);
+		}
+		break;
+
+	case SSDFS_REQ_ASYNC:
+		/* do nothing */
+		break;
+
+	default:
+		SSDFS_ERR("unknown request type %#x\n",
+			  pool->req_type);
+	}
+}
+
+/*
+ * ssdfs_update_block() - update a logical block.
+ * @fsi: pointer on shared file system object
+ * @pool: segment request pool
+ * @batch: dirty memory folios batch
+ * @wbc: writeback control of the current writeback pass
+ */
+static
+int ssdfs_update_block(struct ssdfs_fs_info *fsi,
+		       struct ssdfs_segment_request_pool *pool,
+		       struct ssdfs_dirty_folios_batch *batch,
+		       struct writeback_control *wbc)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_search_state seg_search;
+	struct folio *folio;
+	struct inode *inode;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("fsi %p, pool %p, batch %p\n",
+		  fsi, pool, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch->content.count) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks,
+			  batch->content.count);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+
+	err = __ssdfs_prepare_volume_extent(fsi, inode,
+					    &batch->requested_extent,
+					    &batch->place);
+	if (err == -EAGAIN) {
+		err = 0;
+		SSDFS_DBG("logical extent processed partially\n");
+	} else if (unlikely(err)) {
+		SSDFS_ERR("fail to prepare volume extent: "
+			  "ino %llu, logical_offset %llu, "
+			  "data_bytes %u, cno %llu, "
+			  "parent_snapshot %llu, err %d\n",
+			  batch->requested_extent.ino,
+			  batch->requested_extent.logical_offset,
+			  batch->requested_extent.data_bytes,
+			  batch->requested_extent.cno,
+			  batch->requested_extent.parent_snapshot,
+			  err);
+		return err;
+	}
+
+	ssdfs_segment_search_state_init(&seg_search,
+					SSDFS_USER_DATA_SEG_TYPE,
+					batch->place.start.seg_id,
+					U64_MAX);
+
+	si = ssdfs_grab_segment(fsi, &seg_search);
+	if (unlikely(IS_ERR_OR_NULL(si))) {
+		err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+		SSDFS_ERR("fail to grab segment object: "
+			  "seg %llu, err %d\n",
+			  batch->place.start.seg_id, err);
+		return err;
+	}
+
+	if (!is_ssdfs_segment_ready_for_requests(si)) {
+		err = ssdfs_wait_segment_init_end(si);
+		if (unlikely(err)) {
+			SSDFS_ERR("segment initialization failed: "
+				  "seg %llu, err %d\n",
+				  si->seg_id, err);
+			ssdfs_segment_put_object(si);
+			return err;
+		}
+	}
+
+	if (wbc->sync_mode == WB_SYNC_NONE) {
+		err = ssdfs_segment_update_data_block_async(si,
+						       SSDFS_REQ_ASYNC,
+						       pool, batch);
+	} else if (wbc->sync_mode == WB_SYNC_ALL) {
+		err = ssdfs_segment_update_data_block_sync(si, pool, batch);
+	} else {
+		BUG();
+	}
+
+	if (err == -EAGAIN) {
+		SSDFS_DBG("wait finishing requests in pool\n");
+	} else if (unlikely(err)) {
+		SSDFS_ERR("update request failed: "
+			  "ino %llu, logical_offset %llu, size %u, err %d\n",
+			  batch->requested_extent.ino,
+			  batch->requested_extent.logical_offset,
+			  batch->requested_extent.data_bytes,
+			  err);
+	}
+
+	ssdfs_segment_put_object(si);
+
+	return err;
+}
+
+/*
+ * ssdfs_update_extent() - update an extent.
+ * @fsi: pointer on shared file system object
+ * @pool: segment request pool
+ * @batch: dirty memory folios batch
+ * @wbc: writeback control of the current writeback pass
+ */
+static
+int ssdfs_update_extent(struct ssdfs_fs_info *fsi,
+			struct ssdfs_segment_request_pool *pool,
+			struct ssdfs_dirty_folios_batch *batch,
+			struct writeback_control *wbc)
+{
+	struct ssdfs_segment_info *si;
+	struct ssdfs_segment_search_state seg_search;
+	struct folio *folio;
+	struct inode *inode;
+	u32 batch_size;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!fsi || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+
+	SSDFS_DBG("fsi %p, pool %p, batch %p\n",
+		  fsi, pool, batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch_size = batch->content.count;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("batch_size %u, batch->processed_blks %u\n",
+		  batch_size,
+		  batch->processed_blks);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch_size) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks, batch_size);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+
+	while (batch->processed_blks < batch_size) {
+		err = __ssdfs_prepare_volume_extent(fsi, inode,
+						    &batch->requested_extent,
+						    &batch->place);
+		if (err == -EAGAIN) {
+			err = 0;
+			SSDFS_DBG("logical extent processed partially\n");
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to prepare volume extent: "
+				  "ino %llu, logical_offset %llu, "
+				  "data_bytes %u, cno %llu, "
+				  "parent_snapshot %llu, err %d\n",
+				  batch->requested_extent.ino,
+				  batch->requested_extent.logical_offset,
+				  batch->requested_extent.data_bytes,
+				  batch->requested_extent.cno,
+				  batch->requested_extent.parent_snapshot,
+				  err);
+			return err;
+		}
+
+		ssdfs_segment_search_state_init(&seg_search,
+						SSDFS_USER_DATA_SEG_TYPE,
+						batch->place.start.seg_id,
+						U64_MAX);
+
+		si = ssdfs_grab_segment(fsi, &seg_search);
+		if (unlikely(IS_ERR_OR_NULL(si))) {
+			err = (si == NULL ? -ENOMEM : PTR_ERR(si));
+			SSDFS_ERR("fail to grab segment object: "
+				  "seg %llu, err %d\n",
+				  batch->place.start.seg_id, err);
+			return err;
+		}
+
+		if (!is_ssdfs_segment_ready_for_requests(si)) {
+			err = ssdfs_wait_segment_init_end(si);
+			if (unlikely(err)) {
+				SSDFS_ERR("segment initialization failed: "
+					  "seg %llu, err %d\n",
+					  si->seg_id, err);
+				ssdfs_segment_put_object(si);
+				return err;
+			}
+		}
+
+		if (wbc->sync_mode == WB_SYNC_NONE) {
+			err = ssdfs_segment_update_data_extent_async(si,
+							    SSDFS_REQ_ASYNC,
+							    pool, batch);
+		} else if (wbc->sync_mode == WB_SYNC_ALL) {
+			err = ssdfs_segment_update_data_extent_sync(si,
+							    pool, batch);
+		} else {
+			BUG();
+		}
+
+		ssdfs_segment_put_object(si);
+
+		if (err == -EAGAIN) {
+			if (batch->processed_blks >= batch_size) {
+				err = -ERANGE;
+				SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+					  batch->processed_blks, batch_size);
+				goto finish_update_extent;
+			} else {
+				err = 0;
+				/* process the rest of memory pages */
+				continue;
+			}
+		} else if (err == -ENOSPC) {
+			err = -EAGAIN;
+			SSDFS_DBG("wait finishing requests in pool\n");
+			goto finish_update_extent;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("update request failed: "
+				  "ino %llu, logical_offset %llu, "
+				  "size %u, err %d\n",
+				  batch->requested_extent.ino,
+				  batch->requested_extent.logical_offset,
+				  batch->requested_extent.data_bytes,
+				  err);
+			goto finish_update_extent;
+		}
+	}
+
+finish_update_extent:
+	return err;
+}
+
+static
+int ssdfs_issue_async_block_write_request(struct writeback_control *wbc,
+					  struct ssdfs_segment_request_pool *pool,
+					  struct ssdfs_dirty_folios_batch *batch)
+{
+	struct folio *folio;
+	struct inode *inode;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_fs_info *fsi;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch->content.count) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks,
+			  batch->content.count);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+	ii = SSDFS_I(inode);
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+	logical_offset = batch->requested_extent.logical_offset;
+	data_bytes = batch->requested_extent.data_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "data_bytes %u, sync_mode %#x\n",
+		  ino, logical_offset, data_bytes, wbc->sync_mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (need_add_block(folio)) {
+		err = ssdfs_segment_add_data_block_async(fsi, pool, batch);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	} else {
+		err = ssdfs_update_block(fsi, pool, batch, wbc);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("fail to write folio async: "
+			  "ino %lu, folio_index %llu, err %d\n",
+			  ino, (u64)folio->index, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_issue_sync_block_write_request(struct writeback_control *wbc,
+					 struct ssdfs_segment_request_pool *pool,
+					 struct ssdfs_dirty_folios_batch *batch)
+{
+	struct folio *folio;
+	struct inode *inode;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_fs_info *fsi;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch->content.count) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks,
+			  batch->content.count);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+	ii = SSDFS_I(inode);
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+	logical_offset = batch->requested_extent.logical_offset;
+	data_bytes = batch->requested_extent.data_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "data_bytes %u, sync_mode %#x\n",
+		  ino, logical_offset, data_bytes, wbc->sync_mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (need_add_block(folio)) {
+		err = ssdfs_segment_add_data_block_sync(fsi, pool, batch);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	} else {
+		err = ssdfs_update_block(fsi, pool, batch, wbc);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("fail to write folio sync: "
+			  "ino %lu, folio_index %llu, err %d\n",
+			  ino, (u64)folio->index, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_issue_async_extent_write_request(struct writeback_control *wbc,
+					   struct ssdfs_segment_request_pool *pool,
+					   struct ssdfs_dirty_folios_batch *batch)
+{
+	struct folio *folio;
+	struct inode *inode;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_fs_info *fsi;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch->content.count) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks,
+			  batch->content.count);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+	ii = SSDFS_I(inode);
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+	logical_offset = batch->requested_extent.logical_offset;
+	data_bytes = batch->requested_extent.data_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "data_bytes %u, sync_mode %#x\n",
+		  ino, logical_offset, data_bytes, wbc->sync_mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (need_add_block(folio)) {
+		err = ssdfs_segment_add_data_extent_async(fsi, pool, batch);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	} else {
+		err = ssdfs_update_extent(fsi, pool, batch, wbc);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("fail to write extent async: "
+			  "ino %lu, folio_index %llu, err %d\n",
+			  ino, (u64)folio->index, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_issue_sync_extent_write_request(struct writeback_control *wbc,
+					struct ssdfs_segment_request_pool *pool,
+					struct ssdfs_dirty_folios_batch *batch)
+{
+	struct folio *folio;
+	struct inode *inode;
+	struct ssdfs_inode_info *ii;
+	struct ssdfs_fs_info *fsi;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+
+	if (batch->content.count == 0) {
+		SSDFS_ERR("batch is empty\n");
+		return -ERANGE;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->processed_blks >= batch->content.count) {
+		SSDFS_ERR("processed_blks %u >= batch_size %u\n",
+			  batch->processed_blks,
+			  batch->content.count);
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[batch->processed_blks].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+	ii = SSDFS_I(inode);
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+	logical_offset = batch->requested_extent.logical_offset;
+	data_bytes = batch->requested_extent.data_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "data_bytes %u, sync_mode %#x\n",
+		  ino, logical_offset, data_bytes, wbc->sync_mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (need_add_block(folio)) {
+		err = ssdfs_segment_add_data_extent_sync(fsi, pool, batch);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	} else {
+		err = ssdfs_update_extent(fsi, pool, batch, wbc);
+		if (err == -EAGAIN) {
+			SSDFS_DBG("wait finishing requests in pool\n");
+			return err;
+		}
+	}
+
+	if (err) {
+		SSDFS_ERR("fail to write folio sync: "
+			  "ino %lu, folio_index %llu, err %d\n",
+			  ino, (u64)folio->index, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+int ssdfs_issue_async_write_request(struct ssdfs_fs_info *fsi,
+			      struct writeback_control *wbc,
+			      struct ssdfs_segment_request_pool *pool,
+			      struct ssdfs_dirty_folios_batch *batch,
+			      int req_type)
+{
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (req_type == SSDFS_BLOCK_BASED_REQUEST) {
+		err = ssdfs_issue_async_block_write_request(wbc, pool, batch);
+		if (err == -EAGAIN) {
+			wake_up_all(&fsi->pending_wq);
+
+			err = ssdfs_wait_write_pool_requests_end(fsi, pool);
+			if (unlikely(err)) {
+				SSDFS_ERR("write request failed: err %d\n",
+					  err);
+				return err;
+			}
+
+			err = ssdfs_issue_async_block_write_request(wbc,
+								pool, batch);
+		}
+	} else if (req_type == SSDFS_EXTENT_BASED_REQUEST) {
+		err = ssdfs_issue_async_extent_write_request(wbc, pool, batch);
+		if (err == -EAGAIN) {
+			wake_up_all(&fsi->pending_wq);
+
+			err = ssdfs_wait_write_pool_requests_end(fsi, pool);
+			if (unlikely(err)) {
+				SSDFS_ERR("write request failed: err %d\n",
+					  err);
+				return err;
+			}
+
+			err = ssdfs_issue_async_extent_write_request(wbc,
+								pool, batch);
+		}
+	} else {
+		BUG();
+	}
+
+	if (err) {
+		SSDFS_ERR("fail to write async: err %d\n",
+			  err);
+	}
+
+	wake_up_all(&fsi->pending_wq);
+
+	return err;
+}
+
+static
+int ssdfs_issue_sync_write_request(struct ssdfs_fs_info *fsi,
+				   struct writeback_control *wbc,
+				   struct ssdfs_segment_request_pool *pool,
+				   struct ssdfs_dirty_folios_batch *batch,
+				   int req_type)
+{
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (req_type == SSDFS_BLOCK_BASED_REQUEST) {
+		err = ssdfs_issue_sync_block_write_request(wbc, pool, batch);
+		if (err == -EAGAIN) {
+			wake_up_all(&fsi->pending_wq);
+
+			err = ssdfs_wait_write_pool_requests_end(fsi, pool);
+			if (unlikely(err)) {
+				SSDFS_ERR("write request failed: err %d\n",
+					  err);
+				return err;
+			}
+
+			err = ssdfs_issue_sync_block_write_request(wbc,
+								pool, batch);
+		}
+	} else if (req_type == SSDFS_EXTENT_BASED_REQUEST) {
+		err = ssdfs_issue_sync_extent_write_request(wbc, pool, batch);
+		if (err == -EAGAIN) {
+			wake_up_all(&fsi->pending_wq);
+
+			err = ssdfs_wait_write_pool_requests_end(fsi, pool);
+			if (unlikely(err)) {
+				SSDFS_ERR("write request failed: err %d\n",
+					  err);
+				return err;
+			}
+
+			err = ssdfs_issue_sync_extent_write_request(wbc,
+								pool, batch);
+		}
+	} else
+		BUG();
+
+	if (err) {
+		SSDFS_ERR("fail to write sync: err %d\n",
+			  err);
+
+		for (i = 0; i < batch->content.count; i++) {
+			struct ssdfs_content_block *blk_state;
+			u32 batch_size;
+
+			blk_state = &batch->content.blocks[i];
+			batch_size = folio_batch_count(&blk_state->batch);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(batch_size == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			for (j = 0; j < batch_size; j++) {
+				struct folio *folio;
+
+				folio = blk_state->batch.folios[j];
+
+#ifdef CONFIG_SSDFS_DEBUG
+				BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				if (!folio_test_locked(folio)) {
+					SSDFS_WARN("folio %p, folio_test_locked %#x\n",
+						   folio, folio_test_locked(folio));
+					ssdfs_folio_lock(folio);
+				}
+
+				clear_folio_new(folio);
+				folio_mark_uptodate(folio);
+				folio_clear_dirty(folio);
+
+				ssdfs_folio_unlock(folio);
+				ssdfs_folio_end_writeback(fsi, U64_MAX, 0, folio);
+			}
+		}
+	}
+
+	wake_up_all(&fsi->pending_wq);
+
+	return err;
+}
+
+static
+int ssdfs_issue_write_request(struct writeback_control *wbc,
+			      struct ssdfs_segment_request_pool *pool,
+			      struct ssdfs_dirty_folios_batch *batch,
+			      int req_type)
+{
+	struct ssdfs_fs_info *fsi;
+	struct inode *inode;
+	struct address_space *mapping;
+	struct folio *folio;
+	struct ssdfs_content_block *blk_state;
+	ino_t ino;
+	u64 logical_offset;
+	u32 data_bytes;
+	u64 start_index;
+	int i, j;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!wbc || !pool || !batch);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (batch->content.count == 0) {
+		SSDFS_WARN("batch is empty\n");
+		return -ERANGE;
+	}
+
+	folio = batch->content.blocks[0].batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	inode = folio->mapping->host;
+	mapping = folio->mapping;
+	fsi = SSDFS_FS_I(inode->i_sb);
+	ino = inode->i_ino;
+	logical_offset = batch->requested_extent.logical_offset;
+	data_bytes = batch->requested_extent.data_bytes;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, logical_offset %llu, "
+		  "data_bytes %u, sync_mode %#x\n",
+		  ino, logical_offset, data_bytes, wbc->sync_mode);
+
+	if (logical_offset % fsi->pagesize) {
+		SSDFS_ERR("logical_offset %llu, pagesize %u\n",
+			  logical_offset, fsi->pagesize);
+		BUG();
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	for (i = 0; i < batch->content.count; i++) {
+		size_t block_bytes = 0;
+		pgoff_t index;
+		pgoff_t last_folio_index;
+
+		blk_state = &batch->content.blocks[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(folio_batch_count(&blk_state->batch) == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+			folio = blk_state->batch.folios[j];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+
+			SSDFS_DBG("folio_index %llu\n",
+				  (u64)folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_folio_start_writeback(fsi, U64_MAX,
+						    logical_offset, folio);
+			ssdfs_clear_dirty_folio(folio);
+
+			block_bytes += folio_size(folio);
+		}
+
+		if (block_bytes != fsi->pagesize) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ino %lu, logical_offset %llu, "
+				  "data_bytes %u, blk_index %d, "
+				  "block_bytes %zu, pagesize %u\n",
+				  ino, logical_offset, data_bytes, i,
+				  block_bytes, fsi->pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			last_folio_index = folio_batch_count(&blk_state->batch);
+
+			if (last_folio_index == 0) {
+				err = -ERANGE;
+				SSDFS_ERR("empty block: blk_index %d\n", i);
+				goto finish_issue_write_request;
+			}
+
+			folio = blk_state->batch.folios[0];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			start_index = logical_offset + (i * fsi->pagesize);
+			start_index >>= PAGE_SHIFT;
+			index = folio->index;
+
+			if (start_index != index) {
+				err = -ERANGE;
+				SSDFS_WARN("block batch lacks the first folio: "
+					   "ino %lu, logical_offset %llu, "
+					   "data_bytes %u, blk_index %d, "
+					   "block_bytes %zu, pagesize %u, "
+					   "start_index %llu, index %lu\n",
+					   ino, logical_offset, data_bytes, i,
+					   block_bytes, fsi->pagesize,
+					   start_index, index);
+				goto finish_issue_write_request;
+			}
+
+			last_folio_index--;
+
+			folio = blk_state->batch.folios[last_folio_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(!folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			while (block_bytes < fsi->pagesize) {
+				pgoff_t mem_pages_per_folio =
+						folio_size(folio) / PAGE_SIZE;
+
+				last_folio_index = folio->index;
+				last_folio_index += mem_pages_per_folio;
+
+				folio = filemap_get_folio(mapping,
+							   last_folio_index);
+				if (IS_ERR(folio) && PTR_ERR(folio) == -ENOENT) {
+					folio = filemap_grab_folio(mapping,
+							    last_folio_index);
+					if (!folio) {
+						err = -ERANGE;
+						SSDFS_ERR("empty folio: "
+							  "folio_index %lu\n",
+							  last_folio_index);
+						goto finish_issue_write_request;
+					} else if (IS_ERR(folio)) {
+						err = PTR_ERR(folio);
+						SSDFS_ERR("fail to grab folio: "
+							  "folio_index %lu, err %d\n",
+							  last_folio_index,
+							  err);
+						goto finish_issue_write_request;
+					}
+
+					__ssdfs_memzero_folio(folio, 0,
+							      folio_size(folio),
+							      folio_size(folio));
+
+					ssdfs_account_locked_folio(folio);
+					folio_mark_uptodate(folio);
+					folio_mark_dirty(folio);
+					ssdfs_folio_start_writeback(fsi, U64_MAX,
+								    logical_offset, folio);
+					ssdfs_clear_dirty_folio(folio);
+					folio_batch_add(&blk_state->batch,
+							folio);
+				} else if (IS_ERR(folio)) {
+					err = PTR_ERR(folio);
+					SSDFS_ERR("fail to get folio: "
+						  "folio_index %lu, err %d\n",
+						  last_folio_index, err);
+					goto finish_issue_write_request;
+				} else {
+					if (!folio_test_locked(folio))
+						ssdfs_folio_lock(folio);
+					else
+						ssdfs_account_locked_folio(folio);
+
+					folio_mark_uptodate(folio);
+					folio_mark_dirty(folio);
+					ssdfs_folio_start_writeback(fsi, U64_MAX,
+								    logical_offset, folio);
+					ssdfs_clear_dirty_folio(folio);
+					folio_batch_add(&blk_state->batch,
+							folio);
+				}
+
+				block_bytes += folio_size(folio);
+			}
+		}
+	}
+
+	if (wbc->sync_mode == WB_SYNC_NONE) {
+		err = ssdfs_issue_async_write_request(fsi, wbc, pool,
+							batch, req_type);
+		if (err) {
+			SSDFS_ERR("fail to write async: "
+				  "ino %lu, err %d\n",
+				  ino, err);
+			goto finish_issue_write_request;
+		}
+	} else if (wbc->sync_mode == WB_SYNC_ALL) {
+		err = ssdfs_issue_sync_write_request(fsi, wbc, pool,
+						     batch, req_type);
+		if (err) {
+			SSDFS_ERR("fail to write sync: "
+				  "ino %lu, err %d\n",
+				  ino, err);
+			goto finish_issue_write_request;
+		}
+	} else
+		BUG();
+
+finish_issue_write_request:
+	if (unlikely(err)) {
+		for (i = 0; i < batch->content.count; i++) {
+			blk_state = &batch->content.blocks[i];
+
+			for (j = 0; j < folio_batch_count(&blk_state->batch); j++) {
+				folio = blk_state->batch.folios[j];
+
+				if (!folio)
+					continue;
+
+				SSDFS_ERR("BLK[%d][%d] folio_index %llu, folio_size %zu\n",
+					  i, j,
+					  (u64)folio->index,
+					  folio_size(folio));
+			}
+
+		}
+	}
+
+	ssdfs_dirty_folios_batch_init(batch);
+
+	return err;
+}
+
+static
+int __ssdfs_writepages(struct folio *folio, u32 len,
+			struct writeback_control *wbc,
+			struct ssdfs_segment_request_pool *pool,
+			struct ssdfs_dirty_folios_batch *batch)
+{
+	struct inode *inode = folio->mapping->host;
+	struct address_space *mapping = folio->mapping;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	ino_t ino = inode->i_ino;
+	pgoff_t start_index;
+	pgoff_t index = folio->index;
+	loff_t logical_offset;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, folio_index %llu, len %u, sync_mode %#x\n",
+		  ino, (u64)index, len, wbc->sync_mode);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	logical_offset = (loff_t)index << PAGE_SHIFT;
+
+try_add_folio_into_request:
+	if (is_ssdfs_logical_extent_invalid(&batch->requested_extent)) {
+		if (logical_offset % fsi->pagesize) {
+			struct folio *cur_folio;
+			pgoff_t cur_blk;
+			pgoff_t cur_index;
+			pgoff_t mem_pages_per_folio;
+			u32 processed_bytes = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("logical_offset %llu, pagesize %u\n",
+				  (u64)logical_offset, fsi->pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			cur_blk = batch->content.count;
+
+			start_index = logical_offset >> fsi->log_pagesize;
+			start_index <<= fsi->log_pagesize;
+			start_index >>= PAGE_SHIFT;
+
+			cur_index = start_index;
+			while (cur_index < index) {
+				cur_folio = filemap_get_folio(mapping,
+							      cur_index);
+				if (IS_ERR_OR_NULL(cur_folio)) {
+					err = IS_ERR(cur_folio) ?
+						PTR_ERR(cur_folio) : -ERANGE;
+					SSDFS_ERR("fail to get folio: "
+						  "folio_index %lu, err %d\n",
+						  cur_index, err);
+					goto fail_write_folios;
+				}
+
+				if (!folio_test_locked(cur_folio))
+					ssdfs_folio_lock(cur_folio);
+				else
+					ssdfs_account_locked_folio(cur_folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+				BUG_ON(!folio_test_uptodate(cur_folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				folio_mark_dirty(cur_folio);
+				ssdfs_folio_start_writeback(fsi, U64_MAX,
+						    logical_offset, cur_folio);
+				ssdfs_clear_dirty_folio(cur_folio);
+
+				err =
+				    ssdfs_dirty_folios_batch_add_folio(cur_folio,
+									cur_blk,
+									batch);
+				if (err) {
+					SSDFS_ERR("fail to add folio into batch: "
+						  "ino %lu, folio_index %lu, err %d\n",
+						  ino, cur_index, err);
+					goto fail_write_folios;
+				}
+
+				mem_pages_per_folio =
+					folio_size(cur_folio) >> PAGE_SHIFT;
+				cur_index = cur_folio->index;
+				cur_index += mem_pages_per_folio;
+
+				processed_bytes += folio_size(cur_folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+				BUG_ON(cur_index > index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			}
+
+			logical_offset = (loff_t)start_index << PAGE_SHIFT;
+			len += processed_bytes;
+
+			err = ssdfs_dirty_folios_batch_add_folio(folio,
+								 cur_blk,
+								 batch);
+			if (err) {
+				SSDFS_ERR("fail to add folio into batch: "
+					  "ino %lu, folio_index %lu, err %d\n",
+					  ino, index, err);
+				goto fail_write_folios;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("logical_offset %llu, len %u\n",
+				  (u64)logical_offset, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_dirty_folios_batch_prepare_logical_extent(ino,
+							(u64)logical_offset,
+							len, 0, 0,
+							batch);
+
+			err = ssdfs_issue_write_request(wbc, pool, batch,
+						    SSDFS_EXTENT_BASED_REQUEST);
+			if (err)
+				goto fail_write_folios;
+		} else {
+			err = ssdfs_dirty_folios_batch_add_folio(folio,
+							 batch->content.count,
+							 batch);
+			if (err) {
+				SSDFS_ERR("fail to add folio into batch: "
+					  "ino %lu, folio_index %lu, err %d\n",
+					  ino, index, err);
+				goto fail_write_folios;
+			}
+
+			ssdfs_dirty_folios_batch_prepare_logical_extent(ino,
+							(u64)logical_offset,
+							len, 0, 0,
+							batch);
+		}
+	} else {
+		struct ssdfs_content_block *blk_state;
+		struct folio *last_folio;
+		u64 upper_bound = batch->requested_extent.logical_offset +
+					batch->requested_extent.data_bytes;
+		u32 last_blk;
+		u32 last_index;
+
+		if (batch->content.count == 0) {
+			err = -ERANGE;
+			SSDFS_WARN("batch is empty\n");
+			goto fail_write_folios;
+		}
+
+		last_blk = batch->content.count - 1;
+		blk_state = &batch->content.blocks[last_blk];
+
+		last_index = folio_batch_count(&blk_state->batch);
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(last_index == 0);
+#endif /* CONFIG_SSDFS_DEBUG */
+		last_index -= 1;
+
+		last_folio = blk_state->batch.folios[last_index];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("logical_offset %llu, upper_bound %llu, "
+			  "last_blk %u, content.count %u, "
+			  "last_index %u, folio_batch_count %u\n",
+			  (u64)logical_offset, upper_bound,
+			  last_blk, batch->content.count,
+			  last_index,
+			  folio_batch_count(&blk_state->batch));
+
+		BUG_ON(!last_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+
+		if (last_folio->index == index) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("last_folio->index %lu == index %lu\n",
+				  last_folio->index, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+			return 0;
+		}
+
+		if (logical_offset == upper_bound &&
+		    can_be_merged_into_extent(last_folio, folio)) {
+			u64 logical_blk1, logical_blk2;
+			pgoff_t cur_blk;
+
+			logical_blk1 = (u64)last_folio->index << PAGE_SHIFT;
+			logical_blk1 >>= fsi->log_pagesize;
+
+			logical_blk2 = (u64)folio->index << PAGE_SHIFT;
+			logical_blk2 >>= fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio can be merged into extent: "
+				  "LAST FOLIO: (folio_index %lu, "
+				  "logical_blk %llu), "
+				  "CURRENT FOLIO: (folio_index %lu, "
+				  "logical_blk %llu)\n",
+				  last_folio->index,
+				  logical_blk1,
+				  folio->index,
+				  logical_blk2);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (logical_blk1 == logical_blk2)
+				cur_blk = last_blk;
+			else
+				cur_blk = batch->content.count;
+
+			err = ssdfs_dirty_folios_batch_add_folio(folio,
+								 cur_blk,
+								 batch);
+			if (err) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("unable to add folio: "
+					  "cur_blk %lu, folio_index %lu, "
+					  "err %d\n",
+					  cur_blk,
+					  folio->index,
+					  err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				err = ssdfs_issue_write_request(wbc,
+						    pool, batch,
+						    SSDFS_EXTENT_BASED_REQUEST);
+				if (err)
+					goto fail_write_folios;
+				else
+					goto try_add_folio_into_request;
+			}
+
+			batch->requested_extent.data_bytes += len;
+		} else {
+			err = ssdfs_issue_write_request(wbc, pool, batch,
+						    SSDFS_EXTENT_BASED_REQUEST);
+			if (err)
+				goto fail_write_folios;
+			else
+				goto try_add_folio_into_request;
+		}
+	}
+
+	return 0;
+
+fail_write_folios:
+	return err;
+}
+
+/* writepage function prototype */
+typedef int (*ssdfs_writepagefn)(struct folio *folio, u32 len,
+				 struct writeback_control *wbc,
+				 struct ssdfs_segment_request_pool *pool,
+				 struct ssdfs_dirty_folios_batch *batch);
+
+static
+int ssdfs_writepage_wrapper(struct folio *folio,
+			    struct writeback_control *wbc,
+			    struct ssdfs_segment_request_pool *pool,
+			    struct ssdfs_dirty_folios_batch *batch,
+			    ssdfs_writepagefn writepage)
+{
+	struct inode *inode = folio->mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	ino_t ino = inode->i_ino;
+	pgoff_t index = folio->index;
+	loff_t i_size = i_size_read(inode);
+	pgoff_t end_index = i_size >> PAGE_SHIFT;
+	int len = i_size & (folio_size(folio) - 1);
+	loff_t cur_blk;
+	u32 offset_inside_block;
+	bool is_new_blk = false;
+#ifdef CONFIG_SSDFS_DEBUG
+	u32 folio_processed_bytes = 0;
+	void *kaddr;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, page_index %llu, "
+		  "i_size %llu, len %d\n",
+		  ino, (u64)index,
+		  (u64)i_size, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (inode->i_sb->s_flags & SB_RDONLY) {
+		/*
+		 * It means that filesystem was remounted in read-only
+		 * mode because of an error or metadata corruption. But we
+		 * still have dirty pages that are being flushed in the
+		 * background. So, here we simply discard this dirty page.
+		 */
+		err = -EROFS;
+		goto discard_folio;
+	}
+
+	/* Is the page fully outside @i_size? (truncate in progress) */
+	if (index > end_index) {
+		err = 0;
+		goto finish_write_folio;
+	}
+
+	if (is_ssdfs_file_inline(ii)) {
+		size_t inline_capacity =
+				ssdfs_inode_inline_file_capacity(inode);
+
+		if (len > inline_capacity) {
+			err = -ENOSPC;
+			SSDFS_ERR("len %d is greater than capacity %zu\n",
+				  len, inline_capacity);
+			goto discard_folio;
+		}
+
+		ssdfs_folio_start_writeback(fsi, U64_MAX, 0, folio);
+
+		err = __ssdfs_memcpy_from_folio(ii->inline_file,
+						0, inline_capacity,
+						folio,
+						0, folio_size(folio),
+						len);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to copy file's content: "
+				  "err %d\n", err);
+			goto discard_folio;
+		}
+
+		inode_add_bytes(inode, len);
+
+		clear_folio_new(folio);
+		folio_mark_uptodate(folio);
+		folio_clear_dirty(folio);
+
+		ssdfs_folio_unlock(folio);
+		ssdfs_folio_end_writeback(fsi, U64_MAX, 0, folio);
+
+		return 0;
+	}
+
+	cur_blk = ((u64)index << PAGE_SHIFT) >> fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_blk %llu\n", (u64)cur_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!need_add_block(folio)) {
+		is_new_blk = !ssdfs_extents_tree_has_logical_block(cur_blk,
+								   inode);
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("cur_blk %llu, is_new_blk %#x\n",
+			  (u64)cur_blk, is_new_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (is_new_blk)
+			set_folio_new(folio);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	do {
+		kaddr = kmap_local_folio(folio, folio_processed_bytes);
+		SSDFS_DBG("PAGE DUMP: "
+			  "folio_processed_bytes %u\n",
+			  folio_processed_bytes);
+		print_hex_dump_bytes("", DUMP_PREFIX_OFFSET,
+				     kaddr,
+				     PAGE_SIZE);
+		SSDFS_DBG("\n");
+		kunmap_local(kaddr);
+
+		folio_processed_bytes += PAGE_SIZE;
+	} while (folio_processed_bytes < folio_size(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	/* Is the page fully inside @i_size? */
+	if (index < end_index) {
+		err = (*writepage)(folio, folio_size(folio), wbc, pool, batch);
+		if (unlikely(err)) {
+			ssdfs_fs_error(inode->i_sb, __FILE__,
+					__func__, __LINE__,
+					"fail to write block: "
+					"ino %lu, page_index %llu, err %d\n",
+					ino, (u64)index, err);
+			goto discard_folio;
+		}
+	} else if (len > 0) {
+		/*
+		 * The page straddles @i_size. It must be zeroed out on each and every
+		 * writepage invocation because it may be mmapped. "A file is mapped
+		 * in multiples of the page size. For a file that is not a multiple of
+		 * the page size, the remaining memory is zeroed when mapped, and
+		 * writes to that region are not written out to the file."
+		 */
+		folio_zero_segment(folio, len, folio_size(folio));
+
+		err = (*writepage)(folio, len, wbc, pool, batch);
+		if (unlikely(err)) {
+			ssdfs_fs_error(inode->i_sb, __FILE__,
+					__func__, __LINE__,
+					"fail to write block: "
+					"ino %lu, page_index %llu, err %d\n",
+					ino, (u64)index, err);
+			goto discard_folio;
+		}
+	} else {
+		/* Write out the whole last folio (len == 0) */
+		err = (*writepage)(folio, folio_size(folio), wbc, pool, batch);
+		if (unlikely(err)) {
+			ssdfs_fs_error(inode->i_sb, __FILE__,
+					__func__, __LINE__,
+					"fail to write block: "
+					"ino %lu, page_index %llu, err %d\n",
+					ino, (u64)index, err);
+			goto discard_folio;
+		}
+	}
+
+	offset_inside_block = ((u64)index << PAGE_SHIFT) % fsi->pagesize;
+
+	if ((offset_inside_block + folio_size(folio)) < fsi->pagesize) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("NOT WHOLE BLOCK IS PROCESSED: "
+			  "ino %lu, cur_blk %llu, "
+			  "page_index %llu, "
+			  "offset_inside_block %u, "
+			  "folio_size %zu, block_size %u\n",
+			  ino, (u64)cur_blk, (u64)index,
+			  offset_inside_block,
+			  folio_size(folio),
+			  fsi->pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return -EAGAIN;
+	}
+
+	return 0;
+
+finish_write_folio:
+	ssdfs_folio_unlock(folio);
+
+discard_folio:
+	return err;
+}
+
+/*
+ * The ssdfs_writepages() is called by the VM to write out pages associated
+ * with the address_space object. If wbc->sync_mode is WB_SYNC_ALL, then
+ * the writeback_control will specify a range of pages that must be
+ * written out. If it is WB_SYNC_NONE, then nr_to_write is given
+ * and that many pages should be written if possible.
+ * If no ->writepages is given, then mpage_writepages is used
+ * instead. This will choose pages from the address space that are
+ * tagged as DIRTY and will pass them to ->writepage.
+ */
+static
+int ssdfs_writepages(struct address_space *mapping,
+		     struct writeback_control *wbc)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	ino_t ino = inode->i_ino;
+	struct ssdfs_segment_request_pool pool;
+	struct ssdfs_dirty_folios_batch *batch;
+	struct folio_batch fvec;
+	struct folio_batch block_vec;
+	int folios_count;
+	pgoff_t index = 0;
+	pgoff_t end;		/* Inclusive */
+	pgoff_t done_index = 0;
+	int range_whole = 0;
+	int tag;
+	int i;
+	int done = 0;
+	int ret = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, nr_to_write %lu, "
+		  "range_start %llu, range_end %llu, "
+		  "writeback_index %llu, "
+		  "wbc->range_cyclic %#x\n",
+		  ino, wbc->nr_to_write,
+		  (u64)wbc->range_start,
+		  (u64)wbc->range_end,
+		  (u64)mapping->writeback_index,
+		  wbc->range_cyclic);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	batch = ssdfs_dirty_folios_batch_alloc();
+	if (IS_ERR_OR_NULL(batch)) {
+		ret = (batch == NULL ? -ENOMEM : PTR_ERR(batch));
+		SSDFS_ERR("unable to allocate dirty folios batch\n");
+		return ret;
+	}
+
+	ssdfs_segment_request_pool_init(&pool);
+	ssdfs_dirty_folios_batch_init(batch);
+
+	/*
+	 * No folios to write?
+	 */
+	if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		goto out_writepages;
+
+	folio_batch_init(&fvec);
+	folio_batch_init(&block_vec);
+
+	if (wbc->range_cyclic) {
+		index = mapping->writeback_index; /* prev offset */
+		end = -1;
+	} else {
+		index = wbc->range_start >> PAGE_SHIFT;
+		end = wbc->range_end >> PAGE_SHIFT;
+
+		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
+			range_whole = 1;
+	}
+
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
+		tag = PAGECACHE_TAG_TOWRITE;
+		tag_pages_for_writeback(mapping, index, end);
+	} else
+		tag = PAGECACHE_TAG_DIRTY;
+
+	done_index = index;
+
+	while (!done && (index <= end)) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index %llu, end %llu, done_index %llu, "
+			  "done %#x, tag %#x\n",
+			  (u64)index, (u64)end, (u64)done_index, done, tag);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		folios_count = filemap_get_folios_tag(mapping, &index, end,
+							tag, &fvec);
+		if (folios_count == 0) {
+			if (!is_ssdfs_file_inline(ii) &&
+			    is_ssdfs_dirty_batch_not_processed(batch)) {
+				ret = ssdfs_issue_write_request(wbc,
+						&pool, batch,
+						SSDFS_EXTENT_BASED_REQUEST);
+				if (ret < 0) {
+					SSDFS_ERR("ino %lu, nr_to_write %lu, "
+						  "range_start %llu, "
+						  "range_end %llu, "
+						  "writeback_index %llu, "
+						  "wbc->range_cyclic %#x, "
+						  "index %llu, end %llu, "
+						  "done_index %llu\n",
+						  ino, wbc->nr_to_write,
+						  (u64)wbc->range_start,
+						  (u64)wbc->range_end,
+						  (u64)mapping->writeback_index,
+						  wbc->range_cyclic,
+						  (u64)index, (u64)end,
+						  (u64)done_index);
+					goto out_writepages;
+				}
+			}
+
+			break;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("FOUND: folios_count %d\n", folios_count);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (i = 0; i < folios_count; i++) {
+			struct folio *folio = fvec.folios[i];
+			unsigned long nr;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, index %d, "
+				  "folio->index %ld, end %llu\n",
+				  folio, i, folio->index, (u64)end);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ret = 0;
+
+			/*
+			 * At this point, the page may be truncated or
+			 * invalidated (changing page->mapping to NULL), or
+			 * even swizzled back from swapper_space to tmpfs file
+			 * mapping. However, page->index will not change
+			 * because we have a reference on the page.
+			 */
+			if (folio->index > end) {
+				/*
+				 * can't be range_cyclic (1st pass) because
+				 * end == -1 in that case.
+				 */
+				done = 1;
+				break;
+			}
+
+			done_index = folio->index + 1;
+
+			ssdfs_folio_lock(folio);
+
+			/*
+			 * Page truncated or invalidated. We can freely skip it
+			 * then, even for data integrity operations: the page
+			 * has disappeared concurrently, so there could be no
+			 * real expectation of this data integrity operation
+			 * even if there is now a new, dirty page at the same
+			 * pagecache address.
+			 */
+			if (unlikely(folio->mapping != mapping)) {
+continue_unlock:
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("UNLOCK FOLIO: index %ld\n",
+					  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+				ssdfs_folio_unlock(folio);
+				continue;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, index %d, folio->index %ld, "
+				  "folio_test_locked %#x, "
+				  "folio_test_dirty %#x, "
+				  "folio_test_writeback %#x\n",
+				  folio, i, folio->index,
+				  folio_test_locked(folio),
+				  folio_test_dirty(folio),
+				  folio_test_writeback(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			if (!folio_test_dirty(folio)) {
+				/* someone wrote it for us */
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("FOLIO IS NOT DIRTY: index %ld\n",
+					  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+				goto continue_unlock;
+			}
+
+			if (folio_test_writeback(folio)) {
+#ifdef CONFIG_SSDFS_DEBUG
+				SSDFS_DBG("FOLIO IS UNDER WRITEBACK: "
+					  "index %ld\n",
+					  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+				if (wbc->sync_mode != WB_SYNC_NONE) {
+#ifdef CONFIG_SSDFS_DEBUG
+					SSDFS_DBG("WAIT WRITEBACK: "
+						  "folio_index %ld\n",
+						  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+					folio_wait_writeback(folio);
+				} else
+					goto continue_unlock;
+			}
+
+			BUG_ON(folio_test_writeback(folio));
+			if (!folio_clear_dirty_for_io(folio))
+				goto continue_unlock;
+
+			ret = ssdfs_writepage_wrapper(folio, wbc,
+						      &pool, batch,
+						      __ssdfs_writepages);
+			nr = folio_nr_pages(folio);
+
+			if (ret) {
+				if (ret == -EAGAIN) {
+					/*
+					 * Not all folios of the logical block
+					 * are processed: continue processing
+					 * the remaining folios.
+					 */
+					done_index = folio->index + nr;
+				} else if (ret == -EROFS) {
+					/*
+					 * continue to discard folios
+					 */
+					done_index = folio->index + nr;
+				} else {
+					/*
+					 * done_index is set past this page,
+					 * so media errors will not choke
+					 * background writeout for the entire
+					 * file. This has consequences for
+					 * range_cyclic semantics (ie. it may
+					 * not be suitable for data integrity
+					 * writeout).
+					 */
+					done_index = folio->index + nr;
+					done = 1;
+					break;
+				}
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, index %d, folio->index %ld, "
+				  "folio_test_locked %#x, "
+				  "folio_test_dirty %#x, "
+				  "folio_test_writeback %#x\n",
+				  folio, i, folio->index,
+				  folio_test_locked(folio),
+				  folio_test_dirty(folio),
+				  folio_test_writeback(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			/*
+			 * We stop writing back only if we are not doing
+			 * integrity sync. In case of integrity sync we have to
+			 * keep going until we have written all the pages
+			 * we tagged for writeback prior to entering this loop.
+			 */
+			wbc->nr_to_write -= nr;
+			if (wbc->nr_to_write <= 0 &&
+			    wbc->sync_mode == WB_SYNC_NONE) {
+				done = 1;
+				break;
+			}
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("wbc->nr_to_write %lu, "
+				  "wbc->sync_mode %#x, "
+				  "done_index %llu, "
+				  "done %#x\n",
+				  wbc->nr_to_write,
+				  wbc->sync_mode,
+				  (u64)done_index,
+				  done);
+#endif /* CONFIG_SSDFS_DEBUG */
+		}
+
+		if (ret != -EAGAIN &&
+		    !is_ssdfs_file_inline(ii) &&
+		    is_ssdfs_dirty_batch_not_processed(batch)) {
+			ret = ssdfs_issue_write_request(wbc, &pool, batch,
+						SSDFS_EXTENT_BASED_REQUEST);
+			if (ret < 0) {
+				SSDFS_ERR("ino %lu, nr_to_write %lu, "
+					  "range_start %llu, range_end %llu, "
+					  "writeback_index %llu, "
+					  "wbc->range_cyclic %#x, "
+					  "index %llu, end %llu, "
+					  "done_index %llu\n",
+					  ino, wbc->nr_to_write,
+					  (u64)wbc->range_start,
+					  (u64)wbc->range_end,
+					  (u64)mapping->writeback_index,
+					  wbc->range_cyclic,
+					  (u64)index, (u64)end,
+					  (u64)done_index);
+				goto out_writepages;
+			}
+		}
+
+		index = done_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("index %llu, end %llu, nr_to_write %lu\n",
+			  (u64)index, (u64)end, wbc->nr_to_write);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		folio_batch_reinit(&fvec);
+		cond_resched();
+	}
+
+	if (!ret) {
+		ret = ssdfs_wait_write_pool_requests_end(fsi, &pool);
+		if (unlikely(ret)) {
+			SSDFS_ERR("finish write request failed: "
+				  "err %d\n", ret);
+		}
+	} else
+		ssdfs_clean_failed_request_pool(&pool);
+
+	/*
+	 * If we hit the last page and there is more work to be done: wrap
+	 * the index back to the start of the file for the next
+	 * time we are called.
+	 */
+	if (wbc->range_cyclic && !done)
+		done_index = 0;
+
+out_writepages:
+	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
+		mapping->writeback_index = done_index;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, nr_to_write %lu, "
+		  "range_whole %d, done_index %llu, done %#x, "
+		  "range_start %llu, range_end %llu, "
+		  "writeback_index %llu\n",
+		  ino, wbc->nr_to_write,
+		  range_whole, (u64)done_index, done,
+		  (u64)wbc->range_start,
+		  (u64)wbc->range_end,
+		  (u64)mapping->writeback_index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_dirty_folios_batch_free(batch);
+
+	return ret;
+}
+
+static void ssdfs_write_failed(struct address_space *mapping, loff_t to)
+{
+	struct inode *inode = mapping->host;
+
+	if (to > inode->i_size)
+		truncate_pagecache(inode, inode->i_size);
+}
+
+static inline
+struct folio *ssdfs_get_block_folio(struct file *file,
+				    struct address_space *mapping,
+				    pgoff_t index)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct folio *folio = NULL;
+	fgf_t fgp_flags = FGP_WRITEBEGIN;
+	unsigned int nofs_flags;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, index %lu\n",
+		  inode->i_ino, index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	fgp_flags |= fgf_set_order(fsi->pagesize);
+
+	nofs_flags = memalloc_nofs_save();
+	folio = __filemap_get_folio(mapping, index, fgp_flags,
+				    mapping_gfp_mask(mapping));
+	memalloc_nofs_restore(nofs_flags);
+
+	if (!folio) {
+		SSDFS_ERR("fail to grab folio: index %lu\n",
+			  index);
+		return ERR_PTR(-ENOMEM);
+	} else if (IS_ERR(folio)) {
+		SSDFS_ERR("fail to grab folio: index %lu, err %ld\n",
+			  index, PTR_ERR(folio));
+		return folio;
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d, folio_index %lu, "
+		  "folio_size %zu, page_size %u, "
+		  "fgp_flags %#x, order %u\n",
+		  folio, folio_ref_count(folio),
+		  folio->index, folio_size(folio),
+		  fsi->pagesize, fgp_flags,
+		  FGF_GET_ORDER(fgp_flags));
+
+	SSDFS_DBG("folio->index %ld, "
+		  "folio_test_locked %#x, "
+		  "folio_test_uptodate %#x, "
+		  "folio_test_dirty %#x, "
+		  "folio_test_writeback %#x\n",
+		  folio->index,
+		  folio_test_locked(folio),
+		  folio_test_uptodate(folio),
+		  folio_test_dirty(folio),
+		  folio_test_writeback(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	ssdfs_account_locked_folio(folio);
+
+	return folio;
+}
+
+static inline
+void ssdfs_folio_test_and_set_if_new(struct inode *inode,
+				     struct folio *folio)
+{
+	pgoff_t last_folio;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!inode || !folio);
+
+	SSDFS_DBG("ino %lu, folio_index %lu\n",
+		  inode->i_ino, folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	last_folio = (i_size_read(inode) - 1) >> PAGE_SHIFT;
+
+	if (i_size_read(inode) == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("SET NEW FOLIO: "
+			  "file_size %llu, last_folio %lu, "
+			  "folio_index %lu\n",
+			  (u64)i_size_read(inode),
+			  last_folio,
+			  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		set_folio_new(folio);
+	} else {
+		if (folio->index > last_folio ||
+		    !folio_test_uptodate(folio)) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("SET NEW FOLIO: "
+				  "file_size %llu, last_folio %lu, "
+				  "folio_index %lu\n",
+				  (u64)i_size_read(inode),
+				  last_folio,
+				  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			set_folio_new(folio);
+		}
+	}
+}
+
+static
+int ssdfs_process_whole_block(struct file *file,
+			      struct address_space *mapping,
+			      struct folio *requested_folio,
+			      loff_t pos, u32 len,
+			      bool is_new_blk)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct folio *cur_folio;
+	struct folio_batch fbatch;
+	pgoff_t index;
+	unsigned start;
+	unsigned end;
+	loff_t logical_offset;
+	u32 processed_bytes = 0;
+	bool need_read_block = false;
+	int i;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	BUG_ON(!requested_folio);
+
+	SSDFS_DBG("ino %lu, pos %llu, len %u\n",
+		  inode->i_ino, pos, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	folio_batch_init(&fbatch);
+
+	logical_offset = (loff_t)requested_folio->index << PAGE_SHIFT;
+	logical_offset >>= fsi->log_pagesize;
+	logical_offset <<= fsi->log_pagesize;
+
+	index = logical_offset >> PAGE_SHIFT;
+
+	err = 0;
+
+	while (processed_bytes < fsi->pagesize) {
+		if (index == requested_folio->index) {
+			cur_folio = requested_folio;
+		} else {
+			cur_folio = ssdfs_get_block_folio(file, mapping, index);
+			if (unlikely(IS_ERR_OR_NULL(cur_folio))) {
+				err = IS_ERR(cur_folio) ?
+						PTR_ERR(cur_folio) : -ERANGE;
+				SSDFS_ERR("fail to get block's folio: "
+					  "index %lu, err %d\n",
+					  index, err);
+				goto unlock_folios;
+			}
+
+			if (is_new_blk) {
+				ssdfs_folio_test_and_set_if_new(inode,
+								cur_folio);
+			}
+		}
+
+		if (folio_test_uptodate(cur_folio)) {
+			folio_batch_add(&fbatch, cur_folio);
+			goto check_next_folio;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("logical_offset %llu, inode_size %llu\n",
+			  logical_offset, (u64)i_size_read(inode));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		start = offset_in_folio(cur_folio, logical_offset);
+		end = folio_size(cur_folio) - start;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("start %u, end %u, "
+			  "len %u, processed_bytes %u, "
+			  "folio_size %zu\n",
+			  start, end, len, processed_bytes,
+			  folio_size(cur_folio));
+
+		BUG_ON(processed_bytes > fsi->pagesize);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if ((logical_offset & PAGE_MASK) >= i_size_read(inode)) {
+			/* Reading beyond i_size is simple: memset to zero */
+			folio_zero_segments(cur_folio, 0, start, end,
+					    folio_size(cur_folio));
+			folio_mark_uptodate(cur_folio);
+			flush_dcache_folio(cur_folio);
+		} else if (len >= (folio_size(cur_folio) + processed_bytes)) {
+			/* The whole block will be overwritten: no need to read */
+			folio_zero_segments(cur_folio, 0, start, end,
+					    folio_size(cur_folio));
+			folio_mark_uptodate(cur_folio);
+			flush_dcache_folio(cur_folio);
+		} else {
+			need_read_block = true;
+		}
+
+		folio_batch_add(&fbatch, cur_folio);
+
+check_next_folio:
+		processed_bytes += folio_size(cur_folio);
+		index += folio_size(cur_folio) >> PAGE_SHIFT;
+	}
+
+	if (need_read_block) {
+		err = ssdfs_read_block_nolock(file, &fbatch,
+						SSDFS_CURRENT_THREAD_READ);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to read folio: "
+				  "index %lu, err %d\n",
+				  index, err);
+			/* fall through: unlock the batched folios */
+		}
+	}
+
+unlock_folios:
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
+		cur_folio = fbatch.folios[i];
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(!cur_folio);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (cur_folio != requested_folio)
+			ssdfs_folio_unlock(cur_folio);
+	}
+
+	return err;
+}
+
+static
+int ssdfs_write_begin_inline_file(struct file *file,
+				  struct address_space *mapping,
+				  loff_t pos, unsigned len,
+				  struct folio **foliop, void **fsdata)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct ssdfs_inode_info *ii = SSDFS_I(inode);
+	struct folio *first_folio;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, pos %llu, len %u\n",
+		  inode->i_ino, pos, len);
+
+	if (!can_file_be_inline(inode, i_size_read(inode))) {
+		SSDFS_ERR("not inline file: "
+			  "ino %lu, pos %llu, "
+			  "len %u, file size %llu\n",
+			  inode->i_ino, pos, len,
+			  (u64)i_size_read(inode));
+		return -EINVAL;
+	}
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!ii->inline_file) {
+		err = ssdfs_allocate_inline_file_buffer(inode);
+		if (unlikely(err)) {
+			SSDFS_ERR("fail to allocate inline buffer\n");
+			return -EAGAIN;
+		}
+
+		/*
+		 * TODO: pre-fetch file's content in buffer
+		 *       (if inode size > 256 bytes)
+		 */
+	}
+
+	atomic_or(SSDFS_INODE_HAS_INLINE_FILE,
+		  &SSDFS_I(inode)->private_flags);
+
+	first_folio = ssdfs_get_block_folio(file, mapping, index);
+	if (!first_folio) {
+		SSDFS_ERR("fail to grab folio: index %lu\n",
+			  index);
+		return -ENOMEM;
+	} else if (IS_ERR(first_folio)) {
+		SSDFS_ERR("fail to grab folio: index %lu, err %ld\n",
+			  index, PTR_ERR(first_folio));
+		return PTR_ERR(first_folio);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d\n",
+		  first_folio, folio_ref_count(first_folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	*foliop = first_folio;
+
+	if ((len == fsi->pagesize) || folio_test_uptodate(first_folio))
+		return 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("pos %llu, inode_size %llu\n",
+		  pos, (u64)i_size_read(inode));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	err = ssdfs_process_whole_block(file, mapping, first_folio,
+					pos, len, false);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process the whole block: "
+			  "ino %lu, pos %llu, len %u, err %d\n",
+			  inode->i_ino, pos, len, err);
+		return err;
+	}
+
+	return 0;
+}
+
+static
+struct folio *ssdfs_write_begin_logical_block(struct file *file,
+					      struct address_space *mapping,
+					      loff_t pos, u32 len)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct folio *first_folio;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	loff_t cur_blk;
+	u64 last_blk = U64_MAX;
+	bool is_new_blk = false;
+#ifdef CONFIG_SSDFS_DEBUG
+	u64 free_blocks = 0;
+#endif /* CONFIG_SSDFS_DEBUG */
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, pos %llu, len %u\n",
+		  inode->i_ino, pos, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	atomic_and(~SSDFS_INODE_HAS_INLINE_FILE,
+		   &SSDFS_I(inode)->private_flags);
+
+	if (can_file_be_inline(inode, i_size_read(inode))) {
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("change from inline to regular file\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		last_blk = U64_MAX;
+	} else if (i_size_read(inode) > 0) {
+		last_blk = (i_size_read(inode) - 1) >>
+					fsi->log_pagesize;
+	}
+
+	cur_blk = pos >> fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_blk %llu, last_blk %llu\n",
+		  (u64)cur_blk, (u64)last_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	first_folio = ssdfs_get_block_folio(file, mapping, index);
+	if (!first_folio) {
+		SSDFS_ERR("fail to grab folio: index %lu\n",
+			  index);
+		return ERR_PTR(-ENOMEM);
+	} else if (IS_ERR(first_folio)) {
+		SSDFS_ERR("fail to grab folio: index %lu, err %ld\n",
+			  index, PTR_ERR(first_folio));
+		return first_folio;
+	}
+
+	if (pos % folio_size(first_folio)) {
+		if (folio_test_uptodate(first_folio))
+			return first_folio;
+		else {
+			SSDFS_ERR("invalid request: "
+				  "pos %llu, pagesize %u, "
+				  "folio_size %zu\n",
+				  pos, fsi->pagesize,
+				  folio_size(first_folio));
+			ssdfs_folio_unlock(first_folio);
+			ssdfs_folio_put(first_folio);
+			return ERR_PTR(-ERANGE);
+		}
+	}
+
+	if (last_blk >= U64_MAX) {
+		is_new_blk = true;
+	} else if (cur_blk > last_blk) {
+		is_new_blk = true;
+	} else if (!folio_test_uptodate(first_folio)) {
+		is_new_blk = !ssdfs_extents_tree_has_logical_block(cur_blk,
+								   inode);
+	}
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("cur_blk %llu, is_new_blk %#x\n",
+		  (u64)cur_blk, is_new_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (is_new_blk) {
+		if (!need_add_block(first_folio)) {
+			err = ssdfs_reserve_free_pages(fsi, 1,
+						SSDFS_USER_DATA_PAGES);
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		spin_lock(&fsi->volume_state_lock);
+		free_blocks = fsi->free_pages;
+		spin_unlock(&fsi->volume_state_lock);
+
+		SSDFS_DBG("free_blocks %llu, err %d\n",
+			  free_blocks, err);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		if (err) {
+			ssdfs_increase_volume_free_pages(fsi, 1);
+
+			ssdfs_folio_unlock(first_folio);
+			ssdfs_folio_put(first_folio);
+
+			ssdfs_write_failed(mapping, pos);
+
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("folio %p, count %d\n",
+				  first_folio, folio_ref_count(first_folio));
+			SSDFS_DBG("volume has no free space\n");
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			return ERR_PTR(err);
+		}
+
+		ssdfs_folio_test_and_set_if_new(inode, first_folio);
+	}
+
+	err = ssdfs_process_whole_block(file, mapping, first_folio,
+					pos, len, is_new_blk);
+	if (unlikely(err)) {
+		SSDFS_ERR("fail to process the whole block: "
+			  "ino %lu, pos %llu, len %u, err %d\n",
+			  inode->i_ino, pos, len, err);
+		return ERR_PTR(err);
+	}
+
+	return first_folio;
+}
+
+/*
+ * The ssdfs_write_begin() is called by the generic
+ * buffered write code to ask the filesystem to prepare
+ * to write len bytes at the given offset in the file.
+ */
+static
+int ssdfs_write_begin(const struct kiocb *iocb,
+		      struct address_space *mapping,
+		      loff_t pos, unsigned len,
+		      struct folio **foliop, void **fsdata)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	struct folio *first_folio = NULL;
+	loff_t cur_pos, next_pos;
+	loff_t start_blk, end_blk, cur_blk;
+	u32 processed_bytes = 0;
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, pos %llu, len %u\n",
+		  inode->i_ino, pos, len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (inode->i_sb->s_flags & SB_RDONLY)
+		return -EROFS;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, large_folios_support %#x\n",
+		  inode->i_ino,
+		  mapping_large_folio_support(mapping));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (!can_file_be_inline(inode, i_size_read(inode))) {
+		/*
+		 * Process as regular file
+		 */
+		goto try_regular_write;
+	} else if (can_file_be_inline(inode, pos + len)) {
+		err = ssdfs_write_begin_inline_file(file, mapping,
+						    pos, len,
+						    foliop, fsdata);
+		if (err == -EAGAIN) {
+			/*
+			 * Process as regular file
+			 */
+			err = 0;
+			goto try_regular_write;
+		} else if (unlikely(err)) {
+			SSDFS_ERR("fail to process inline file: "
+				  "ino %lu, pos %llu, len %u, err %d\n",
+				  inode->i_ino, pos, len, err);
+			goto finish_write_begin;
+		} else
+			goto finish_write_begin;
+	} else {
+try_regular_write:
+		cur_pos = pos;
+		start_blk = pos >> fsi->log_pagesize;
+		end_blk = (pos + len + fsi->pagesize - 1) >> fsi->log_pagesize;
+
+#ifdef CONFIG_SSDFS_DEBUG
+		BUG_ON(start_blk == end_blk);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		for (cur_blk = start_blk; cur_blk < end_blk; cur_blk++) {
+			struct folio *folio;
+			u32 cur_len;
+
+#ifdef CONFIG_SSDFS_DEBUG
+			BUG_ON(processed_bytes > len);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			next_pos = (cur_blk + 1) << fsi->log_pagesize;
+			cur_len = min_t(u32, len - processed_bytes,
+						next_pos - cur_pos);
+
+			folio = ssdfs_write_begin_logical_block(file,
+								mapping,
+								cur_pos,
+								cur_len);
+			if (IS_ERR_OR_NULL(folio)) {
+				err = IS_ERR(folio) ? PTR_ERR(folio) : -ERANGE;
+				SSDFS_ERR("fail to process folio: "
+					  "ino %lu, pos %llu, err %d\n",
+					  inode->i_ino, cur_pos, err);
+				goto finish_write_begin;
+			}
+
+			if (!first_folio)
+				first_folio = folio;
+
+			processed_bytes += cur_len;
+			cur_pos = next_pos;
+		}
+
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("folio %p, count %d\n",
+			  first_folio,
+			  folio_ref_count(first_folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+		*foliop = first_folio;
+	}
+
+finish_write_begin:
+	return err;
+}
+
+/*
+ * After a successful ssdfs_write_begin(), and data copy,
+ * ssdfs_write_end() must be called.
+ */
+static
+int ssdfs_write_end(const struct kiocb *iocb,
+		    struct address_space *mapping,
+		    loff_t pos, unsigned len, unsigned copied,
+		    struct folio *folio, void *fsdata)
+{
+	struct inode *inode = mapping->host;
+	struct ssdfs_fs_info *fsi = SSDFS_FS_I(inode->i_sb);
+	pgoff_t index = folio->index;
+	unsigned start = offset_in_folio(folio, pos);
+	unsigned end = start + copied;
+	loff_t old_size = i_size_read(inode);
+	int err = 0;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, pos %llu, len %u, copied %u, "
+		  "index %lu, start %u, end %u, old_size %llu, "
+		  "folio_size %zu, need_add_block %#x, "
+		  "folio_test_dirty %#x\n",
+		  inode->i_ino, pos, len, copied,
+		  index, start, end, old_size,
+		  folio_size(folio),
+		  need_add_block(folio),
+		  folio_test_dirty(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	if (copied < len) {
+		/*
+		 * VFS copied less data into the folio than it intended
+		 * and declared in its '->write_begin()' call via the @len
+		 * argument. Just tell the caller to retry the entire block.
+		 */
+		if (!folio_test_uptodate(folio)) {
+			copied = 0;
+			goto out;
+		}
+	}
+
+	if (!need_add_block(folio) && !folio_test_dirty(folio)) {
+		u64 folio_offset;
+		u32 offset_inside_folio;
+
+		folio_offset = (u64)folio->index << PAGE_SHIFT;
+		div_u64_rem(folio_offset, fsi->pagesize, &offset_inside_folio);
+
+		if (offset_inside_folio == 0) {
+#ifdef CONFIG_SSDFS_DEBUG
+			SSDFS_DBG("ACCOUNT UPDATED USER DATA PAGES: "
+				  "ino %lu, pos %llu, len %u, "
+				  "folio_index %lu\n",
+				  inode->i_ino, pos, len,
+				  folio->index);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+			ssdfs_account_updated_user_data_pages(fsi,
+					SSDFS_MEM_PAGES_PER_LOGICAL_BLOCK(fsi));
+		}
+	}
+
+	if (old_size < ((loff_t)index << PAGE_SHIFT) + end) {
+		i_size_write(inode, ((loff_t)index << PAGE_SHIFT) + end);
+		mark_inode_dirty_sync(inode);
+	}
+
+	flush_dcache_folio(folio);
+
+	folio_mark_uptodate(folio);
+	if (!folio_test_dirty(folio))
+		filemap_dirty_folio(mapping, folio);
+
+out:
+	ssdfs_folio_unlock(folio);
+	ssdfs_folio_put(folio);
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("folio %p, count %d, "
+		  "folio_test_dirty %#x\n",
+		  folio, folio_ref_count(folio),
+		  folio_test_dirty(folio));
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	return err ? err : copied;
+}
+
+/*
+ * The ssdfs_direct_IO() is called by the generic read/write
+ * routines to perform direct I/O, i.e. I/O requests that bypass
+ * the page cache and transfer data directly between the storage
+ * and the application's address space.
+ */
+static ssize_t ssdfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+	/* TODO: implement */
+	return -ERANGE;
+}
+
+/*
+ * The ssdfs_fsync() is called by the fsync(2) system call.
+ */
+int ssdfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct inode *inode = file->f_mapping->host;
+	int err;
+
+#ifdef CONFIG_SSDFS_DEBUG
+	SSDFS_DBG("ino %lu, start %llu, end %llu, datasync %#x\n",
+		  (unsigned long)inode->i_ino, (unsigned long long)start,
+		  (unsigned long long)end, datasync);
+#endif /* CONFIG_SSDFS_DEBUG */
+
+	trace_ssdfs_sync_file_enter(inode);
+
+	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	if (err) {
+		trace_ssdfs_sync_file_exit(file, datasync, err);
+#ifdef CONFIG_SSDFS_DEBUG
+		SSDFS_DBG("fsync failed: ino %lu, start %llu, "
+			  "end %llu, err %d\n",
+			  (unsigned long)inode->i_ino,
+			  (unsigned long long)start,
+			  (unsigned long long)end,
+			  err);
+#endif /* CONFIG_SSDFS_DEBUG */
+		return err;
+	}
+
+	inode_lock(inode);
+	err = sync_inode_metadata(inode, 1);
+	if (!err)
+		err = blkdev_issue_flush(inode->i_sb->s_bdev);
+	inode_unlock(inode);
+
+	trace_ssdfs_sync_file_exit(file, datasync, err);
+
+	return err;
+}
+
+const struct file_operations ssdfs_file_operations = {
+	.llseek		= generic_file_llseek,
+	.read_iter	= ssdfs_file_read_iter,
+	.write_iter	= generic_file_write_iter,
+	.unlocked_ioctl	= ssdfs_ioctl,
+	.mmap		= generic_file_mmap,
+	.open		= generic_file_open,
+	.fsync		= ssdfs_fsync,
+	.splice_read	= filemap_splice_read,
+	.splice_write	= iter_file_splice_write,
+};
+
+const struct inode_operations ssdfs_file_inode_operations = {
+	.getattr	= ssdfs_getattr,
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+	.get_inode_acl	= ssdfs_get_acl,
+	.set_acl	= ssdfs_set_acl,
+};
+
+const struct inode_operations ssdfs_special_inode_operations = {
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+	.get_inode_acl	= ssdfs_get_acl,
+	.set_acl	= ssdfs_set_acl,
+};
+
+const struct inode_operations ssdfs_symlink_inode_operations = {
+	.get_link	= page_get_link,
+	.getattr	= ssdfs_getattr,
+	.setattr	= ssdfs_setattr,
+	.listxattr	= ssdfs_listxattr,
+};
+
+const struct address_space_operations ssdfs_aops = {
+	.read_folio		= ssdfs_read_block,
+	.readahead		= ssdfs_readahead,
+	.writepages		= ssdfs_writepages,
+	.write_begin		= ssdfs_write_begin,
+	.write_end		= ssdfs_write_end,
+	.migrate_folio		= filemap_migrate_folio,
+	.dirty_folio		= filemap_dirty_folio,
+	.direct_IO		= ssdfs_direct_IO,
+};
-- 
2.34.1

