* [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices
@ 2025-08-12 8:24 Dongsheng Yang
2025-08-12 8:24 ` [PATCH v5 1/1] dm-pcache: add persistent cache target in device-mapper Dongsheng Yang
2025-08-14 20:58 ` [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Mikulas Patocka
0 siblings, 2 replies; 4+ messages in thread
From: Dongsheng Yang @ 2025-08-12 8:24 UTC (permalink / raw)
To: mpatocka, agk, snitzer; +Cc: dm-devel, cengku, Dongsheng Yang
Hi Mikulas,
This is V5 for dm-pcache, please consider merging.
As you requested, I've squashed these patches into a single patch, based
on v6.16. If you find anything that needs tweaking while merging, feel
free to modify it directly. I can also send a rebased version on top of the linux-dm branch if you
prefer.
This patch has gone through quite a bit of testing on my side. I can
share the highlights. Once you merge this patch into the linux-dm branch,
I’ll keep running ongoing tests against dm-6.18.
1. dtg-tests for pcache[1]:
- Includes `dmsetup` management operations, including failure/exception scenarios.
- Failslab tests (covering the NOWAIT → NOIO fallback path you mentioned).
- `fail_make_request` fault-injection tests.
- Backing device delay tests.
- xfstests (group `generic/rw`) under different parameters.
2. Striped pmem as the cache_dev[2]: ran the same dtg-tests suite with a
striped pmem configuration.
3. KASAN and memleak modes[3]: all of the above tests were also executed
with KASAN and memory leak detection enabled.
4. Coverage with gcov: I used gcov to gather code coverage during the
test runs[4].
- The results show 100% function coverage and 90%+ line coverage. The
remaining uncovered code is mostly `BUG()` lines and a few hard-to-simulate error paths.
- In the coverage report you can see the code path for `mempool_alloc`
with `GFP_NOIO` is exercised and covered.
[1]: https://datatravelguide.github.io/dtg-blog/pcache/pcache_v5_result_virtio/results.html
[2]: https://datatravelguide.github.io/dtg-blog/pcache/pcache_v5_result_ram/results.html
[3]: https://datatravelguide.github.io/dtg-blog/pcache/pcache_v5_result_kasan/results.html
[4]: https://datatravelguide.github.io/dtg-blog/pcache/pcache_cov_v5/dm-pcache/index.html
Changelogs
V5 from V4:
- Fix `get_n_vecs()` bug when the buffer isn’t page-aligned; use
- `bio_add_max_vecs()` instead.
- Use `kvcalloc()` for allocations larger than 32,768 bytes.
- For mempool allocations, try `GFP_NOWAIT` first; if that fails, fall back to `GFP_NOIO`.
V4 from V3:
- go back to use mempool for allocation of cache_key and
backing_dev_req.
- introduce backing_bvec_cache and backing_dev->bvec_pool to
allocate bvecs for backing_dev_req. Currently, the obj_size of
bvec_cache is max size of bvecs. The bvecs are allocated only for
write‑back backing_dev_req requests, and only a limited number of writeback requests
are allocated at the same time, so the resulting memory overhead should
not be too severe.
If in the futer we found that this implementation results in unacceptable memory waste,
we can optimize it by switching to a multi‑level slab design.
- remove ret-value checking for bio_init_clone() as there should
not be REQ_INTEGRITY in bio bi_opf, so bio_init_clone() never
fail.
- remove ret-value checking for backing_dev_req_alloc() and
cache_key_alloc().
V3 from V2:
- rebased against linux-dm dm-6.17
- add missing include file bitfiled.h (Mikulas)
- move kmem_cache from per-device to per-module (Mikulas)
- fix a memleak in error path found by failslab testing
- retry pcache_request in defer_req() when memory allocation
fail.
V2 from V1:
- introduce req_alloc() and req_init() in backing_dev.c, then we
can do req_alloc() before holding spinlock and do req_init()
in subtree_walk().
- introduce pre_alloc_key and pre_alloc_req in walk_ctx, that
means we can pre-allocate cache_key or backing_dev_request
before subtree walking.
- use mempool_alloc() with NOIO for the allocation of cache_key
and backing_dev_req.
- some coding style changes from comments of Jonathan.
V1 from RFC-V2:
- use crc32c to replace crc32
- only retry pcache_req when cache full, add pcache_req into defer_list,
and wait cache invalidation happen.
- new format for pcache table, it is more easily extended with
new parameters later.
- remove __packed.
- use spin_lock_irq in req_complete_fn to replace
spin_lock_irqsave.
- fix bug in backing_dev_bio_end with spin_lock_irqsave.
- queue_work() inside spinlock.
- introduce inline_bvecs in backing_dev_req.
- use kmalloc_array for bvecs allocation.
- calculate ->off with dm_target_offset() before use it.
Dongsheng Yang (1):
dm-pcache: add persistent cache target in device-mapper
.../admin-guide/device-mapper/dm-pcache.rst | 201 ++++
MAINTAINERS | 8 +
drivers/md/Kconfig | 2 +
drivers/md/Makefile | 1 +
drivers/md/dm-pcache/Kconfig | 17 +
drivers/md/dm-pcache/Makefile | 3 +
drivers/md/dm-pcache/backing_dev.c | 374 ++++++++
drivers/md/dm-pcache/backing_dev.h | 100 ++
drivers/md/dm-pcache/cache.c | 445 +++++++++
drivers/md/dm-pcache/cache.h | 632 +++++++++++++
drivers/md/dm-pcache/cache_dev.c | 299 ++++++
drivers/md/dm-pcache/cache_dev.h | 70 ++
drivers/md/dm-pcache/cache_gc.c | 170 ++++
drivers/md/dm-pcache/cache_key.c | 888 ++++++++++++++++++
drivers/md/dm-pcache/cache_req.c | 835 ++++++++++++++++
drivers/md/dm-pcache/cache_segment.c | 293 ++++++
drivers/md/dm-pcache/cache_writeback.c | 258 +++++
drivers/md/dm-pcache/dm_pcache.c | 497 ++++++++++
drivers/md/dm-pcache/dm_pcache.h | 67 ++
drivers/md/dm-pcache/pcache_internal.h | 117 +++
drivers/md/dm-pcache/segment.c | 61 ++
drivers/md/dm-pcache/segment.h | 74 ++
22 files changed, 5412 insertions(+)
create mode 100644 Documentation/admin-guide/device-mapper/dm-pcache.rst
create mode 100644 drivers/md/dm-pcache/Kconfig
create mode 100644 drivers/md/dm-pcache/Makefile
create mode 100644 drivers/md/dm-pcache/backing_dev.c
create mode 100644 drivers/md/dm-pcache/backing_dev.h
create mode 100644 drivers/md/dm-pcache/cache.c
create mode 100644 drivers/md/dm-pcache/cache.h
create mode 100644 drivers/md/dm-pcache/cache_dev.c
create mode 100644 drivers/md/dm-pcache/cache_dev.h
create mode 100644 drivers/md/dm-pcache/cache_gc.c
create mode 100644 drivers/md/dm-pcache/cache_key.c
create mode 100644 drivers/md/dm-pcache/cache_req.c
create mode 100644 drivers/md/dm-pcache/cache_segment.c
create mode 100644 drivers/md/dm-pcache/cache_writeback.c
create mode 100644 drivers/md/dm-pcache/dm_pcache.c
create mode 100644 drivers/md/dm-pcache/dm_pcache.h
create mode 100644 drivers/md/dm-pcache/pcache_internal.h
create mode 100644 drivers/md/dm-pcache/segment.c
create mode 100644 drivers/md/dm-pcache/segment.h
--
2.43.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v5 1/1] dm-pcache: add persistent cache target in device-mapper
2025-08-12 8:24 [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Dongsheng Yang
@ 2025-08-12 8:24 ` Dongsheng Yang
2025-08-14 20:58 ` [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Mikulas Patocka
1 sibling, 0 replies; 4+ messages in thread
From: Dongsheng Yang @ 2025-08-12 8:24 UTC (permalink / raw)
To: mpatocka, agk, snitzer; +Cc: dm-devel, cengku, Dongsheng Yang
This patch introduces dm-pcache, a new DM target that places a DAX-
capable persistent-memory device in front of any slower block device and
uses it as a high-throughput, low-latency cache.
Design highlights
-----------------
- DAX data path – data is copied directly between DRAM and the pmem
mapping, bypassing the block layer’s overhead.
- Segmented, crash-consistent layout
- all layout metadata are dual-replicated CRC-protected.
- atomic kset flushes; key replay on mount guarantees cache integrity
even after power loss.
- Striped multi-tree index
- Multi‑tree indexing for high parallelism.
- overlap-resolution logic ensures non-intersecting cached extents.
- Background services
- write-back worker flushes dirty keys in order, preserving backing-device
crash consistency. This is important for checkpoint in cloud storage.
- garbage collector reclaims clean segments when utilisation exceeds a
tunable threshold.
- Data integrity – optional CRC32 on cached payload; metadata always protected.
Comparison with existing block-level caches
---------------------------------------------------------------------------------------------------------------------------------
| Feature | pcache (this patch) | bcache | dm-writecache |
|----------------------------------|---------------------------------|------------------------------|---------------------------|
| pmem access method | DAX | bio (block I/O) | DAX |
| Write latency (4 K rand-write) | ~5 µs | ~20 µs | ~5 µs |
| Concurrency | multi subtree index | global index tree | single tree + wc_lock |
| IOPS (4K randwrite, 32 numjobs) | 2.1 M | 352 K | 283 K |
| Read-cache support | YES | YES | NO |
| Deployment | no re-format of backend | backend devices must be | no re-format of backend |
| | | reformatted | |
| Write-back ordering | log-structured; | no ordering guarantee | no ordering guarantee |
| | preserves app-IO-order | | |
| Data integrity checks | metadata + data CRC(optional) | metadata CRC only | none |
---------------------------------------------------------------------------------------------------------------------------------
Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
---
.../admin-guide/device-mapper/dm-pcache.rst | 201 ++++
MAINTAINERS | 8 +
drivers/md/Kconfig | 2 +
drivers/md/Makefile | 1 +
drivers/md/dm-pcache/Kconfig | 17 +
drivers/md/dm-pcache/Makefile | 3 +
drivers/md/dm-pcache/backing_dev.c | 374 ++++++++
drivers/md/dm-pcache/backing_dev.h | 100 ++
drivers/md/dm-pcache/cache.c | 445 +++++++++
drivers/md/dm-pcache/cache.h | 632 +++++++++++++
drivers/md/dm-pcache/cache_dev.c | 299 ++++++
drivers/md/dm-pcache/cache_dev.h | 70 ++
drivers/md/dm-pcache/cache_gc.c | 170 ++++
drivers/md/dm-pcache/cache_key.c | 888 ++++++++++++++++++
drivers/md/dm-pcache/cache_req.c | 835 ++++++++++++++++
drivers/md/dm-pcache/cache_segment.c | 293 ++++++
drivers/md/dm-pcache/cache_writeback.c | 258 +++++
drivers/md/dm-pcache/dm_pcache.c | 497 ++++++++++
drivers/md/dm-pcache/dm_pcache.h | 67 ++
drivers/md/dm-pcache/pcache_internal.h | 117 +++
drivers/md/dm-pcache/segment.c | 61 ++
drivers/md/dm-pcache/segment.h | 74 ++
22 files changed, 5412 insertions(+)
create mode 100644 Documentation/admin-guide/device-mapper/dm-pcache.rst
create mode 100644 drivers/md/dm-pcache/Kconfig
create mode 100644 drivers/md/dm-pcache/Makefile
create mode 100644 drivers/md/dm-pcache/backing_dev.c
create mode 100644 drivers/md/dm-pcache/backing_dev.h
create mode 100644 drivers/md/dm-pcache/cache.c
create mode 100644 drivers/md/dm-pcache/cache.h
create mode 100644 drivers/md/dm-pcache/cache_dev.c
create mode 100644 drivers/md/dm-pcache/cache_dev.h
create mode 100644 drivers/md/dm-pcache/cache_gc.c
create mode 100644 drivers/md/dm-pcache/cache_key.c
create mode 100644 drivers/md/dm-pcache/cache_req.c
create mode 100644 drivers/md/dm-pcache/cache_segment.c
create mode 100644 drivers/md/dm-pcache/cache_writeback.c
create mode 100644 drivers/md/dm-pcache/dm_pcache.c
create mode 100644 drivers/md/dm-pcache/dm_pcache.h
create mode 100644 drivers/md/dm-pcache/pcache_internal.h
create mode 100644 drivers/md/dm-pcache/segment.c
create mode 100644 drivers/md/dm-pcache/segment.h
diff --git a/Documentation/admin-guide/device-mapper/dm-pcache.rst b/Documentation/admin-guide/device-mapper/dm-pcache.rst
new file mode 100644
index 000000000000..e6433fab7bd6
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-pcache.rst
@@ -0,0 +1,201 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================
+dm-pcache — Persistent Cache
+=================================
+
+*Author: Dongsheng Yang <dongsheng.yang@linux.dev>*
+
+This document describes *dm-pcache*, a Device-Mapper target that lets a
+byte-addressable *DAX* (persistent-memory, “pmem”) region act as a
+high-performance, crash-persistent cache in front of a slower block
+device. The code lives in `drivers/md/dm-pcache/`.
+
+Quick feature summary
+=====================
+
+* *Write-back* caching (only mode currently supported).
+* *16 MiB segments* allocated on the pmem device.
+* *Data CRC32* verification (optional, per cache).
+* Crash-safe: every metadata structure is duplicated (`PCACHE_META_INDEX_MAX
+ == 2`) and protected with CRC+sequence numbers.
+* *Multi-tree indexing* (one radix tree per CPU backend) for high PMem
+ parallelism
+* Pure *DAX path* I/O – no extra BIO round-trips
+* *Log-structured write-back* that preserves backend crash-consistency
+
+-------------------------------------------------------------------------------
+Constructor
+===========
+
+::
+
+ pcache <cache_dev> <backing_dev> [<number_of_optional_arguments> <cache_mode writeback> <data_crc true|false>]
+
+========================= ====================================================
+``cache_dev`` Any DAX-capable block device (``/dev/pmem0``…).
+ All metadata *and* cached blocks are stored here.
+
+``backing_dev`` The slow block device to be cached.
+
+``cache_mode`` Optional, Only ``writeback`` is accepted at the moment.
+
+``data_crc`` Optional, default to ``false``
+ ``true`` – store CRC32 for every cached entry and
+ verify on reads
+ ``false`` – skip CRC (faster)
+========================= ====================================================
+
+Example
+-------
+
+.. code-block:: shell
+
+ dmsetup create pcache_sdb --table \
+ "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"
+
+The first time a pmem device is used, dm-pcache formats it automatically
+(super-block, cache_info, etc.).
+
+-------------------------------------------------------------------------------
+Status line
+===========
+
+``dmsetup status <device>`` (``STATUSTYPE_INFO``) prints:
+
+::
+
+ <sb_flags> <seg_total> <cache_segs> <segs_used> \
+ <gc_percent> <cache_flags> \
+ <key_head_seg>:<key_head_off> \
+ <dirty_tail_seg>:<dirty_tail_off> \
+ <key_tail_seg>:<key_tail_off>
+
+Field meanings
+--------------
+
+=============================== =============================================
+``sb_flags`` Super-block flags (e.g. endian marker).
+
+``seg_total`` Number of physical *pmem* segments.
+
+``cache_segs`` Number of segments used for cache.
+
+``segs_used`` Segments currently allocated (bitmap weight).
+
+``gc_percent`` Current GC high-water mark (0-90).
+
+``cache_flags`` Bit 0 – DATA_CRC enabled
+ Bit 1 – INIT_DONE (cache initialised)
+ Bits 2-5 – cache mode (0 == WB).
+
+``key_head`` Where new key-sets are being written.
+
+``dirty_tail`` First dirty key-set that still needs
+ write-back to the backing device.
+
+``key_tail`` First key-set that may be reclaimed by GC.
+=============================== =============================================
+
+-------------------------------------------------------------------------------
+Messages
+========
+
+*Change GC trigger*
+
+::
+
+ dmsetup message <dev> 0 gc_percent <0-90>
+
+-------------------------------------------------------------------------------
+Theory of operation
+===================
+
+Sub-devices
+-----------
+
+==================== =========================================================
+backing_dev Any block device (SSD/HDD/loop/LVM, etc.).
+cache_dev DAX device; must expose direct-access memory.
+==================== =========================================================
+
+Segments and key-sets
+---------------------
+
+* The pmem space is divided into *16 MiB segments*.
+* Each write allocates space from a per-CPU *data_head* inside a segment.
+* A *cache-key* records a logical range on the origin and where it lives
+ inside pmem (segment + offset + generation).
+* 128 keys form a *key-set* (kset); ksets are written sequentially in pmem
+ and are themselves crash-safe (CRC).
+* The pair *(key_tail, dirty_tail)* delimit clean/dirty and live/dead ksets.
+
+Write-back
+----------
+
+Dirty keys are queued into a tree; a background worker copies data
+back to the backing_dev and advances *dirty_tail*. A FLUSH/FUA bio from the
+upper layers forces an immediate metadata commit.
+
+Garbage collection
+------------------
+
+GC starts when ``segs_used >= seg_total * gc_percent / 100``. It walks
+from *key_tail*, frees segments whose every key has been invalidated, and
+advances *key_tail*.
+
+CRC verification
+----------------
+
+If ``data_crc is enabled`` dm-pcache computes a CRC32 over every cached data
+range when it is inserted and stores it in the on-media key. Reads
+validate the CRC before copying to the caller.
+
+-------------------------------------------------------------------------------
+Failure handling
+================
+
+* *pmem media errors* – all metadata copies are read with
+ ``copy_mc_to_kernel``; an uncorrectable error logs and aborts initialisation.
+* *Cache full* – if no free segment can be found, writes return ``-EBUSY``;
+ dm-pcache retries internally (request deferral).
+* *System crash* – on attach, the driver replays ksets from *key_tail* to
+ rebuild the in-core trees; every segment’s generation guards against
+ use-after-free keys.
+
+-------------------------------------------------------------------------------
+Limitations & TODO
+==================
+
+* Only *write-back* mode; other modes planned.
+* Only FIFO cache invalidate; other (LRU, ARC...) planned.
+* Table reload is not supported currently.
+* Discard planned.
+
+-------------------------------------------------------------------------------
+Example workflow
+================
+
+.. code-block:: shell
+
+ # 1. Create devices
+ dmsetup create pcache_sdb --table \
+ "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"
+
+ # 2. Put a filesystem on top
+ mkfs.ext4 /dev/mapper/pcache_sdb
+ mount /dev/mapper/pcache_sdb /mnt
+
+ # 3. Tune GC threshold to 80 %
+ dmsetup message pcache_sdb 0 gc_percent 80
+
+ # 4. Observe status
+ watch -n1 'dmsetup status pcache_sdb'
+
+ # 5. Shutdown
+ umount /mnt
+ dmsetup remove pcache_sdb
+
+-------------------------------------------------------------------------------
+``dm-pcache`` is under active development; feedback, bug reports and patches
+are very welcome!
diff --git a/MAINTAINERS b/MAINTAINERS
index c0b444e5fd5a..b273e79befae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6950,6 +6950,14 @@ S: Maintained
F: Documentation/admin-guide/device-mapper/vdo*.rst
F: drivers/md/dm-vdo/
+DEVICE-MAPPER PCACHE TARGET
+M: Dongsheng Yang <dongsheng.yang@linux.dev>
+M: Zheng Gu <cengku@gmail.com>
+L: dm-devel@lists.linux.dev
+S: Maintained
+F: Documentation/admin-guide/device-mapper/dm-pcache.rst
+F: drivers/md/dm-pcache/
+
DEVLINK
M: Jiri Pirko <jiri@resnulli.us>
L: netdev@vger.kernel.org
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index ddb37f6670de..cd4d8d1705bb 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -659,4 +659,6 @@ config DM_AUDIT
source "drivers/md/dm-vdo/Kconfig"
+source "drivers/md/dm-pcache/Kconfig"
+
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 87bdfc9fe14c..f91a3133677f 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_DM_RAID) += dm-raid.o
obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o
obj-$(CONFIG_DM_VERITY) += dm-verity.o
obj-$(CONFIG_DM_VDO) += dm-vdo/
+obj-$(CONFIG_DM_PCACHE) += dm-pcache/
obj-$(CONFIG_DM_CACHE) += dm-cache.o
obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o
obj-$(CONFIG_DM_EBS) += dm-ebs.o
diff --git a/drivers/md/dm-pcache/Kconfig b/drivers/md/dm-pcache/Kconfig
new file mode 100644
index 000000000000..0e251eca892e
--- /dev/null
+++ b/drivers/md/dm-pcache/Kconfig
@@ -0,0 +1,17 @@
+config DM_PCACHE
+ tristate "Persistent cache for Block Device (Experimental)"
+ depends on BLK_DEV_DM
+ depends on DEV_DAX
+ help
+ PCACHE provides a mechanism to use persistent memory (e.g., CXL persistent memory,
+ DAX-enabled devices) as a high-performance cache layer in front of
+ traditional block devices such as SSDs or HDDs.
+
+ PCACHE is implemented as a kernel module that integrates with the block
+ layer and supports direct access (DAX) to persistent memory for low-latency,
+ byte-addressable caching.
+
+ Note: This feature is experimental and should be tested thoroughly
+ before use in production environments.
+
+ If unsure, say 'N'.
diff --git a/drivers/md/dm-pcache/Makefile b/drivers/md/dm-pcache/Makefile
new file mode 100644
index 000000000000..86776e4acad2
--- /dev/null
+++ b/drivers/md/dm-pcache/Makefile
@@ -0,0 +1,3 @@
+dm-pcache-y := dm_pcache.o cache_dev.o segment.o backing_dev.o cache.o cache_gc.o cache_writeback.o cache_segment.o cache_key.o cache_req.o
+
+obj-m += dm-pcache.o
diff --git a/drivers/md/dm-pcache/backing_dev.c b/drivers/md/dm-pcache/backing_dev.c
new file mode 100644
index 000000000000..dcbebc0d7062
--- /dev/null
+++ b/drivers/md/dm-pcache/backing_dev.c
@@ -0,0 +1,374 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/blkdev.h>
+
+#include "../dm-core.h"
+#include "pcache_internal.h"
+#include "cache_dev.h"
+#include "backing_dev.h"
+#include "cache.h"
+#include "dm_pcache.h"
+
+static struct kmem_cache *backing_req_cache;
+static struct kmem_cache *backing_bvec_cache;
+
+static void backing_dev_exit(struct pcache_backing_dev *backing_dev)
+{
+ mempool_exit(&backing_dev->req_pool);
+ mempool_exit(&backing_dev->bvec_pool);
+}
+
+static void req_submit_fn(struct work_struct *work);
+static void req_complete_fn(struct work_struct *work);
+static int backing_dev_init(struct dm_pcache *pcache)
+{
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+ int ret;
+
+ ret = mempool_init_slab_pool(&backing_dev->req_pool, 128, backing_req_cache);
+ if (ret)
+ goto err;
+
+ ret = mempool_init_slab_pool(&backing_dev->bvec_pool, 128, backing_bvec_cache);
+ if (ret)
+ goto req_pool_exit;
+
+ INIT_LIST_HEAD(&backing_dev->submit_list);
+ INIT_LIST_HEAD(&backing_dev->complete_list);
+ spin_lock_init(&backing_dev->submit_lock);
+ spin_lock_init(&backing_dev->complete_lock);
+ INIT_WORK(&backing_dev->req_submit_work, req_submit_fn);
+ INIT_WORK(&backing_dev->req_complete_work, req_complete_fn);
+ atomic_set(&backing_dev->inflight_reqs, 0);
+ init_waitqueue_head(&backing_dev->inflight_wq);
+
+ return 0;
+
+req_pool_exit:
+ mempool_exit(&backing_dev->req_pool);
+err:
+ return ret;
+}
+
+int backing_dev_start(struct dm_pcache *pcache)
+{
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+ int ret;
+
+ ret = backing_dev_init(pcache);
+ if (ret)
+ return ret;
+
+ backing_dev->dev_size = bdev_nr_sectors(backing_dev->dm_dev->bdev);
+
+ return 0;
+}
+
+void backing_dev_stop(struct dm_pcache *pcache)
+{
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+
+ /*
+ * There should not be any new request comming, just wait
+ * inflight requests done.
+ */
+ wait_event(backing_dev->inflight_wq,
+ atomic_read(&backing_dev->inflight_reqs) == 0);
+
+ flush_work(&backing_dev->req_submit_work);
+ flush_work(&backing_dev->req_complete_work);
+
+ backing_dev_exit(backing_dev);
+}
+
+/* pcache_backing_dev_req functions */
+void backing_dev_req_end(struct pcache_backing_dev_req *backing_req)
+{
+ struct pcache_backing_dev *backing_dev = backing_req->backing_dev;
+
+ if (backing_req->end_req)
+ backing_req->end_req(backing_req, backing_req->ret);
+
+ switch (backing_req->type) {
+ case BACKING_DEV_REQ_TYPE_REQ:
+ if (backing_req->req.upper_req)
+ pcache_req_put(backing_req->req.upper_req, backing_req->ret);
+ break;
+ case BACKING_DEV_REQ_TYPE_KMEM:
+ if (backing_req->kmem.bvecs != backing_req->kmem.inline_bvecs)
+ mempool_free(backing_req->kmem.bvecs, &backing_dev->bvec_pool);
+ break;
+ default:
+ BUG();
+ }
+
+ mempool_free(backing_req, &backing_dev->req_pool);
+
+ if (atomic_dec_and_test(&backing_dev->inflight_reqs))
+ wake_up(&backing_dev->inflight_wq);
+}
+
+static void req_complete_fn(struct work_struct *work)
+{
+ struct pcache_backing_dev *backing_dev = container_of(work, struct pcache_backing_dev, req_complete_work);
+ struct pcache_backing_dev_req *backing_req;
+ LIST_HEAD(tmp_list);
+
+ spin_lock_irq(&backing_dev->complete_lock);
+ list_splice_init(&backing_dev->complete_list, &tmp_list);
+ spin_unlock_irq(&backing_dev->complete_lock);
+
+ while (!list_empty(&tmp_list)) {
+ backing_req = list_first_entry(&tmp_list,
+ struct pcache_backing_dev_req, node);
+ list_del_init(&backing_req->node);
+ backing_dev_req_end(backing_req);
+ }
+}
+
+static void backing_dev_bio_end(struct bio *bio)
+{
+ struct pcache_backing_dev_req *backing_req = bio->bi_private;
+ struct pcache_backing_dev *backing_dev = backing_req->backing_dev;
+ unsigned long flags;
+
+ backing_req->ret = bio->bi_status;
+
+ spin_lock_irqsave(&backing_dev->complete_lock, flags);
+ list_move_tail(&backing_req->node, &backing_dev->complete_list);
+ queue_work(BACKING_DEV_TO_PCACHE(backing_dev)->task_wq, &backing_dev->req_complete_work);
+ spin_unlock_irqrestore(&backing_dev->complete_lock, flags);
+}
+
+static void req_submit_fn(struct work_struct *work)
+{
+ struct pcache_backing_dev *backing_dev = container_of(work, struct pcache_backing_dev, req_submit_work);
+ struct pcache_backing_dev_req *backing_req;
+ LIST_HEAD(tmp_list);
+
+ spin_lock(&backing_dev->submit_lock);
+ list_splice_init(&backing_dev->submit_list, &tmp_list);
+ spin_unlock(&backing_dev->submit_lock);
+
+ while (!list_empty(&tmp_list)) {
+ backing_req = list_first_entry(&tmp_list,
+ struct pcache_backing_dev_req, node);
+ list_del_init(&backing_req->node);
+ submit_bio_noacct(&backing_req->bio);
+ }
+}
+
+void backing_dev_req_submit(struct pcache_backing_dev_req *backing_req, bool direct)
+{
+ struct pcache_backing_dev *backing_dev = backing_req->backing_dev;
+
+ if (direct) {
+ submit_bio_noacct(&backing_req->bio);
+ return;
+ }
+
+ spin_lock(&backing_dev->submit_lock);
+ list_add_tail(&backing_req->node, &backing_dev->submit_list);
+ queue_work(BACKING_DEV_TO_PCACHE(backing_dev)->task_wq, &backing_dev->req_submit_work);
+ spin_unlock(&backing_dev->submit_lock);
+}
+
+static void bio_map(struct bio *bio, void *base, size_t size)
+{
+ struct page *page;
+ unsigned int offset;
+ unsigned int len;
+
+ if (!is_vmalloc_addr(base)) {
+ page = virt_to_page(base);
+ offset = offset_in_page(base);
+
+ BUG_ON(!bio_add_page(bio, page, size, offset));
+ return;
+ }
+
+ flush_kernel_vmap_range(base, size);
+ while (size) {
+ page = vmalloc_to_page(base);
+ offset = offset_in_page(base);
+ len = min_t(size_t, PAGE_SIZE - offset, size);
+
+ BUG_ON(!bio_add_page(bio, page, len, offset));
+ size -= len;
+ base += len;
+ }
+}
+
+static struct pcache_backing_dev_req *req_type_req_alloc(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ struct pcache_request *pcache_req = opts->req.upper_req;
+ struct pcache_backing_dev_req *backing_req;
+ struct bio *orig = pcache_req->bio;
+
+ backing_req = mempool_alloc(&backing_dev->req_pool, opts->gfp_mask);
+ if (!backing_req)
+ return NULL;
+
+ memset(backing_req, 0, sizeof(struct pcache_backing_dev_req));
+
+ bio_init_clone(backing_dev->dm_dev->bdev, &backing_req->bio, orig, opts->gfp_mask);
+
+ backing_req->type = BACKING_DEV_REQ_TYPE_REQ;
+ backing_req->backing_dev = backing_dev;
+ atomic_inc(&backing_dev->inflight_reqs);
+
+ return backing_req;
+}
+
+static struct pcache_backing_dev_req *kmem_type_req_alloc(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ struct pcache_backing_dev_req *backing_req;
+ u32 n_vecs = bio_add_max_vecs(opts->kmem.data, opts->kmem.len);
+
+ backing_req = mempool_alloc(&backing_dev->req_pool, opts->gfp_mask);
+ if (!backing_req)
+ return NULL;
+
+ memset(backing_req, 0, sizeof(struct pcache_backing_dev_req));
+
+ if (n_vecs > BACKING_DEV_REQ_INLINE_BVECS) {
+ backing_req->kmem.bvecs = mempool_alloc(&backing_dev->bvec_pool, opts->gfp_mask);
+ if (!backing_req->kmem.bvecs)
+ goto free_backing_req;
+ } else {
+ backing_req->kmem.bvecs = backing_req->kmem.inline_bvecs;
+ }
+
+ backing_req->kmem.n_vecs = n_vecs;
+ backing_req->type = BACKING_DEV_REQ_TYPE_KMEM;
+ backing_req->backing_dev = backing_dev;
+ atomic_inc(&backing_dev->inflight_reqs);
+
+ return backing_req;
+
+free_backing_req:
+ mempool_free(backing_req, &backing_dev->req_pool);
+ return NULL;
+}
+
+struct pcache_backing_dev_req *backing_dev_req_alloc(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ if (opts->type == BACKING_DEV_REQ_TYPE_REQ)
+ return req_type_req_alloc(backing_dev, opts);
+
+ if (opts->type == BACKING_DEV_REQ_TYPE_KMEM)
+ return kmem_type_req_alloc(backing_dev, opts);
+
+ BUG();
+}
+
+static void req_type_req_init(struct pcache_backing_dev_req *backing_req,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ struct pcache_request *pcache_req = opts->req.upper_req;
+ struct bio *clone;
+ u32 off = opts->req.req_off;
+ u32 len = opts->req.len;
+
+ clone = &backing_req->bio;
+ BUG_ON(off & SECTOR_MASK);
+ BUG_ON(len & SECTOR_MASK);
+ bio_trim(clone, off >> SECTOR_SHIFT, len >> SECTOR_SHIFT);
+
+ clone->bi_iter.bi_sector = (pcache_req->off + off) >> SECTOR_SHIFT;
+ clone->bi_private = backing_req;
+ clone->bi_end_io = backing_dev_bio_end;
+
+ INIT_LIST_HEAD(&backing_req->node);
+ backing_req->end_req = opts->end_fn;
+
+ pcache_req_get(pcache_req);
+ backing_req->req.upper_req = pcache_req;
+ backing_req->req.bio_off = off;
+}
+
+static void kmem_type_req_init(struct pcache_backing_dev_req *backing_req,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ struct pcache_backing_dev *backing_dev = backing_req->backing_dev;
+ struct bio *backing_bio;
+
+ bio_init(&backing_req->bio, backing_dev->dm_dev->bdev, backing_req->kmem.bvecs,
+ backing_req->kmem.n_vecs, opts->kmem.opf);
+
+ backing_bio = &backing_req->bio;
+ bio_map(backing_bio, opts->kmem.data, opts->kmem.len);
+
+ backing_bio->bi_iter.bi_sector = (opts->kmem.backing_off) >> SECTOR_SHIFT;
+ backing_bio->bi_private = backing_req;
+ backing_bio->bi_end_io = backing_dev_bio_end;
+
+ INIT_LIST_HEAD(&backing_req->node);
+ backing_req->end_req = opts->end_fn;
+ backing_req->priv_data = opts->priv_data;
+}
+
+void backing_dev_req_init(struct pcache_backing_dev_req *backing_req,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ if (opts->type == BACKING_DEV_REQ_TYPE_REQ)
+ return req_type_req_init(backing_req, opts);
+
+ if (opts->type == BACKING_DEV_REQ_TYPE_KMEM)
+ return kmem_type_req_init(backing_req, opts);
+
+ BUG();
+}
+
+struct pcache_backing_dev_req *backing_dev_req_create(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts)
+{
+ struct pcache_backing_dev_req *backing_req;
+
+ backing_req = backing_dev_req_alloc(backing_dev, opts);
+ if (!backing_req)
+ return NULL;
+
+ backing_dev_req_init(backing_req, opts);
+
+ return backing_req;
+}
+
+void backing_dev_flush(struct pcache_backing_dev *backing_dev)
+{
+ blkdev_issue_flush(backing_dev->dm_dev->bdev);
+}
+
+int pcache_backing_init(void)
+{
+ u32 max_bvecs = (PCACHE_CACHE_SUBTREE_SIZE >> PAGE_SHIFT) + 1;
+ int ret;
+
+ backing_req_cache = KMEM_CACHE(pcache_backing_dev_req, 0);
+ if (!backing_req_cache) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ backing_bvec_cache = kmem_cache_create("pcache-bvec-slab",
+ max_bvecs * sizeof(struct bio_vec),
+ 0, 0, NULL);
+ if (!backing_bvec_cache) {
+ ret = -ENOMEM;
+ goto destroy_req_cache;
+ }
+
+ return 0;
+destroy_req_cache:
+ kmem_cache_destroy(backing_req_cache);
+err:
+ return ret;
+}
+
+void pcache_backing_exit(void)
+{
+ kmem_cache_destroy(backing_bvec_cache);
+ kmem_cache_destroy(backing_req_cache);
+}
diff --git a/drivers/md/dm-pcache/backing_dev.h b/drivers/md/dm-pcache/backing_dev.h
new file mode 100644
index 000000000000..fbcce983f0eb
--- /dev/null
+++ b/drivers/md/dm-pcache/backing_dev.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _BACKING_DEV_H
+#define _BACKING_DEV_H
+
+#include <linux/device-mapper.h>
+
+#include "pcache_internal.h"
+
+struct pcache_backing_dev_req;
+typedef void (*backing_req_end_fn_t)(struct pcache_backing_dev_req *backing_req, int ret);
+
+#define BACKING_DEV_REQ_TYPE_REQ 1
+#define BACKING_DEV_REQ_TYPE_KMEM 2
+
+#define BACKING_DEV_REQ_INLINE_BVECS 4
+
+struct pcache_request;
+struct pcache_backing_dev_req {
+ u8 type;
+ struct bio bio;
+ struct pcache_backing_dev *backing_dev;
+
+ void *priv_data;
+ backing_req_end_fn_t end_req;
+
+ struct list_head node;
+ int ret;
+
+ union {
+ struct {
+ struct pcache_request *upper_req;
+ u32 bio_off;
+ } req;
+ struct {
+ struct bio_vec inline_bvecs[BACKING_DEV_REQ_INLINE_BVECS];
+ struct bio_vec *bvecs;
+ u32 n_vecs;
+ } kmem;
+ };
+};
+
+struct pcache_backing_dev {
+ struct pcache_cache *cache;
+
+ struct dm_dev *dm_dev;
+ mempool_t req_pool;
+ mempool_t bvec_pool;
+
+ struct list_head submit_list;
+ spinlock_t submit_lock;
+ struct work_struct req_submit_work;
+
+ struct list_head complete_list;
+ spinlock_t complete_lock;
+ struct work_struct req_complete_work;
+
+ atomic_t inflight_reqs;
+ wait_queue_head_t inflight_wq;
+
+ u64 dev_size;
+};
+
+struct dm_pcache;
+int backing_dev_start(struct dm_pcache *pcache);
+void backing_dev_stop(struct dm_pcache *pcache);
+
+struct pcache_backing_dev_req_opts {
+ u32 type;
+ union {
+ struct {
+ struct pcache_request *upper_req;
+ u32 req_off;
+ u32 len;
+ } req;
+ struct {
+ void *data;
+ blk_opf_t opf;
+ u32 len;
+ u64 backing_off;
+ } kmem;
+ };
+
+ gfp_t gfp_mask;
+ backing_req_end_fn_t end_fn;
+ void *priv_data;
+};
+
+void backing_dev_req_submit(struct pcache_backing_dev_req *backing_req, bool direct);
+void backing_dev_req_end(struct pcache_backing_dev_req *backing_req);
+struct pcache_backing_dev_req *backing_dev_req_create(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts);
+struct pcache_backing_dev_req *backing_dev_req_alloc(struct pcache_backing_dev *backing_dev,
+ struct pcache_backing_dev_req_opts *opts);
+void backing_dev_req_init(struct pcache_backing_dev_req *backing_req,
+ struct pcache_backing_dev_req_opts *opts);
+void backing_dev_flush(struct pcache_backing_dev *backing_dev);
+
+int pcache_backing_init(void);
+void pcache_backing_exit(void);
+#endif /* _BACKING_DEV_H */
diff --git a/drivers/md/dm-pcache/cache.c b/drivers/md/dm-pcache/cache.c
new file mode 100644
index 000000000000..d8e92367d947
--- /dev/null
+++ b/drivers/md/dm-pcache/cache.c
@@ -0,0 +1,445 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/blk_types.h>
+
+#include "cache.h"
+#include "cache_dev.h"
+#include "backing_dev.h"
+#include "dm_pcache.h"
+
+struct kmem_cache *key_cache;
+
+static inline struct pcache_cache_info *get_cache_info_addr(struct pcache_cache *cache)
+{
+ return cache->cache_info_addr + cache->info_index;
+}
+
+static void cache_info_write(struct pcache_cache *cache)
+{
+ struct pcache_cache_info *cache_info = &cache->cache_info;
+
+ cache_info->header.seq++;
+ cache_info->header.crc = pcache_meta_crc(&cache_info->header,
+ sizeof(struct pcache_cache_info));
+
+ memcpy_flushcache(get_cache_info_addr(cache), cache_info,
+ sizeof(struct pcache_cache_info));
+
+ cache->info_index = (cache->info_index + 1) % PCACHE_META_INDEX_MAX;
+}
+
+static void cache_info_init_default(struct pcache_cache *cache);
+static int cache_info_init(struct pcache_cache *cache, struct pcache_cache_options *opts)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_info *cache_info_addr;
+
+ cache_info_addr = pcache_meta_find_latest(&cache->cache_info_addr->header,
+ sizeof(struct pcache_cache_info),
+ PCACHE_CACHE_INFO_SIZE,
+ &cache->cache_info);
+ if (IS_ERR(cache_info_addr))
+ return PTR_ERR(cache_info_addr);
+
+ if (cache_info_addr) {
+ if (opts->data_crc !=
+ (cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC)) {
+ pcache_dev_err(pcache, "invalid option for data_crc: %s, expected: %s",
+ opts->data_crc ? "true" : "false",
+ cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC ? "true" : "false");
+ return -EINVAL;
+ }
+
+ return 0;
+ }
+
+ /* init cache_info for new cache */
+ cache_info_init_default(cache);
+ cache_mode_set(cache, opts->cache_mode);
+ if (opts->data_crc)
+ cache->cache_info.flags |= PCACHE_CACHE_FLAGS_DATA_CRC;
+
+ return 0;
+}
+
+static void cache_info_set_gc_percent(struct pcache_cache_info *cache_info, u8 percent)
+{
+ cache_info->flags &= ~PCACHE_CACHE_FLAGS_GC_PERCENT_MASK;
+ cache_info->flags |= FIELD_PREP(PCACHE_CACHE_FLAGS_GC_PERCENT_MASK, percent);
+}
+
+int pcache_cache_set_gc_percent(struct pcache_cache *cache, u8 percent)
+{
+ if (percent > PCACHE_CACHE_GC_PERCENT_MAX || percent < PCACHE_CACHE_GC_PERCENT_MIN)
+ return -EINVAL;
+
+ mutex_lock(&cache->cache_info_lock);
+ cache_info_set_gc_percent(&cache->cache_info, percent);
+
+ cache_info_write(cache);
+ mutex_unlock(&cache->cache_info_lock);
+
+ return 0;
+}
+
+void cache_pos_encode(struct pcache_cache *cache,
+ struct pcache_cache_pos_onmedia *pos_onmedia_base,
+ struct pcache_cache_pos *pos, u64 seq, u32 *index)
+{
+ struct pcache_cache_pos_onmedia pos_onmedia;
+ struct pcache_cache_pos_onmedia *pos_onmedia_addr = pos_onmedia_base + *index;
+
+ pos_onmedia.cache_seg_id = pos->cache_seg->cache_seg_id;
+ pos_onmedia.seg_off = pos->seg_off;
+ pos_onmedia.header.seq = seq;
+ pos_onmedia.header.crc = cache_pos_onmedia_crc(&pos_onmedia);
+
+ memcpy_flushcache(pos_onmedia_addr, &pos_onmedia, sizeof(struct pcache_cache_pos_onmedia));
+ pmem_wmb();
+
+ *index = (*index + 1) % PCACHE_META_INDEX_MAX;
+}
+
+int cache_pos_decode(struct pcache_cache *cache,
+ struct pcache_cache_pos_onmedia *pos_onmedia,
+ struct pcache_cache_pos *pos, u64 *seq, u32 *index)
+{
+ struct pcache_cache_pos_onmedia latest, *latest_addr;
+
+ latest_addr = pcache_meta_find_latest(&pos_onmedia->header,
+ sizeof(struct pcache_cache_pos_onmedia),
+ sizeof(struct pcache_cache_pos_onmedia),
+ &latest);
+ if (IS_ERR(latest_addr))
+ return PTR_ERR(latest_addr);
+
+ if (!latest_addr)
+ return -EIO;
+
+ pos->cache_seg = &cache->segments[latest.cache_seg_id];
+ pos->seg_off = latest.seg_off;
+ *seq = latest.header.seq;
+ *index = (latest_addr - pos_onmedia);
+
+ return 0;
+}
+
+static inline void cache_info_set_seg_id(struct pcache_cache *cache, u32 seg_id)
+{
+ cache->cache_info.seg_id = seg_id;
+}
+
+static int cache_init(struct dm_pcache *pcache)
+{
+ struct pcache_cache *cache = &pcache->cache;
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+ struct pcache_cache_dev *cache_dev = &pcache->cache_dev;
+ int ret;
+
+ cache->segments = kvcalloc(cache_dev->seg_num, sizeof(struct pcache_cache_segment), GFP_KERNEL);
+ if (!cache->segments) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ cache->seg_map = kvcalloc(BITS_TO_LONGS(cache_dev->seg_num), sizeof(unsigned long), GFP_KERNEL);
+ if (!cache->seg_map) {
+ ret = -ENOMEM;
+ goto free_segments;
+ }
+
+ cache->backing_dev = backing_dev;
+ cache->cache_dev = &pcache->cache_dev;
+ cache->n_segs = cache_dev->seg_num;
+ atomic_set(&cache->gc_errors, 0);
+ spin_lock_init(&cache->seg_map_lock);
+ spin_lock_init(&cache->key_head_lock);
+
+ mutex_init(&cache->cache_info_lock);
+ mutex_init(&cache->key_tail_lock);
+ mutex_init(&cache->dirty_tail_lock);
+ mutex_init(&cache->writeback_lock);
+
+ INIT_DELAYED_WORK(&cache->writeback_work, cache_writeback_fn);
+ INIT_DELAYED_WORK(&cache->gc_work, pcache_cache_gc_fn);
+ INIT_WORK(&cache->clean_work, clean_fn);
+
+ return 0;
+
+free_segments:
+ kvfree(cache->segments);
+err:
+ return ret;
+}
+
+static void cache_exit(struct pcache_cache *cache)
+{
+ kvfree(cache->seg_map);
+ kvfree(cache->segments);
+}
+
+static void cache_info_init_default(struct pcache_cache *cache)
+{
+ struct pcache_cache_info *cache_info = &cache->cache_info;
+
+ cache_info->header.seq = 0;
+ cache_info->n_segs = cache->cache_dev->seg_num;
+ cache_info_set_gc_percent(cache_info, PCACHE_CACHE_GC_PERCENT_DEFAULT);
+}
+
+static int cache_tail_init(struct pcache_cache *cache)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ bool new_cache = !(cache->cache_info.flags & PCACHE_CACHE_FLAGS_INIT_DONE);
+
+ if (new_cache) {
+ __set_bit(0, cache->seg_map);
+
+ cache->key_head.cache_seg = &cache->segments[0];
+ cache->key_head.seg_off = 0;
+ cache_pos_copy(&cache->key_tail, &cache->key_head);
+ cache_pos_copy(&cache->dirty_tail, &cache->key_head);
+
+ cache_encode_dirty_tail(cache);
+ cache_encode_key_tail(cache);
+ } else {
+ if (cache_decode_key_tail(cache) || cache_decode_dirty_tail(cache)) {
+ pcache_dev_err(pcache, "Corrupted key tail or dirty tail.\n");
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+static int get_seg_id(struct pcache_cache *cache,
+ struct pcache_cache_segment *prev_cache_seg,
+ bool new_cache, u32 *seg_id)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_dev *cache_dev = cache->cache_dev;
+ int ret;
+
+ if (new_cache) {
+ ret = cache_dev_get_empty_segment_id(cache_dev, seg_id);
+ if (ret) {
+ pcache_dev_err(pcache, "no available segment\n");
+ goto err;
+ }
+
+ if (prev_cache_seg)
+ cache_seg_set_next_seg(prev_cache_seg, *seg_id);
+ else
+ cache_info_set_seg_id(cache, *seg_id);
+ } else {
+ if (prev_cache_seg) {
+ struct pcache_segment_info *prev_seg_info;
+
+ prev_seg_info = &prev_cache_seg->cache_seg_info;
+ if (!segment_info_has_next(prev_seg_info)) {
+ ret = -EFAULT;
+ goto err;
+ }
+ *seg_id = prev_cache_seg->cache_seg_info.next_seg;
+ } else {
+ *seg_id = cache->cache_info.seg_id;
+ }
+ }
+ return 0;
+err:
+ return ret;
+}
+
+static int cache_segs_init(struct pcache_cache *cache)
+{
+ struct pcache_cache_segment *prev_cache_seg = NULL;
+ struct pcache_cache_info *cache_info = &cache->cache_info;
+ bool new_cache = !(cache->cache_info.flags & PCACHE_CACHE_FLAGS_INIT_DONE);
+ u32 seg_id;
+ int ret;
+ u32 i;
+
+ for (i = 0; i < cache_info->n_segs; i++) {
+ ret = get_seg_id(cache, prev_cache_seg, new_cache, &seg_id);
+ if (ret)
+ goto err;
+
+ ret = cache_seg_init(cache, seg_id, i, new_cache);
+ if (ret)
+ goto err;
+
+ prev_cache_seg = &cache->segments[i];
+ }
+ return 0;
+err:
+ return ret;
+}
+
+static int cache_init_req_keys(struct pcache_cache *cache, u32 n_paral)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ u32 n_subtrees;
+ int ret;
+ u32 i, cpu;
+
+ /* Calculate number of cache trees based on the device size */
+ n_subtrees = DIV_ROUND_UP(cache->dev_size << SECTOR_SHIFT, PCACHE_CACHE_SUBTREE_SIZE);
+ ret = cache_tree_init(cache, &cache->req_key_tree, n_subtrees);
+ if (ret)
+ goto err;
+
+ cache->n_ksets = n_paral;
+ cache->ksets = kvcalloc(cache->n_ksets, PCACHE_KSET_SIZE, GFP_KERNEL);
+ if (!cache->ksets) {
+ ret = -ENOMEM;
+ goto req_tree_exit;
+ }
+
+ /*
+ * Initialize each kset with a spinlock and delayed work for flushing.
+ * Each kset is associated with one queue to ensure independent handling
+ * of cache keys across multiple queues, maximizing multiqueue concurrency.
+ */
+ for (i = 0; i < cache->n_ksets; i++) {
+ struct pcache_cache_kset *kset = get_kset(cache, i);
+
+ kset->cache = cache;
+ spin_lock_init(&kset->kset_lock);
+ INIT_DELAYED_WORK(&kset->flush_work, kset_flush_fn);
+ }
+
+ cache->data_heads = alloc_percpu(struct pcache_cache_data_head);
+ if (!cache->data_heads) {
+ ret = -ENOMEM;
+ goto free_kset;
+ }
+
+ for_each_possible_cpu(cpu) {
+ struct pcache_cache_data_head *h =
+ per_cpu_ptr(cache->data_heads, cpu);
+ h->head_pos.cache_seg = NULL;
+ }
+
+ /*
+ * Replay persisted cache keys using cache_replay.
+ * This function loads and replays cache keys from previously stored
+ * ksets, allowing the cache to restore its state after a restart.
+ */
+ ret = cache_replay(cache);
+ if (ret) {
+ pcache_dev_err(pcache, "failed to replay keys\n");
+ goto free_heads;
+ }
+
+ return 0;
+
+free_heads:
+ free_percpu(cache->data_heads);
+free_kset:
+ kvfree(cache->ksets);
+req_tree_exit:
+ cache_tree_exit(&cache->req_key_tree);
+err:
+ return ret;
+}
+
+static void cache_destroy_req_keys(struct pcache_cache *cache)
+{
+ u32 i;
+
+ for (i = 0; i < cache->n_ksets; i++) {
+ struct pcache_cache_kset *kset = get_kset(cache, i);
+
+ cancel_delayed_work_sync(&kset->flush_work);
+ }
+
+ free_percpu(cache->data_heads);
+ kvfree(cache->ksets);
+ cache_tree_exit(&cache->req_key_tree);
+}
+
+int pcache_cache_start(struct dm_pcache *pcache)
+{
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+ struct pcache_cache *cache = &pcache->cache;
+ struct pcache_cache_options *opts = &pcache->opts;
+ int ret;
+
+ ret = cache_init(pcache);
+ if (ret)
+ return ret;
+
+ cache->cache_info_addr = CACHE_DEV_CACHE_INFO(cache->cache_dev);
+ cache->cache_ctrl = CACHE_DEV_CACHE_CTRL(cache->cache_dev);
+ backing_dev->cache = cache;
+ cache->dev_size = backing_dev->dev_size;
+
+ ret = cache_info_init(cache, opts);
+ if (ret)
+ goto cache_exit;
+
+ ret = cache_segs_init(cache);
+ if (ret)
+ goto cache_exit;
+
+ ret = cache_tail_init(cache);
+ if (ret)
+ goto cache_exit;
+
+ ret = cache_init_req_keys(cache, num_online_cpus());
+ if (ret)
+ goto cache_exit;
+
+ ret = cache_writeback_init(cache);
+ if (ret)
+ goto destroy_keys;
+
+ cache->cache_info.flags |= PCACHE_CACHE_FLAGS_INIT_DONE;
+ cache_info_write(cache);
+ queue_delayed_work(cache_get_wq(cache), &cache->gc_work, 0);
+
+ return 0;
+
+destroy_keys:
+ cache_destroy_req_keys(cache);
+cache_exit:
+ cache_exit(cache);
+
+ return ret;
+}
+
+void pcache_cache_stop(struct dm_pcache *pcache)
+{
+ struct pcache_cache *cache = &pcache->cache;
+
+ cache_flush(cache);
+
+ cancel_delayed_work_sync(&cache->gc_work);
+ flush_work(&cache->clean_work);
+ cache_writeback_exit(cache);
+
+ if (cache->req_key_tree.n_subtrees)
+ cache_destroy_req_keys(cache);
+
+ cache_exit(cache);
+}
+
+struct workqueue_struct *cache_get_wq(struct pcache_cache *cache)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+
+ return pcache->task_wq;
+}
+
+int pcache_cache_init(void)
+{
+ key_cache = KMEM_CACHE(pcache_cache_key, 0);
+ if (!key_cache)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void pcache_cache_exit(void)
+{
+ kmem_cache_destroy(key_cache);
+}
diff --git a/drivers/md/dm-pcache/cache.h b/drivers/md/dm-pcache/cache.h
new file mode 100644
index 000000000000..b3b361cc406e
--- /dev/null
+++ b/drivers/md/dm-pcache/cache.h
@@ -0,0 +1,632 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _PCACHE_CACHE_H
+#define _PCACHE_CACHE_H
+
+#include "segment.h"
+
+/* Garbage collection thresholds */
+#define PCACHE_CACHE_GC_PERCENT_MIN 0 /* Minimum GC percentage */
+#define PCACHE_CACHE_GC_PERCENT_MAX 90 /* Maximum GC percentage */
+#define PCACHE_CACHE_GC_PERCENT_DEFAULT 70 /* Default GC percentage */
+
+#define PCACHE_CACHE_SUBTREE_SIZE (4 * PCACHE_MB) /* 4MB total tree size */
+#define PCACHE_CACHE_SUBTREE_SIZE_MASK 0x3FFFFF /* Mask for tree size */
+#define PCACHE_CACHE_SUBTREE_SIZE_SHIFT 22 /* Bit shift for tree size */
+
+/* Maximum number of keys per key set */
+#define PCACHE_KSET_KEYS_MAX 128
+#define PCACHE_CACHE_SEGS_MAX (1024 * 1024) /* maximum cache size for each device is 16T */
+#define PCACHE_KSET_ONMEDIA_SIZE_MAX struct_size_t(struct pcache_cache_kset_onmedia, data, PCACHE_KSET_KEYS_MAX)
+#define PCACHE_KSET_SIZE (sizeof(struct pcache_cache_kset) + sizeof(struct pcache_cache_key_onmedia) * PCACHE_KSET_KEYS_MAX)
+
+/* Maximum number of keys to clean in one round of clean_work */
+#define PCACHE_CLEAN_KEYS_MAX 10
+
+/* Writeback and garbage collection intervals in jiffies */
+#define PCACHE_CACHE_WRITEBACK_INTERVAL (5 * HZ)
+#define PCACHE_CACHE_GC_INTERVAL (5 * HZ)
+
+/* Macro to get the cache key structure from an rb_node pointer */
+#define CACHE_KEY(node) (container_of(node, struct pcache_cache_key, rb_node))
+
+struct pcache_cache_pos_onmedia {
+ struct pcache_meta_header header;
+ __u32 cache_seg_id;
+ __u32 seg_off;
+};
+
+/* Offset and size definitions for cache segment control */
+#define PCACHE_CACHE_SEG_CTRL_OFF (PCACHE_SEG_INFO_SIZE * PCACHE_META_INDEX_MAX)
+#define PCACHE_CACHE_SEG_CTRL_SIZE (4 * PCACHE_KB)
+
+struct pcache_cache_seg_gen {
+ struct pcache_meta_header header;
+ __u64 gen;
+};
+
+/* Control structure for cache segments */
+struct pcache_cache_seg_ctrl {
+ struct pcache_cache_seg_gen gen[PCACHE_META_INDEX_MAX];
+ __u64 res[64];
+};
+
+#define PCACHE_CACHE_FLAGS_DATA_CRC BIT(0)
+#define PCACHE_CACHE_FLAGS_INIT_DONE BIT(1)
+
+#define PCACHE_CACHE_FLAGS_CACHE_MODE_MASK GENMASK(5, 2)
+#define PCACHE_CACHE_MODE_WRITEBACK 0
+#define PCACHE_CACHE_MODE_WRITETHROUGH 1
+#define PCACHE_CACHE_MODE_WRITEAROUND 2
+#define PCACHE_CACHE_MODE_WRITEONLY 3
+
+#define PCACHE_CACHE_FLAGS_GC_PERCENT_MASK GENMASK(12, 6)
+
+struct pcache_cache_info {
+ struct pcache_meta_header header;
+ __u32 seg_id;
+ __u32 n_segs;
+ __u32 flags;
+ __u32 reserved;
+};
+
+struct pcache_cache_pos {
+ struct pcache_cache_segment *cache_seg;
+ u32 seg_off;
+};
+
+struct pcache_cache_segment {
+ struct pcache_cache *cache;
+ u32 cache_seg_id; /* Index in cache->segments */
+ struct pcache_segment segment;
+ atomic_t refs;
+
+ struct pcache_segment_info cache_seg_info;
+ struct mutex info_lock;
+ u32 info_index;
+
+ spinlock_t gen_lock;
+ u64 gen;
+ u64 gen_seq;
+ u32 gen_index;
+
+ struct pcache_cache_seg_ctrl *cache_seg_ctrl;
+ struct mutex ctrl_lock;
+};
+
+/* rbtree for cache entries */
+struct pcache_cache_subtree {
+ struct rb_root root;
+ spinlock_t tree_lock;
+};
+
+struct pcache_cache_tree {
+ struct pcache_cache *cache;
+ u32 n_subtrees;
+ mempool_t key_pool;
+ struct pcache_cache_subtree *subtrees;
+};
+
+extern struct kmem_cache *key_cache;
+
+struct pcache_cache_key {
+ struct pcache_cache_tree *cache_tree;
+ struct pcache_cache_subtree *cache_subtree;
+ struct kref ref;
+ struct rb_node rb_node;
+ struct list_head list_node;
+ u64 off;
+ u32 len;
+ u32 flags;
+ struct pcache_cache_pos cache_pos;
+ u64 seg_gen;
+};
+
+#define PCACHE_CACHE_KEY_FLAGS_EMPTY BIT(0)
+#define PCACHE_CACHE_KEY_FLAGS_CLEAN BIT(1)
+
+struct pcache_cache_key_onmedia {
+ __u64 off;
+ __u32 len;
+ __u32 flags;
+ __u32 cache_seg_id;
+ __u32 cache_seg_off;
+ __u64 seg_gen;
+ __u32 data_crc;
+ __u32 reserved;
+};
+
+struct pcache_cache_kset_onmedia {
+ __u32 crc;
+ union {
+ __u32 key_num;
+ __u32 next_cache_seg_id;
+ };
+ __u64 magic;
+ __u64 flags;
+ struct pcache_cache_key_onmedia data[];
+};
+
+struct pcache_cache {
+ struct pcache_backing_dev *backing_dev;
+ struct pcache_cache_dev *cache_dev;
+ struct pcache_cache_ctrl *cache_ctrl;
+ u64 dev_size;
+
+ struct pcache_cache_data_head __percpu *data_heads;
+
+ spinlock_t key_head_lock;
+ struct pcache_cache_pos key_head;
+ u32 n_ksets;
+ struct pcache_cache_kset *ksets;
+
+ struct mutex key_tail_lock;
+ struct pcache_cache_pos key_tail;
+ u64 key_tail_seq;
+ u32 key_tail_index;
+
+ struct mutex dirty_tail_lock;
+ struct pcache_cache_pos dirty_tail;
+ u64 dirty_tail_seq;
+ u32 dirty_tail_index;
+
+ struct pcache_cache_tree req_key_tree;
+ struct work_struct clean_work;
+
+ struct mutex writeback_lock;
+ char wb_kset_onmedia_buf[PCACHE_KSET_ONMEDIA_SIZE_MAX];
+ struct pcache_cache_tree writeback_key_tree;
+ struct delayed_work writeback_work;
+ struct {
+ atomic_t pending;
+ u32 advance;
+ int ret;
+ } writeback_ctx;
+
+ char gc_kset_onmedia_buf[PCACHE_KSET_ONMEDIA_SIZE_MAX];
+ struct delayed_work gc_work;
+ atomic_t gc_errors;
+
+ struct mutex cache_info_lock;
+ struct pcache_cache_info cache_info;
+ struct pcache_cache_info *cache_info_addr;
+ u32 info_index;
+
+ u32 n_segs;
+ unsigned long *seg_map;
+ u32 last_cache_seg;
+ bool cache_full;
+ spinlock_t seg_map_lock;
+ struct pcache_cache_segment *segments;
+};
+
+struct workqueue_struct *cache_get_wq(struct pcache_cache *cache);
+
+struct dm_pcache;
+struct pcache_cache_options {
+ u32 cache_mode:4;
+ u32 data_crc:1;
+};
+int pcache_cache_start(struct dm_pcache *pcache);
+void pcache_cache_stop(struct dm_pcache *pcache);
+
+struct pcache_cache_ctrl {
+ /* Updated by gc_thread */
+ struct pcache_cache_pos_onmedia key_tail_pos[PCACHE_META_INDEX_MAX];
+
+ /* Updated by writeback_thread */
+ struct pcache_cache_pos_onmedia dirty_tail_pos[PCACHE_META_INDEX_MAX];
+};
+
+struct pcache_cache_data_head {
+ struct pcache_cache_pos head_pos;
+};
+
+static inline u16 pcache_cache_get_gc_percent(struct pcache_cache *cache)
+{
+ return FIELD_GET(PCACHE_CACHE_FLAGS_GC_PERCENT_MASK, cache->cache_info.flags);
+}
+
+int pcache_cache_set_gc_percent(struct pcache_cache *cache, u8 percent);
+
+/* cache key */
+struct pcache_cache_key *cache_key_alloc(struct pcache_cache_tree *cache_tree, gfp_t gfp_mask);
+void cache_key_init(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key);
+void cache_key_get(struct pcache_cache_key *key);
+void cache_key_put(struct pcache_cache_key *key);
+int cache_key_append(struct pcache_cache *cache, struct pcache_cache_key *key, bool force_close);
+void cache_key_insert(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key, bool fixup);
+int cache_key_decode(struct pcache_cache *cache,
+ struct pcache_cache_key_onmedia *key_onmedia,
+ struct pcache_cache_key *key);
+void cache_pos_advance(struct pcache_cache_pos *pos, u32 len);
+
+#define PCACHE_KSET_FLAGS_LAST BIT(0)
+#define PCACHE_KSET_MAGIC 0x676894a64e164f1aULL
+
+struct pcache_cache_kset {
+ struct pcache_cache *cache;
+ spinlock_t kset_lock;
+ struct delayed_work flush_work;
+ struct pcache_cache_kset_onmedia kset_onmedia;
+};
+
+extern struct pcache_cache_kset_onmedia pcache_empty_kset;
+
+#define SUBTREE_WALK_RET_OK 0
+#define SUBTREE_WALK_RET_ERR 1
+#define SUBTREE_WALK_RET_NEED_KEY 2
+#define SUBTREE_WALK_RET_NEED_REQ 3
+#define SUBTREE_WALK_RET_RESEARCH 4
+
+struct pcache_cache_subtree_walk_ctx {
+ struct pcache_cache_tree *cache_tree;
+ struct rb_node *start_node;
+ struct pcache_request *pcache_req;
+ struct pcache_cache_key *key;
+ u32 req_done;
+ int ret;
+
+ /* pre-allocated key and backing_dev_req */
+ struct pcache_cache_key *pre_alloc_key;
+ struct pcache_backing_dev_req *pre_alloc_req;
+
+ struct list_head *delete_key_list;
+ struct list_head *submit_req_list;
+
+ /*
+ * |--------| key_tmp
+ * |====| key
+ */
+ int (*before)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ /*
+ * |----------| key_tmp
+ * |=====| key
+ */
+ int (*after)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ /*
+ * |----------------| key_tmp
+ * |===========| key
+ */
+ int (*overlap_tail)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ /*
+ * |--------| key_tmp
+ * |==========| key
+ */
+ int (*overlap_head)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ /*
+ * |----| key_tmp
+ * |==========| key
+ */
+ int (*overlap_contain)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ /*
+ * |-----------| key_tmp
+ * |====| key
+ */
+ int (*overlap_contained)(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx);
+
+ int (*walk_finally)(struct pcache_cache_subtree_walk_ctx *ctx, int ret);
+ bool (*walk_done)(struct pcache_cache_subtree_walk_ctx *ctx);
+};
+
+int cache_subtree_walk(struct pcache_cache_subtree_walk_ctx *ctx);
+struct rb_node *cache_subtree_search(struct pcache_cache_subtree *cache_subtree, struct pcache_cache_key *key,
+ struct rb_node **parentp, struct rb_node ***newp,
+ struct list_head *delete_key_list);
+int cache_kset_close(struct pcache_cache *cache, struct pcache_cache_kset *kset);
+void clean_fn(struct work_struct *work);
+void kset_flush_fn(struct work_struct *work);
+int cache_replay(struct pcache_cache *cache);
+int cache_tree_init(struct pcache_cache *cache, struct pcache_cache_tree *cache_tree, u32 n_subtrees);
+void cache_tree_clear(struct pcache_cache_tree *cache_tree);
+void cache_tree_exit(struct pcache_cache_tree *cache_tree);
+
+/* cache segments */
+struct pcache_cache_segment *get_cache_segment(struct pcache_cache *cache);
+int cache_seg_init(struct pcache_cache *cache, u32 seg_id, u32 cache_seg_id,
+ bool new_cache);
+void cache_seg_get(struct pcache_cache_segment *cache_seg);
+void cache_seg_put(struct pcache_cache_segment *cache_seg);
+void cache_seg_set_next_seg(struct pcache_cache_segment *cache_seg, u32 seg_id);
+
+/* cache request*/
+int cache_flush(struct pcache_cache *cache);
+void miss_read_end_work_fn(struct work_struct *work);
+int pcache_cache_handle_req(struct pcache_cache *cache, struct pcache_request *pcache_req);
+
+/* gc */
+void pcache_cache_gc_fn(struct work_struct *work);
+
+/* writeback */
+void cache_writeback_exit(struct pcache_cache *cache);
+int cache_writeback_init(struct pcache_cache *cache);
+void cache_writeback_fn(struct work_struct *work);
+
+/* inline functions */
+static inline struct pcache_cache_subtree *get_subtree(struct pcache_cache_tree *cache_tree, u64 off)
+{
+ if (cache_tree->n_subtrees == 1)
+ return &cache_tree->subtrees[0];
+
+ return &cache_tree->subtrees[off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT];
+}
+
+static inline void *cache_pos_addr(struct pcache_cache_pos *pos)
+{
+ return (pos->cache_seg->segment.data + pos->seg_off);
+}
+
+static inline void *get_key_head_addr(struct pcache_cache *cache)
+{
+ return cache_pos_addr(&cache->key_head);
+}
+
+static inline u32 get_kset_id(struct pcache_cache *cache, u64 off)
+{
+ return (off >> PCACHE_CACHE_SUBTREE_SIZE_SHIFT) % cache->n_ksets;
+}
+
+static inline struct pcache_cache_kset *get_kset(struct pcache_cache *cache, u32 kset_id)
+{
+ return (void *)cache->ksets + PCACHE_KSET_SIZE * kset_id;
+}
+
+static inline struct pcache_cache_data_head *get_data_head(struct pcache_cache *cache)
+{
+ return this_cpu_ptr(cache->data_heads);
+}
+
+static inline bool cache_key_empty(struct pcache_cache_key *key)
+{
+ return key->flags & PCACHE_CACHE_KEY_FLAGS_EMPTY;
+}
+
+static inline bool cache_key_clean(struct pcache_cache_key *key)
+{
+ return key->flags & PCACHE_CACHE_KEY_FLAGS_CLEAN;
+}
+
+static inline void cache_pos_copy(struct pcache_cache_pos *dst, struct pcache_cache_pos *src)
+{
+ memcpy(dst, src, sizeof(struct pcache_cache_pos));
+}
+
+/**
+ * cache_seg_is_ctrl_seg - Checks if a cache segment is a cache ctrl segment.
+ * @cache_seg_id: ID of the cache segment.
+ *
+ * Returns true if the cache segment ID corresponds to a cache ctrl segment.
+ *
+ * Note: We extend the segment control of the first cache segment
+ * (cache segment ID 0) to serve as the cache control (pcache_cache_ctrl)
+ * for the entire PCACHE cache. This function determines whether the given
+ * cache segment is the one storing the pcache_cache_ctrl information.
+ */
+static inline bool cache_seg_is_ctrl_seg(u32 cache_seg_id)
+{
+ return (cache_seg_id == 0);
+}
+
+/**
+ * cache_key_cutfront - Cuts a specified length from the front of a cache key.
+ * @key: Pointer to pcache_cache_key structure.
+ * @cut_len: Length to cut from the front.
+ *
+ * Advances the cache key position by cut_len and adjusts offset and length accordingly.
+ */
+static inline void cache_key_cutfront(struct pcache_cache_key *key, u32 cut_len)
+{
+ if (key->cache_pos.cache_seg)
+ cache_pos_advance(&key->cache_pos, cut_len);
+
+ key->off += cut_len;
+ key->len -= cut_len;
+}
+
+/**
+ * cache_key_cutback - Cuts a specified length from the back of a cache key.
+ * @key: Pointer to pcache_cache_key structure.
+ * @cut_len: Length to cut from the back.
+ *
+ * Reduces the length of the cache key by cut_len.
+ */
+static inline void cache_key_cutback(struct pcache_cache_key *key, u32 cut_len)
+{
+ key->len -= cut_len;
+}
+
+static inline void cache_key_delete(struct pcache_cache_key *key)
+{
+ struct pcache_cache_subtree *cache_subtree;
+
+ cache_subtree = key->cache_subtree;
+ BUG_ON(!cache_subtree);
+
+ rb_erase(&key->rb_node, &cache_subtree->root);
+ key->flags = 0;
+ cache_key_put(key);
+}
+
+static inline bool cache_data_crc_on(struct pcache_cache *cache)
+{
+ return (cache->cache_info.flags & PCACHE_CACHE_FLAGS_DATA_CRC);
+}
+
+static inline u32 cache_mode_get(struct pcache_cache *cache)
+{
+ return FIELD_GET(PCACHE_CACHE_FLAGS_CACHE_MODE_MASK, cache->cache_info.flags);
+}
+
+static inline void cache_mode_set(struct pcache_cache *cache, u32 cache_mode)
+{
+ cache->cache_info.flags &= ~PCACHE_CACHE_FLAGS_CACHE_MODE_MASK;
+ cache->cache_info.flags |= FIELD_PREP(PCACHE_CACHE_FLAGS_CACHE_MODE_MASK, cache_mode);
+}
+
+/**
+ * cache_key_data_crc - Calculates CRC for data in a cache key.
+ * @key: Pointer to the pcache_cache_key structure.
+ *
+ * Returns the CRC-32 checksum of the data within the cache key's position.
+ */
+static inline u32 cache_key_data_crc(struct pcache_cache_key *key)
+{
+ void *data;
+
+ data = cache_pos_addr(&key->cache_pos);
+
+ return crc32c(PCACHE_CRC_SEED, data, key->len);
+}
+
+static inline u32 cache_kset_crc(struct pcache_cache_kset_onmedia *kset_onmedia)
+{
+ u32 crc_size;
+
+ if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST)
+ crc_size = sizeof(struct pcache_cache_kset_onmedia) - 4;
+ else
+ crc_size = struct_size(kset_onmedia, data, kset_onmedia->key_num) - 4;
+
+ return crc32c(PCACHE_CRC_SEED, (void *)kset_onmedia + 4, crc_size);
+}
+
+static inline u32 get_kset_onmedia_size(struct pcache_cache_kset_onmedia *kset_onmedia)
+{
+ return struct_size_t(struct pcache_cache_kset_onmedia, data, kset_onmedia->key_num);
+}
+
+/**
+ * cache_seg_remain - Computes remaining space in a cache segment.
+ * @pos: Pointer to pcache_cache_pos structure.
+ *
+ * Returns the amount of remaining space in the segment data starting from
+ * the current position offset.
+ */
+static inline u32 cache_seg_remain(struct pcache_cache_pos *pos)
+{
+ struct pcache_cache_segment *cache_seg;
+ struct pcache_segment *segment;
+ u32 seg_remain;
+
+ cache_seg = pos->cache_seg;
+ segment = &cache_seg->segment;
+ seg_remain = segment->data_size - pos->seg_off;
+
+ return seg_remain;
+}
+
+/**
+ * cache_key_invalid - Checks if a cache key is invalid.
+ * @key: Pointer to pcache_cache_key structure.
+ *
+ * Returns true if the cache key is invalid due to its generation being
+ * less than the generation of its segment; otherwise returns false.
+ *
+ * When the GC (garbage collection) thread identifies a segment
+ * as reclaimable, it increments the segment's generation (gen). However,
+ * it does not immediately remove all related cache keys. When accessing
+ * such a cache key, this function can be used to determine if the cache
+ * key has already become invalid.
+ */
+static inline bool cache_key_invalid(struct pcache_cache_key *key)
+{
+ if (cache_key_empty(key))
+ return false;
+
+ return (key->seg_gen < key->cache_pos.cache_seg->gen);
+}
+
+/**
+ * cache_key_lstart - Retrieves the logical start offset of a cache key.
+ * @key: Pointer to pcache_cache_key structure.
+ *
+ * Returns the logical start offset for the cache key.
+ */
+static inline u64 cache_key_lstart(struct pcache_cache_key *key)
+{
+ return key->off;
+}
+
+/**
+ * cache_key_lend - Retrieves the logical end offset of a cache key.
+ * @key: Pointer to pcache_cache_key structure.
+ *
+ * Returns the logical end offset for the cache key.
+ */
+static inline u64 cache_key_lend(struct pcache_cache_key *key)
+{
+ return key->off + key->len;
+}
+
+static inline void cache_key_copy(struct pcache_cache_key *key_dst, struct pcache_cache_key *key_src)
+{
+ key_dst->off = key_src->off;
+ key_dst->len = key_src->len;
+ key_dst->seg_gen = key_src->seg_gen;
+ key_dst->cache_tree = key_src->cache_tree;
+ key_dst->cache_subtree = key_src->cache_subtree;
+ key_dst->flags = key_src->flags;
+
+ cache_pos_copy(&key_dst->cache_pos, &key_src->cache_pos);
+}
+
+/**
+ * cache_pos_onmedia_crc - Calculates the CRC for an on-media cache position.
+ * @pos_om: Pointer to pcache_cache_pos_onmedia structure.
+ *
+ * Calculates the CRC-32 checksum of the position, excluding the first 4 bytes.
+ * Returns the computed CRC value.
+ */
+static inline u32 cache_pos_onmedia_crc(struct pcache_cache_pos_onmedia *pos_om)
+{
+ return pcache_meta_crc(&pos_om->header, sizeof(struct pcache_cache_pos_onmedia));
+}
+
+void cache_pos_encode(struct pcache_cache *cache,
+ struct pcache_cache_pos_onmedia *pos_onmedia,
+ struct pcache_cache_pos *pos, u64 seq, u32 *index);
+int cache_pos_decode(struct pcache_cache *cache,
+ struct pcache_cache_pos_onmedia *pos_onmedia,
+ struct pcache_cache_pos *pos, u64 *seq, u32 *index);
+
+static inline void cache_encode_key_tail(struct pcache_cache *cache)
+{
+ cache_pos_encode(cache, cache->cache_ctrl->key_tail_pos,
+ &cache->key_tail, ++cache->key_tail_seq,
+ &cache->key_tail_index);
+}
+
+static inline int cache_decode_key_tail(struct pcache_cache *cache)
+{
+ return cache_pos_decode(cache, cache->cache_ctrl->key_tail_pos,
+ &cache->key_tail, &cache->key_tail_seq,
+ &cache->key_tail_index);
+}
+
+static inline void cache_encode_dirty_tail(struct pcache_cache *cache)
+{
+ cache_pos_encode(cache, cache->cache_ctrl->dirty_tail_pos,
+ &cache->dirty_tail, ++cache->dirty_tail_seq,
+ &cache->dirty_tail_index);
+}
+
+static inline int cache_decode_dirty_tail(struct pcache_cache *cache)
+{
+ return cache_pos_decode(cache, cache->cache_ctrl->dirty_tail_pos,
+ &cache->dirty_tail, &cache->dirty_tail_seq,
+ &cache->dirty_tail_index);
+}
+
+int pcache_cache_init(void);
+void pcache_cache_exit(void);
+#endif /* _PCACHE_CACHE_H */
diff --git a/drivers/md/dm-pcache/cache_dev.c b/drivers/md/dm-pcache/cache_dev.c
new file mode 100644
index 000000000000..cc138b99137d
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_dev.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/blkdev.h>
+#include <linux/dax.h>
+#include <linux/vmalloc.h>
+#include <linux/pfn_t.h>
+#include <linux/parser.h>
+
+#include "cache_dev.h"
+#include "backing_dev.h"
+#include "cache.h"
+#include "dm_pcache.h"
+
+static void cache_dev_dax_exit(struct pcache_cache_dev *cache_dev)
+{
+ if (cache_dev->use_vmap)
+ vunmap(cache_dev->mapping);
+}
+
+static int build_vmap(struct dax_device *dax_dev, long total_pages, void **vaddr)
+{
+ struct page **pages;
+ long i = 0, chunk;
+ pfn_t pfn;
+ int ret;
+
+ pages = vmalloc_array(total_pages, sizeof(struct page *));
+ if (!pages)
+ return -ENOMEM;
+
+ do {
+ chunk = dax_direct_access(dax_dev, i, total_pages - i,
+ DAX_ACCESS, NULL, &pfn);
+ if (chunk <= 0) {
+ ret = chunk ? chunk : -EINVAL;
+ goto out_free;
+ }
+
+ if (!pfn_t_has_page(pfn)) {
+ ret = -EOPNOTSUPP;
+ goto out_free;
+ }
+
+ while (chunk-- && i < total_pages) {
+ pages[i++] = pfn_t_to_page(pfn);
+ pfn.val++;
+ if (!(i & 15))
+ cond_resched();
+ }
+ } while (i < total_pages);
+
+ *vaddr = vmap(pages, total_pages, VM_MAP, PAGE_KERNEL);
+ if (!*vaddr)
+ ret = -ENOMEM;
+out_free:
+ vfree(pages);
+ return ret;
+}
+
+static int cache_dev_dax_init(struct pcache_cache_dev *cache_dev)
+{
+ struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
+ struct dax_device *dax_dev;
+ long total_pages, mapped_pages;
+ u64 bdev_size;
+ void *vaddr;
+ int ret;
+ int id;
+ pfn_t pfn;
+
+ dax_dev = cache_dev->dm_dev->dax_dev;
+ /* total size check */
+ bdev_size = bdev_nr_bytes(cache_dev->dm_dev->bdev);
+ if (bdev_size < PCACHE_CACHE_DEV_SIZE_MIN) {
+ pcache_dev_err(pcache, "dax device is too small, required at least %llu",
+ PCACHE_CACHE_DEV_SIZE_MIN);
+ ret = -ENOSPC;
+ goto out;
+ }
+
+ total_pages = bdev_size >> PAGE_SHIFT;
+ /* attempt: direct-map the whole range */
+ id = dax_read_lock();
+ mapped_pages = dax_direct_access(dax_dev, 0, total_pages,
+ DAX_ACCESS, &vaddr, &pfn);
+ if (mapped_pages < 0) {
+ pcache_dev_err(pcache, "dax_direct_access failed: %ld\n", mapped_pages);
+ ret = mapped_pages;
+ goto unlock;
+ }
+
+ if (!pfn_t_has_page(pfn)) {
+ ret = -EOPNOTSUPP;
+ goto unlock;
+ }
+
+ if (mapped_pages == total_pages) {
+ /* success: contiguous direct mapping */
+ cache_dev->mapping = vaddr;
+ } else {
+ /* need vmap fallback */
+ ret = build_vmap(dax_dev, total_pages, &vaddr);
+ if (ret) {
+ pcache_dev_err(pcache, "vmap fallback failed: %d\n", ret);
+ goto unlock;
+ }
+
+ cache_dev->mapping = vaddr;
+ cache_dev->use_vmap = true;
+ }
+ dax_read_unlock(id);
+
+ return 0;
+unlock:
+ dax_read_unlock(id);
+out:
+ return ret;
+}
+
+void cache_dev_zero_range(struct pcache_cache_dev *cache_dev, void *pos, u32 size)
+{
+ memset(pos, 0, size);
+ dax_flush(cache_dev->dm_dev->dax_dev, pos, size);
+}
+
+static int sb_read(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
+{
+ struct pcache_sb *sb_addr = CACHE_DEV_SB(cache_dev);
+
+ if (copy_mc_to_kernel(sb, sb_addr, sizeof(struct pcache_sb)))
+ return -EIO;
+
+ return 0;
+}
+
+static void sb_write(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
+{
+ struct pcache_sb *sb_addr = CACHE_DEV_SB(cache_dev);
+
+ memcpy_flushcache(sb_addr, sb, sizeof(struct pcache_sb));
+ pmem_wmb();
+}
+
+static int sb_init(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
+{
+ struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
+ u64 nr_segs;
+ u64 cache_dev_size;
+ u64 magic;
+ u32 flags = 0;
+
+ magic = le64_to_cpu(sb->magic);
+ if (magic)
+ return -EEXIST;
+
+ cache_dev_size = bdev_nr_bytes(file_bdev(cache_dev->dm_dev->bdev_file));
+ if (cache_dev_size < PCACHE_CACHE_DEV_SIZE_MIN) {
+ pcache_dev_err(pcache, "dax device is too small, required at least %llu",
+ PCACHE_CACHE_DEV_SIZE_MIN);
+ return -ENOSPC;
+ }
+
+ nr_segs = (cache_dev_size - PCACHE_SEGMENTS_OFF) / ((PCACHE_SEG_SIZE));
+
+#if defined(__BYTE_ORDER) ? (__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN)
+ flags |= PCACHE_SB_F_BIGENDIAN;
+#endif
+ sb->flags = cpu_to_le32(flags);
+ sb->magic = cpu_to_le64(PCACHE_MAGIC);
+ sb->seg_num = cpu_to_le32(nr_segs);
+ sb->crc = cpu_to_le32(crc32c(PCACHE_CRC_SEED, (void *)(sb) + 4, sizeof(struct pcache_sb) - 4));
+
+ cache_dev_zero_range(cache_dev, CACHE_DEV_CACHE_INFO(cache_dev),
+ PCACHE_CACHE_INFO_SIZE * PCACHE_META_INDEX_MAX +
+ PCACHE_CACHE_CTRL_SIZE);
+
+ return 0;
+}
+
+static int sb_validate(struct pcache_cache_dev *cache_dev, struct pcache_sb *sb)
+{
+ struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
+ u32 flags;
+ u32 crc;
+
+ if (le64_to_cpu(sb->magic) != PCACHE_MAGIC) {
+ pcache_dev_err(pcache, "unexpected magic: %llx\n",
+ le64_to_cpu(sb->magic));
+ return -EINVAL;
+ }
+
+ crc = crc32c(PCACHE_CRC_SEED, (void *)(sb) + 4, sizeof(struct pcache_sb) - 4);
+ if (crc != le32_to_cpu(sb->crc)) {
+ pcache_dev_err(pcache, "corrupted sb: %u, expected: %u\n", crc, le32_to_cpu(sb->crc));
+ return -EINVAL;
+ }
+
+ flags = le32_to_cpu(sb->flags);
+#if defined(__BYTE_ORDER) ? (__BIG_ENDIAN == __BYTE_ORDER) : defined(__BIG_ENDIAN)
+ if (!(flags & PCACHE_SB_F_BIGENDIAN)) {
+ pcache_dev_err(pcache, "cache_dev is not big endian\n");
+ return -EINVAL;
+ }
+#else
+ if (flags & PCACHE_SB_F_BIGENDIAN) {
+ pcache_dev_err(pcache, "cache_dev is big endian\n");
+ return -EINVAL;
+ }
+#endif
+ return 0;
+}
+
+static int cache_dev_init(struct pcache_cache_dev *cache_dev, u32 seg_num)
+{
+ cache_dev->seg_num = seg_num;
+ cache_dev->seg_bitmap = kvcalloc(BITS_TO_LONGS(cache_dev->seg_num), sizeof(unsigned long), GFP_KERNEL);
+ if (!cache_dev->seg_bitmap)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void cache_dev_exit(struct pcache_cache_dev *cache_dev)
+{
+ kvfree(cache_dev->seg_bitmap);
+}
+
+void cache_dev_stop(struct dm_pcache *pcache)
+{
+ struct pcache_cache_dev *cache_dev = &pcache->cache_dev;
+
+ cache_dev_exit(cache_dev);
+ cache_dev_dax_exit(cache_dev);
+}
+
+int cache_dev_start(struct dm_pcache *pcache)
+{
+ struct pcache_cache_dev *cache_dev = &pcache->cache_dev;
+ struct pcache_sb sb;
+ bool format = false;
+ int ret;
+
+ mutex_init(&cache_dev->seg_lock);
+
+ ret = cache_dev_dax_init(cache_dev);
+ if (ret) {
+ pcache_dev_err(pcache, "failed to init cache_dev %s via dax way: %d.",
+ cache_dev->dm_dev->name, ret);
+ goto err;
+ }
+
+ ret = sb_read(cache_dev, &sb);
+ if (ret)
+ goto dax_release;
+
+ if (le64_to_cpu(sb.magic) == 0) {
+ format = true;
+ ret = sb_init(cache_dev, &sb);
+ if (ret < 0)
+ goto dax_release;
+ }
+
+ ret = sb_validate(cache_dev, &sb);
+ if (ret)
+ goto dax_release;
+
+ cache_dev->sb_flags = le32_to_cpu(sb.flags);
+ ret = cache_dev_init(cache_dev, le32_to_cpu(sb.seg_num));
+ if (ret)
+ goto dax_release;
+
+ if (format)
+ sb_write(cache_dev, &sb);
+
+ return 0;
+
+dax_release:
+ cache_dev_dax_exit(cache_dev);
+err:
+ return ret;
+}
+
+int cache_dev_get_empty_segment_id(struct pcache_cache_dev *cache_dev, u32 *seg_id)
+{
+ int ret;
+
+ mutex_lock(&cache_dev->seg_lock);
+ *seg_id = find_next_zero_bit(cache_dev->seg_bitmap, cache_dev->seg_num, 0);
+ if (*seg_id == cache_dev->seg_num) {
+ ret = -ENOSPC;
+ goto unlock;
+ }
+
+ __set_bit(*seg_id, cache_dev->seg_bitmap);
+ ret = 0;
+unlock:
+ mutex_unlock(&cache_dev->seg_lock);
+ return ret;
+}
diff --git a/drivers/md/dm-pcache/cache_dev.h b/drivers/md/dm-pcache/cache_dev.h
new file mode 100644
index 000000000000..6251eb4ebe96
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_dev.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _PCACHE_CACHE_DEV_H
+#define _PCACHE_CACHE_DEV_H
+
+#include <linux/device.h>
+#include <linux/device-mapper.h>
+
+#include "pcache_internal.h"
+
+#define PCACHE_MAGIC 0x65B05EFA96C596EFULL
+
+#define PCACHE_SB_OFF (4 * PCACHE_KB)
+#define PCACHE_SB_SIZE (4 * PCACHE_KB)
+
+#define PCACHE_CACHE_INFO_OFF (PCACHE_SB_OFF + PCACHE_SB_SIZE)
+#define PCACHE_CACHE_INFO_SIZE (4 * PCACHE_KB)
+
+#define PCACHE_CACHE_CTRL_OFF (PCACHE_CACHE_INFO_OFF + (PCACHE_CACHE_INFO_SIZE * PCACHE_META_INDEX_MAX))
+#define PCACHE_CACHE_CTRL_SIZE (4 * PCACHE_KB)
+
+#define PCACHE_SEGMENTS_OFF (PCACHE_CACHE_CTRL_OFF + PCACHE_CACHE_CTRL_SIZE)
+#define PCACHE_SEG_INFO_SIZE (4 * PCACHE_KB)
+
+#define PCACHE_CACHE_DEV_SIZE_MIN (512 * PCACHE_MB) /* 512 MB */
+#define PCACHE_SEG_SIZE (16 * PCACHE_MB) /* Size of each PCACHE segment (16 MB) */
+
+#define CACHE_DEV_SB(cache_dev) ((struct pcache_sb *)(cache_dev->mapping + PCACHE_SB_OFF))
+#define CACHE_DEV_CACHE_INFO(cache_dev) ((void *)cache_dev->mapping + PCACHE_CACHE_INFO_OFF)
+#define CACHE_DEV_CACHE_CTRL(cache_dev) ((void *)cache_dev->mapping + PCACHE_CACHE_CTRL_OFF)
+#define CACHE_DEV_SEGMENTS(cache_dev) ((void *)cache_dev->mapping + PCACHE_SEGMENTS_OFF)
+#define CACHE_DEV_SEGMENT(cache_dev, id) ((void *)CACHE_DEV_SEGMENTS(cache_dev) + (u64)id * PCACHE_SEG_SIZE)
+
+/*
+ * PCACHE SB flags configured during formatting
+ *
+ * The PCACHE_SB_F_xxx flags define registration requirements based on cache_dev
+ * formatting. For a machine to register a cache_dev:
+ * - PCACHE_SB_F_BIGENDIAN: Requires a big-endian machine.
+ */
+#define PCACHE_SB_F_BIGENDIAN BIT(0)
+
+struct pcache_sb {
+ __le32 crc;
+ __le32 flags;
+ __le64 magic;
+
+ __le32 seg_num;
+};
+
+struct pcache_cache_dev {
+ u32 sb_flags;
+ u32 seg_num;
+ void *mapping;
+ bool use_vmap;
+
+ struct dm_dev *dm_dev;
+
+ struct mutex seg_lock;
+ unsigned long *seg_bitmap;
+};
+
+struct dm_pcache;
+int cache_dev_start(struct dm_pcache *pcache);
+void cache_dev_stop(struct dm_pcache *pcache);
+
+void cache_dev_zero_range(struct pcache_cache_dev *cache_dev, void *pos, u32 size);
+
+int cache_dev_get_empty_segment_id(struct pcache_cache_dev *cache_dev, u32 *seg_id);
+
+#endif /* _PCACHE_CACHE_DEV_H */
diff --git a/drivers/md/dm-pcache/cache_gc.c b/drivers/md/dm-pcache/cache_gc.c
new file mode 100644
index 000000000000..94f8b276a021
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_gc.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include "cache.h"
+#include "backing_dev.h"
+#include "cache_dev.h"
+#include "dm_pcache.h"
+
+/**
+ * cache_key_gc - Releases the reference of a cache key segment.
+ * @cache: Pointer to the pcache_cache structure.
+ * @key: Pointer to the cache key to be garbage collected.
+ *
+ * This function decrements the reference count of the cache segment
+ * associated with the given key. If the reference count drops to zero,
+ * the segment may be invalidated and reused.
+ */
+static void cache_key_gc(struct pcache_cache *cache, struct pcache_cache_key *key)
+{
+ cache_seg_put(key->cache_pos.cache_seg);
+}
+
+static bool need_gc(struct pcache_cache *cache, struct pcache_cache_pos *dirty_tail, struct pcache_cache_pos *key_tail)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ void *dirty_addr, *key_addr;
+ u32 segs_used, segs_gc_threshold, to_copy;
+ int ret;
+
+ dirty_addr = cache_pos_addr(dirty_tail);
+ key_addr = cache_pos_addr(key_tail);
+ if (dirty_addr == key_addr) {
+ pcache_dev_debug(pcache, "key tail is equal to dirty tail: %u:%u\n",
+ dirty_tail->cache_seg->cache_seg_id,
+ dirty_tail->seg_off);
+ return false;
+ }
+
+ kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->gc_kset_onmedia_buf;
+
+ to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - key_tail->seg_off);
+ ret = copy_mc_to_kernel(kset_onmedia, key_addr, to_copy);
+ if (ret) {
+ pcache_dev_err(pcache, "error to read kset: %d", ret);
+ return false;
+ }
+
+ /* Check if kset_onmedia is corrupted */
+ if (kset_onmedia->magic != PCACHE_KSET_MAGIC) {
+ pcache_dev_debug(pcache, "gc error: magic is not as expected. key_tail: %u:%u magic: %llx, expected: %llx\n",
+ key_tail->cache_seg->cache_seg_id, key_tail->seg_off,
+ kset_onmedia->magic, PCACHE_KSET_MAGIC);
+ return false;
+ }
+
+ /* Verify the CRC of the kset_onmedia */
+ if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) {
+ pcache_dev_debug(pcache, "gc error: crc is not as expected. crc: %x, expected: %x\n",
+ cache_kset_crc(kset_onmedia), kset_onmedia->crc);
+ return false;
+ }
+
+ segs_used = bitmap_weight(cache->seg_map, cache->n_segs);
+ segs_gc_threshold = cache->n_segs * pcache_cache_get_gc_percent(cache) / 100;
+ if (segs_used < segs_gc_threshold) {
+ pcache_dev_debug(pcache, "segs_used: %u, segs_gc_threshold: %u\n", segs_used, segs_gc_threshold);
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * last_kset_gc - Advances the garbage collection for the last kset.
+ * @cache: Pointer to the pcache_cache structure.
+ * @kset_onmedia: Pointer to the kset_onmedia structure for the last kset.
+ */
+static void last_kset_gc(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_segment *cur_seg, *next_seg;
+
+ cur_seg = cache->key_tail.cache_seg;
+
+ next_seg = &cache->segments[kset_onmedia->next_cache_seg_id];
+
+ mutex_lock(&cache->key_tail_lock);
+ cache->key_tail.cache_seg = next_seg;
+ cache->key_tail.seg_off = 0;
+ cache_encode_key_tail(cache);
+ mutex_unlock(&cache->key_tail_lock);
+
+ pcache_dev_debug(pcache, "gc advance kset seg: %u\n", cur_seg->cache_seg_id);
+
+ spin_lock(&cache->seg_map_lock);
+ __clear_bit(cur_seg->cache_seg_id, cache->seg_map);
+ spin_unlock(&cache->seg_map_lock);
+}
+
+void pcache_cache_gc_fn(struct work_struct *work)
+{
+ struct pcache_cache *cache = container_of(work, struct pcache_cache, gc_work.work);
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_pos dirty_tail, key_tail;
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ struct pcache_cache_key_onmedia *key_onmedia;
+ struct pcache_cache_key *key;
+ int ret;
+ int i;
+
+ kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->gc_kset_onmedia_buf;
+
+ while (true) {
+ if (pcache_is_stopping(pcache) || atomic_read(&cache->gc_errors))
+ return;
+
+ /* Get new tail positions */
+ mutex_lock(&cache->dirty_tail_lock);
+ cache_pos_copy(&dirty_tail, &cache->dirty_tail);
+ mutex_unlock(&cache->dirty_tail_lock);
+
+ mutex_lock(&cache->key_tail_lock);
+ cache_pos_copy(&key_tail, &cache->key_tail);
+ mutex_unlock(&cache->key_tail_lock);
+
+ if (!need_gc(cache, &dirty_tail, &key_tail))
+ break;
+
+ if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) {
+ /* Don't move to the next segment if dirty_tail has not moved */
+ if (dirty_tail.cache_seg == key_tail.cache_seg)
+ break;
+
+ last_kset_gc(cache, kset_onmedia);
+ continue;
+ }
+
+ for (i = 0; i < kset_onmedia->key_num; i++) {
+ struct pcache_cache_key key_tmp = { 0 };
+
+ key_onmedia = &kset_onmedia->data[i];
+
+ key = &key_tmp;
+ cache_key_init(&cache->req_key_tree, key);
+
+ ret = cache_key_decode(cache, key_onmedia, key);
+ if (ret) {
+ /* return without re-arm gc work, and prevent future
+ * gc, because we can't retry the partial-gc-ed kset
+ */
+ atomic_inc(&cache->gc_errors);
+ pcache_dev_err(pcache, "failed to decode cache key in gc\n");
+ return;
+ }
+
+ cache_key_gc(cache, key);
+ }
+
+ pcache_dev_debug(pcache, "gc advance: %u:%u %u\n",
+ key_tail.cache_seg->cache_seg_id,
+ key_tail.seg_off,
+ get_kset_onmedia_size(kset_onmedia));
+
+ mutex_lock(&cache->key_tail_lock);
+ cache_pos_advance(&cache->key_tail, get_kset_onmedia_size(kset_onmedia));
+ cache_encode_key_tail(cache);
+ mutex_unlock(&cache->key_tail_lock);
+ }
+
+ queue_delayed_work(cache_get_wq(cache), &cache->gc_work, PCACHE_CACHE_GC_INTERVAL);
+}
diff --git a/drivers/md/dm-pcache/cache_key.c b/drivers/md/dm-pcache/cache_key.c
new file mode 100644
index 000000000000..2b77e121f89b
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_key.c
@@ -0,0 +1,888 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include "cache.h"
+#include "backing_dev.h"
+#include "cache_dev.h"
+#include "dm_pcache.h"
+
+struct pcache_cache_kset_onmedia pcache_empty_kset = { 0 };
+
+void cache_key_init(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key)
+{
+ kref_init(&key->ref);
+ key->cache_tree = cache_tree;
+ INIT_LIST_HEAD(&key->list_node);
+ RB_CLEAR_NODE(&key->rb_node);
+}
+
+struct pcache_cache_key *cache_key_alloc(struct pcache_cache_tree *cache_tree, gfp_t gfp_mask)
+{
+ struct pcache_cache_key *key;
+
+ key = mempool_alloc(&cache_tree->key_pool, gfp_mask);
+ if (!key)
+ return NULL;
+
+ memset(key, 0, sizeof(struct pcache_cache_key));
+ cache_key_init(cache_tree, key);
+
+ return key;
+}
+
+/**
+ * cache_key_get - Increment the reference count of a cache key.
+ * @key: Pointer to the pcache_cache_key structure.
+ *
+ * This function increments the reference count of the specified cache key,
+ * ensuring that it is not freed while still in use.
+ */
+void cache_key_get(struct pcache_cache_key *key)
+{
+ kref_get(&key->ref);
+}
+
+/**
+ * cache_key_destroy - Free a cache key structure when its reference count drops to zero.
+ * @ref: Pointer to the kref structure.
+ *
+ * This function is called when the reference count of the cache key reaches zero.
+ * It frees the allocated cache key back to the slab cache.
+ */
+static void cache_key_destroy(struct kref *ref)
+{
+ struct pcache_cache_key *key = container_of(ref, struct pcache_cache_key, ref);
+ struct pcache_cache_tree *cache_tree = key->cache_tree;
+
+ mempool_free(key, &cache_tree->key_pool);
+}
+
+void cache_key_put(struct pcache_cache_key *key)
+{
+ kref_put(&key->ref, cache_key_destroy);
+}
+
+void cache_pos_advance(struct pcache_cache_pos *pos, u32 len)
+{
+ /* Ensure enough space remains in the current segment */
+ BUG_ON(cache_seg_remain(pos) < len);
+
+ pos->seg_off += len;
+}
+
+static void cache_key_encode(struct pcache_cache *cache,
+ struct pcache_cache_key_onmedia *key_onmedia,
+ struct pcache_cache_key *key)
+{
+ key_onmedia->off = key->off;
+ key_onmedia->len = key->len;
+
+ key_onmedia->cache_seg_id = key->cache_pos.cache_seg->cache_seg_id;
+ key_onmedia->cache_seg_off = key->cache_pos.seg_off;
+
+ key_onmedia->seg_gen = key->seg_gen;
+ key_onmedia->flags = key->flags;
+
+ if (cache_data_crc_on(cache))
+ key_onmedia->data_crc = cache_key_data_crc(key);
+}
+
+int cache_key_decode(struct pcache_cache *cache,
+ struct pcache_cache_key_onmedia *key_onmedia,
+ struct pcache_cache_key *key)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+
+ key->off = key_onmedia->off;
+ key->len = key_onmedia->len;
+
+ key->cache_pos.cache_seg = &cache->segments[key_onmedia->cache_seg_id];
+ key->cache_pos.seg_off = key_onmedia->cache_seg_off;
+
+ key->seg_gen = key_onmedia->seg_gen;
+ key->flags = key_onmedia->flags;
+
+ if (cache_data_crc_on(cache) &&
+ key_onmedia->data_crc != cache_key_data_crc(key)) {
+ pcache_dev_err(pcache, "key: %llu:%u seg %u:%u data_crc error: %x, expected: %x\n",
+ key->off, key->len, key->cache_pos.cache_seg->cache_seg_id,
+ key->cache_pos.seg_off, cache_key_data_crc(key), key_onmedia->data_crc);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static void append_last_kset(struct pcache_cache *cache, u32 next_seg)
+{
+ struct pcache_cache_kset_onmedia kset_onmedia = { 0 };
+
+ kset_onmedia.flags |= PCACHE_KSET_FLAGS_LAST;
+ kset_onmedia.next_cache_seg_id = next_seg;
+ kset_onmedia.magic = PCACHE_KSET_MAGIC;
+ kset_onmedia.crc = cache_kset_crc(&kset_onmedia);
+
+ memcpy_flushcache(get_key_head_addr(cache), &kset_onmedia, sizeof(struct pcache_cache_kset_onmedia));
+ pmem_wmb();
+ cache_pos_advance(&cache->key_head, sizeof(struct pcache_cache_kset_onmedia));
+}
+
+int cache_kset_close(struct pcache_cache *cache, struct pcache_cache_kset *kset)
+{
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ u32 kset_onmedia_size;
+ int ret;
+
+ kset_onmedia = &kset->kset_onmedia;
+
+ if (!kset_onmedia->key_num)
+ return 0;
+
+ kset_onmedia_size = struct_size(kset_onmedia, data, kset_onmedia->key_num);
+
+ spin_lock(&cache->key_head_lock);
+again:
+ /* Reserve space for the last kset */
+ if (cache_seg_remain(&cache->key_head) < kset_onmedia_size + sizeof(struct pcache_cache_kset_onmedia)) {
+ struct pcache_cache_segment *next_seg;
+
+ next_seg = get_cache_segment(cache);
+ if (!next_seg) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ /* clear outdated kset in next seg */
+ memcpy_flushcache(next_seg->segment.data, &pcache_empty_kset,
+ sizeof(struct pcache_cache_kset_onmedia));
+ append_last_kset(cache, next_seg->cache_seg_id);
+ cache->key_head.cache_seg = next_seg;
+ cache->key_head.seg_off = 0;
+ goto again;
+ }
+
+ kset_onmedia->magic = PCACHE_KSET_MAGIC;
+ kset_onmedia->crc = cache_kset_crc(kset_onmedia);
+
+ /* clear outdated kset after current kset */
+ memcpy_flushcache(get_key_head_addr(cache) + kset_onmedia_size, &pcache_empty_kset,
+ sizeof(struct pcache_cache_kset_onmedia));
+ /* write current kset into segment */
+ memcpy_flushcache(get_key_head_addr(cache), kset_onmedia, kset_onmedia_size);
+ pmem_wmb();
+
+ /* reset kset_onmedia */
+ memset(kset_onmedia, 0, sizeof(struct pcache_cache_kset_onmedia));
+ cache_pos_advance(&cache->key_head, kset_onmedia_size);
+
+ ret = 0;
+out:
+ spin_unlock(&cache->key_head_lock);
+
+ return ret;
+}
+
+/**
+ * cache_key_append - Append a cache key to the related kset.
+ * @cache: Pointer to the pcache_cache structure.
+ * @key: Pointer to the cache key structure to append.
+ * @force_close: Need to close current kset if true.
+ *
+ * This function appends a cache key to the appropriate kset. If the kset
+ * is full, it closes the kset. If not, it queues a flush work to write
+ * the kset to media.
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ */
+int cache_key_append(struct pcache_cache *cache, struct pcache_cache_key *key, bool force_close)
+{
+ struct pcache_cache_kset *kset;
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ struct pcache_cache_key_onmedia *key_onmedia;
+ u32 kset_id = get_kset_id(cache, key->off);
+ int ret = 0;
+
+ kset = get_kset(cache, kset_id);
+ kset_onmedia = &kset->kset_onmedia;
+
+ spin_lock(&kset->kset_lock);
+ key_onmedia = &kset_onmedia->data[kset_onmedia->key_num];
+ cache_key_encode(cache, key_onmedia, key);
+
+ /* Check if the current kset has reached the maximum number of keys */
+ if (++kset_onmedia->key_num == PCACHE_KSET_KEYS_MAX || force_close) {
+ /* If full, close the kset */
+ ret = cache_kset_close(cache, kset);
+ if (ret) {
+ kset_onmedia->key_num--;
+ goto out;
+ }
+ } else {
+ /* If not full, queue a delayed work to flush the kset */
+ queue_delayed_work(cache_get_wq(cache), &kset->flush_work, 1 * HZ);
+ }
+out:
+ spin_unlock(&kset->kset_lock);
+
+ return ret;
+}
+
+/**
+ * cache_subtree_walk - Traverse the cache tree.
+ * @ctx: Pointer to the context structure for traversal.
+ *
+ * This function traverses the cache tree starting from the specified node.
+ * It calls the appropriate callback functions based on the relationships
+ * between the keys in the cache tree.
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ */
+int cache_subtree_walk(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache_key *key_tmp, *key;
+ struct rb_node *node_tmp;
+ int ret = SUBTREE_WALK_RET_OK;
+
+ key = ctx->key;
+ node_tmp = ctx->start_node;
+
+ while (node_tmp) {
+ if (ctx->walk_done && ctx->walk_done(ctx))
+ break;
+
+ key_tmp = CACHE_KEY(node_tmp);
+ /*
+ * If key_tmp ends before the start of key, continue to the next node.
+ * |----------|
+ * |=====|
+ */
+ if (cache_key_lend(key_tmp) <= cache_key_lstart(key)) {
+ if (ctx->after) {
+ ret = ctx->after(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+ goto next;
+ }
+
+ /*
+ * If key_tmp starts after the end of key, stop traversing.
+ * |--------|
+ * |====|
+ */
+ if (cache_key_lstart(key_tmp) >= cache_key_lend(key)) {
+ if (ctx->before) {
+ ret = ctx->before(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+ break;
+ }
+
+ /* Handle overlapping keys */
+ if (cache_key_lstart(key_tmp) >= cache_key_lstart(key)) {
+ /*
+ * If key_tmp encompasses key.
+ * |----------------| key_tmp
+ * |===========| key
+ */
+ if (cache_key_lend(key_tmp) >= cache_key_lend(key)) {
+ if (ctx->overlap_tail) {
+ ret = ctx->overlap_tail(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+ break;
+ }
+
+ /*
+ * If key_tmp is contained within key.
+ * |----| key_tmp
+ * |==========| key
+ */
+ if (ctx->overlap_contain) {
+ ret = ctx->overlap_contain(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+
+ goto next;
+ }
+
+ /*
+ * If key_tmp starts before key ends but ends after key.
+ * |-----------| key_tmp
+ * |====| key
+ */
+ if (cache_key_lend(key_tmp) > cache_key_lend(key)) {
+ if (ctx->overlap_contained) {
+ ret = ctx->overlap_contained(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+ break;
+ }
+
+ /*
+ * If key_tmp starts before key and ends within key.
+ * |--------| key_tmp
+ * |==========| key
+ */
+ if (ctx->overlap_head) {
+ ret = ctx->overlap_head(key, key_tmp, ctx);
+ if (ret)
+ goto out;
+ }
+next:
+ node_tmp = rb_next(node_tmp);
+ }
+
+out:
+ if (ctx->walk_finally)
+ ret = ctx->walk_finally(ctx, ret);
+
+ return ret;
+}
+
+/**
+ * cache_subtree_search - Search for a key in the cache tree.
+ * @cache_subtree: Pointer to the cache tree structure.
+ * @key: Pointer to the cache key to search for.
+ * @parentp: Pointer to store the parent node of the found node.
+ * @newp: Pointer to store the location where the new node should be inserted.
+ * @delete_key_list: List to collect invalid keys for deletion.
+ *
+ * This function searches the cache tree for a specific key and returns
+ * the node that is the predecessor of the key, or first node if the key is
+ * less than all keys in the tree. If any invalid keys are found during
+ * the search, they are added to the delete_key_list for later cleanup.
+ *
+ * Returns a pointer to the previous node.
+ */
+struct rb_node *cache_subtree_search(struct pcache_cache_subtree *cache_subtree, struct pcache_cache_key *key,
+ struct rb_node **parentp, struct rb_node ***newp,
+ struct list_head *delete_key_list)
+{
+ struct rb_node **new, *parent = NULL;
+ struct pcache_cache_key *key_tmp;
+ struct rb_node *prev_node = NULL;
+
+ new = &(cache_subtree->root.rb_node);
+ while (*new) {
+ key_tmp = container_of(*new, struct pcache_cache_key, rb_node);
+ if (cache_key_invalid(key_tmp))
+ list_add(&key_tmp->list_node, delete_key_list);
+
+ parent = *new;
+ if (key_tmp->off >= key->off) {
+ new = &((*new)->rb_left);
+ } else {
+ prev_node = *new;
+ new = &((*new)->rb_right);
+ }
+ }
+
+ if (!prev_node)
+ prev_node = rb_first(&cache_subtree->root);
+
+ if (parentp)
+ *parentp = parent;
+
+ if (newp)
+ *newp = new;
+
+ return prev_node;
+}
+
+static struct pcache_cache_key *get_pre_alloc_key(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache_key *key;
+
+ if (ctx->pre_alloc_key) {
+ key = ctx->pre_alloc_key;
+ ctx->pre_alloc_key = NULL;
+
+ return key;
+ }
+
+ return cache_key_alloc(ctx->cache_tree, GFP_NOWAIT);
+}
+
+/**
+ * fixup_overlap_tail - Adjust the key when it overlaps at the tail.
+ * @key: Pointer to the new cache key being inserted.
+ * @key_tmp: Pointer to the existing key that overlaps.
+ * @ctx: Pointer to the context for walking the cache tree.
+ *
+ * This function modifies the existing key (key_tmp) when there is an
+ * overlap at the tail with the new key. If the modified key becomes
+ * empty, it is deleted.
+ */
+static int fixup_overlap_tail(struct pcache_cache_key *key,
+ struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ /*
+ * |----------------| key_tmp
+ * |===========| key
+ */
+ BUG_ON(cache_key_empty(key));
+ if (cache_key_empty(key_tmp)) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ cache_key_cutfront(key_tmp, cache_key_lend(key) - cache_key_lstart(key_tmp));
+ if (key_tmp->len == 0) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/**
+ * fixup_overlap_contain - Handle case where new key completely contains an existing key.
+ * @key: Pointer to the new cache key being inserted.
+ * @key_tmp: Pointer to the existing key that is being contained.
+ * @ctx: Pointer to the context for walking the cache tree.
+ *
+ * This function deletes the existing key (key_tmp) when the new key
+ * completely contains it. It returns SUBTREE_WALK_RET_RESEARCH to indicate that the
+ * tree structure may have changed, necessitating a re-insertion of
+ * the new key.
+ */
+static int fixup_overlap_contain(struct pcache_cache_key *key,
+ struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ /*
+ * |----| key_tmp
+ * |==========| key
+ */
+ BUG_ON(cache_key_empty(key));
+ cache_key_delete(key_tmp);
+
+ return SUBTREE_WALK_RET_RESEARCH;
+}
+
+/**
+ * fixup_overlap_contained - Handle overlap when a new key is contained in an existing key.
+ * @key: The new cache key being inserted.
+ * @key_tmp: The existing cache key that overlaps with the new key.
+ * @ctx: Context for the cache tree walk.
+ *
+ * This function adjusts the existing key if the new key is contained
+ * within it. If the existing key is empty, it indicates a placeholder key
+ * that was inserted during a miss read. This placeholder will later be
+ * updated with real data from the backing_dev, making it no longer an empty key.
+ *
+ * If we delete key or insert a key, the structure of the entire cache tree may change,
+ * requiring a full research of the tree to find a new insertion point.
+ */
+static int fixup_overlap_contained(struct pcache_cache_key *key,
+ struct pcache_cache_key *key_tmp, struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache_tree *cache_tree = ctx->cache_tree;
+
+ /*
+ * |-----------| key_tmp
+ * |====| key
+ */
+ BUG_ON(cache_key_empty(key));
+ if (cache_key_empty(key_tmp)) {
+ /* If key_tmp is empty, don't split it;
+ * it's a placeholder key for miss reads that will be updated later.
+ */
+ cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
+ if (key_tmp->len == 0) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+ } else {
+ struct pcache_cache_key *key_fixup;
+ bool need_research = false;
+
+ key_fixup = get_pre_alloc_key(ctx);
+ if (!key_fixup)
+ return SUBTREE_WALK_RET_NEED_KEY;
+
+ cache_key_copy(key_fixup, key_tmp);
+
+ /* Split key_tmp based on the new key's range */
+ cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
+ if (key_tmp->len == 0) {
+ cache_key_delete(key_tmp);
+ need_research = true;
+ }
+
+ /* Create a new portion for key_fixup */
+ cache_key_cutfront(key_fixup, cache_key_lend(key) - cache_key_lstart(key_tmp));
+ if (key_fixup->len == 0) {
+ cache_key_put(key_fixup);
+ } else {
+ /* Insert the new key into the cache */
+ cache_key_insert(cache_tree, key_fixup, false);
+ need_research = true;
+ }
+
+ if (need_research)
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/**
+ * fixup_overlap_head - Handle overlap when a new key overlaps with the head of an existing key.
+ * @key: The new cache key being inserted.
+ * @key_tmp: The existing cache key that overlaps with the new key.
+ * @ctx: Context for the cache tree walk.
+ *
+ * This function adjusts the existing key if the new key overlaps
+ * with the beginning of it. If the resulting key length is zero
+ * after the adjustment, the key is deleted. This indicates that
+ * the key no longer holds valid data and requires the tree to be
+ * re-researched for a new insertion point.
+ */
+static int fixup_overlap_head(struct pcache_cache_key *key,
+ struct pcache_cache_key *key_tmp, struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ /*
+ * |--------| key_tmp
+ * |==========| key
+ */
+ BUG_ON(cache_key_empty(key));
+ /* Adjust key_tmp by cutting back based on the new key's start */
+ cache_key_cutback(key_tmp, cache_key_lend(key_tmp) - cache_key_lstart(key));
+ if (key_tmp->len == 0) {
+ /* If the adjusted key_tmp length is zero, delete it */
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/**
+ * cache_key_insert - Insert a new cache key into the cache tree.
+ * @cache_tree: Pointer to the cache_tree structure.
+ * @key: The cache key to insert.
+ * @fixup: Indicates if this is a new key being inserted.
+ *
+ * This function searches for the appropriate location to insert
+ * a new cache key into the cache tree. It handles key overlaps
+ * and ensures any invalid keys are removed before insertion.
+ */
+void cache_key_insert(struct pcache_cache_tree *cache_tree, struct pcache_cache_key *key, bool fixup)
+{
+ struct pcache_cache *cache = cache_tree->cache;
+ struct pcache_cache_subtree_walk_ctx walk_ctx = { 0 };
+ struct rb_node **new, *parent = NULL;
+ struct pcache_cache_subtree *cache_subtree;
+ struct pcache_cache_key *key_tmp = NULL, *key_next;
+ struct rb_node *prev_node = NULL;
+ LIST_HEAD(delete_key_list);
+ int ret;
+
+ cache_subtree = get_subtree(cache_tree, key->off);
+ key->cache_subtree = cache_subtree;
+search:
+ prev_node = cache_subtree_search(cache_subtree, key, &parent, &new, &delete_key_list);
+ if (!list_empty(&delete_key_list)) {
+ /* Remove invalid keys from the delete list */
+ list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) {
+ list_del_init(&key_tmp->list_node);
+ cache_key_delete(key_tmp);
+ }
+ goto search;
+ }
+
+ if (fixup) {
+ /* Set up the context with the cache, start node, and new key */
+ walk_ctx.cache_tree = cache_tree;
+ walk_ctx.start_node = prev_node;
+ walk_ctx.key = key;
+
+ /* Assign overlap handling functions for different scenarios */
+ walk_ctx.overlap_tail = fixup_overlap_tail;
+ walk_ctx.overlap_head = fixup_overlap_head;
+ walk_ctx.overlap_contain = fixup_overlap_contain;
+ walk_ctx.overlap_contained = fixup_overlap_contained;
+
+ ret = cache_subtree_walk(&walk_ctx);
+ switch (ret) {
+ case SUBTREE_WALK_RET_OK:
+ break;
+ case SUBTREE_WALK_RET_RESEARCH:
+ goto search;
+ case SUBTREE_WALK_RET_NEED_KEY:
+ spin_unlock(&cache_subtree->tree_lock);
+ pcache_dev_debug(CACHE_TO_PCACHE(cache), "allocate pre_alloc_key with GFP_NOIO");
+ walk_ctx.pre_alloc_key = cache_key_alloc(cache_tree, GFP_NOIO);
+ spin_lock(&cache_subtree->tree_lock);
+ goto search;
+ default:
+ BUG();
+ }
+ }
+
+ if (walk_ctx.pre_alloc_key)
+ cache_key_put(walk_ctx.pre_alloc_key);
+
+ /* Link and insert the new key into the red-black tree */
+ rb_link_node(&key->rb_node, parent, new);
+ rb_insert_color(&key->rb_node, &cache_subtree->root);
+}
+
+/**
+ * clean_fn - Cleanup function to remove invalid keys from the cache tree.
+ * @work: Pointer to the work_struct associated with the cleanup.
+ *
+ * This function cleans up invalid keys from the cache tree in the background
+ * after a cache segment has been invalidated during cache garbage collection.
+ * It processes a maximum of PCACHE_CLEAN_KEYS_MAX keys per iteration and holds
+ * the tree lock to ensure thread safety.
+ */
+void clean_fn(struct work_struct *work)
+{
+ struct pcache_cache *cache = container_of(work, struct pcache_cache, clean_work);
+ struct pcache_cache_subtree *cache_subtree;
+ struct rb_node *node;
+ struct pcache_cache_key *key;
+ int i, count;
+
+ for (i = 0; i < cache->req_key_tree.n_subtrees; i++) {
+ cache_subtree = &cache->req_key_tree.subtrees[i];
+
+again:
+ if (pcache_is_stopping(CACHE_TO_PCACHE(cache)))
+ return;
+
+ /* Delete up to PCACHE_CLEAN_KEYS_MAX keys in one iteration */
+ count = 0;
+ spin_lock(&cache_subtree->tree_lock);
+ node = rb_first(&cache_subtree->root);
+ while (node) {
+ key = CACHE_KEY(node);
+ node = rb_next(node);
+ if (cache_key_invalid(key)) {
+ count++;
+ cache_key_delete(key);
+ }
+
+ if (count >= PCACHE_CLEAN_KEYS_MAX) {
+ /* Unlock and pause before continuing cleanup */
+ spin_unlock(&cache_subtree->tree_lock);
+ usleep_range(1000, 2000);
+ goto again;
+ }
+ }
+ spin_unlock(&cache_subtree->tree_lock);
+ }
+}
+
+/*
+ * kset_flush_fn - Flush work for a cache kset.
+ *
+ * This function is called when a kset flush work is queued from
+ * cache_key_append(). If the kset is full, it will be closed
+ * immediately. If not, the flush work will be queued for later closure.
+ *
+ * If cache_kset_close detects that a new segment is required to store
+ * the kset and there are no available segments, it will return an error.
+ * In this scenario, a retry will be attempted.
+ */
+void kset_flush_fn(struct work_struct *work)
+{
+ struct pcache_cache_kset *kset = container_of(work, struct pcache_cache_kset, flush_work.work);
+ struct pcache_cache *cache = kset->cache;
+ int ret;
+
+ if (pcache_is_stopping(CACHE_TO_PCACHE(cache)))
+ return;
+
+ spin_lock(&kset->kset_lock);
+ ret = cache_kset_close(cache, kset);
+ spin_unlock(&kset->kset_lock);
+
+ if (ret) {
+ /* Failed to flush kset, schedule a retry. */
+ queue_delayed_work(cache_get_wq(cache), &kset->flush_work, msecs_to_jiffies(100));
+ }
+}
+
+static int kset_replay(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia)
+{
+ struct pcache_cache_key_onmedia *key_onmedia;
+ struct pcache_cache_subtree *cache_subtree;
+ struct pcache_cache_key *key;
+ int ret;
+ int i;
+
+ for (i = 0; i < kset_onmedia->key_num; i++) {
+ key_onmedia = &kset_onmedia->data[i];
+
+ key = cache_key_alloc(&cache->req_key_tree, GFP_NOIO);
+ ret = cache_key_decode(cache, key_onmedia, key);
+ if (ret) {
+ cache_key_put(key);
+ goto err;
+ }
+
+ __set_bit(key->cache_pos.cache_seg->cache_seg_id, cache->seg_map);
+
+ /* Check if the segment generation is valid for insertion. */
+ if (key->seg_gen < key->cache_pos.cache_seg->gen) {
+ cache_key_put(key);
+ } else {
+ cache_subtree = get_subtree(&cache->req_key_tree, key->off);
+ spin_lock(&cache_subtree->tree_lock);
+ cache_key_insert(&cache->req_key_tree, key, true);
+ spin_unlock(&cache_subtree->tree_lock);
+ }
+
+ cache_seg_get(key->cache_pos.cache_seg);
+ }
+
+ return 0;
+err:
+ return ret;
+}
+
+int cache_replay(struct pcache_cache *cache)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_pos pos_tail;
+ struct pcache_cache_pos *pos;
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ u32 to_copy, count = 0;
+ int ret = 0;
+
+ kset_onmedia = kzalloc(PCACHE_KSET_ONMEDIA_SIZE_MAX, GFP_KERNEL);
+ if (!kset_onmedia)
+ return -ENOMEM;
+
+ cache_pos_copy(&pos_tail, &cache->key_tail);
+ pos = &pos_tail;
+
+ /*
+ * In cache replaying stage, there is no other one will access
+ * cache->seg_map, so we can set bit here without cache->seg_map_lock.
+ */
+ __set_bit(pos->cache_seg->cache_seg_id, cache->seg_map);
+
+ while (true) {
+ to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - pos->seg_off);
+ ret = copy_mc_to_kernel(kset_onmedia, cache_pos_addr(pos), to_copy);
+ if (ret) {
+ ret = -EIO;
+ goto out;
+ }
+
+ if (kset_onmedia->magic != PCACHE_KSET_MAGIC ||
+ kset_onmedia->crc != cache_kset_crc(kset_onmedia)) {
+ break;
+ }
+
+ /* Process the last kset and prepare for the next segment. */
+ if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) {
+ struct pcache_cache_segment *next_seg;
+
+ pcache_dev_debug(pcache, "last kset replay, next: %u\n", kset_onmedia->next_cache_seg_id);
+
+ next_seg = &cache->segments[kset_onmedia->next_cache_seg_id];
+
+ pos->cache_seg = next_seg;
+ pos->seg_off = 0;
+
+ __set_bit(pos->cache_seg->cache_seg_id, cache->seg_map);
+ continue;
+ }
+
+ /* Replay the kset and check for errors. */
+ ret = kset_replay(cache, kset_onmedia);
+ if (ret)
+ goto out;
+
+ /* Advance the position after processing the kset. */
+ cache_pos_advance(pos, get_kset_onmedia_size(kset_onmedia));
+ if (++count > 512) {
+ cond_resched();
+ count = 0;
+ }
+ }
+
+ /* Update the key_head position after replaying. */
+ spin_lock(&cache->key_head_lock);
+ cache_pos_copy(&cache->key_head, pos);
+ spin_unlock(&cache->key_head_lock);
+out:
+ kfree(kset_onmedia);
+ return ret;
+}
+
+int cache_tree_init(struct pcache_cache *cache, struct pcache_cache_tree *cache_tree, u32 n_subtrees)
+{
+ int ret;
+ u32 i;
+
+ cache_tree->cache = cache;
+ cache_tree->n_subtrees = n_subtrees;
+
+ ret = mempool_init_slab_pool(&cache_tree->key_pool, 1024, key_cache);
+ if (ret)
+ goto err;
+
+ /*
+ * Allocate and initialize the subtrees array.
+ * Each element is a cache tree structure that contains
+ * an RB tree root and a spinlock for protecting its contents.
+ */
+ cache_tree->subtrees = kvcalloc(cache_tree->n_subtrees, sizeof(struct pcache_cache_subtree), GFP_KERNEL);
+ if (!cache_tree->subtrees) {
+ ret = -ENOMEM;
+ goto key_pool_exit;
+ }
+
+ for (i = 0; i < cache_tree->n_subtrees; i++) {
+ struct pcache_cache_subtree *cache_subtree = &cache_tree->subtrees[i];
+
+ cache_subtree->root = RB_ROOT;
+ spin_lock_init(&cache_subtree->tree_lock);
+ }
+
+ return 0;
+
+key_pool_exit:
+ mempool_exit(&cache_tree->key_pool);
+err:
+ return ret;
+}
+
+void cache_tree_clear(struct pcache_cache_tree *cache_tree)
+{
+ struct pcache_cache_subtree *cache_subtree;
+ struct rb_node *node;
+ struct pcache_cache_key *key;
+ u32 i;
+
+ for (i = 0; i < cache_tree->n_subtrees; i++) {
+ cache_subtree = &cache_tree->subtrees[i];
+
+ spin_lock(&cache_subtree->tree_lock);
+ node = rb_first(&cache_subtree->root);
+ while (node) {
+ key = CACHE_KEY(node);
+ node = rb_next(node);
+
+ cache_key_delete(key);
+ }
+ spin_unlock(&cache_subtree->tree_lock);
+ }
+}
+
+void cache_tree_exit(struct pcache_cache_tree *cache_tree)
+{
+ cache_tree_clear(cache_tree);
+ kvfree(cache_tree->subtrees);
+ mempool_exit(&cache_tree->key_pool);
+}
diff --git a/drivers/md/dm-pcache/cache_req.c b/drivers/md/dm-pcache/cache_req.c
new file mode 100644
index 000000000000..992a1ad98909
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_req.c
@@ -0,0 +1,835 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "cache.h"
+#include "backing_dev.h"
+#include "cache_dev.h"
+#include "dm_pcache.h"
+
+static int cache_data_head_init(struct pcache_cache *cache)
+{
+ struct pcache_cache_segment *next_seg;
+ struct pcache_cache_data_head *data_head;
+
+ data_head = get_data_head(cache);
+ next_seg = get_cache_segment(cache);
+ if (!next_seg)
+ return -EBUSY;
+
+ cache_seg_get(next_seg);
+ data_head->head_pos.cache_seg = next_seg;
+ data_head->head_pos.seg_off = 0;
+
+ return 0;
+}
+
+/**
+ * cache_data_alloc - Allocate data for a cache key.
+ * @cache: Pointer to the cache structure.
+ * @key: Pointer to the cache key to allocate data for.
+ *
+ * This function tries to allocate space from the cache segment specified by the
+ * data head. If the remaining space in the segment is insufficient to allocate
+ * the requested length for the cache key, it will allocate whatever is available
+ * and adjust the key's length accordingly. This function does not allocate
+ * space that crosses segment boundaries.
+ */
+static int cache_data_alloc(struct pcache_cache *cache, struct pcache_cache_key *key)
+{
+ struct pcache_cache_data_head *data_head;
+ struct pcache_cache_pos *head_pos;
+ struct pcache_cache_segment *cache_seg;
+ u32 seg_remain;
+ u32 allocated = 0, to_alloc;
+ int ret = 0;
+
+ preempt_disable();
+ data_head = get_data_head(cache);
+again:
+ if (!data_head->head_pos.cache_seg) {
+ seg_remain = 0;
+ } else {
+ cache_pos_copy(&key->cache_pos, &data_head->head_pos);
+ key->seg_gen = key->cache_pos.cache_seg->gen;
+
+ head_pos = &data_head->head_pos;
+ cache_seg = head_pos->cache_seg;
+ seg_remain = cache_seg_remain(head_pos);
+ to_alloc = key->len - allocated;
+ }
+
+ if (seg_remain > to_alloc) {
+ /* If remaining space in segment is sufficient for the cache key, allocate it. */
+ cache_pos_advance(head_pos, to_alloc);
+ allocated += to_alloc;
+ cache_seg_get(cache_seg);
+ } else if (seg_remain) {
+ /* If remaining space is not enough, allocate the remaining space and adjust the cache key length. */
+ cache_pos_advance(head_pos, seg_remain);
+ key->len = seg_remain;
+
+ /* Get for key: obtain a reference to the cache segment for the key. */
+ cache_seg_get(cache_seg);
+ /* Put for head_pos->cache_seg: release the reference for the current head's segment. */
+ cache_seg_put(head_pos->cache_seg);
+ head_pos->cache_seg = NULL;
+ } else {
+ /* Initialize a new data head if no segment is available. */
+ ret = cache_data_head_init(cache);
+ if (ret)
+ goto out;
+
+ goto again;
+ }
+
+out:
+ preempt_enable();
+
+ return ret;
+}
+
+static int cache_copy_from_req_bio(struct pcache_cache *cache, struct pcache_cache_key *key,
+ struct pcache_request *pcache_req, u32 bio_off)
+{
+ struct pcache_cache_pos *pos = &key->cache_pos;
+ struct pcache_segment *segment;
+
+ segment = &pos->cache_seg->segment;
+
+ return segment_copy_from_bio(segment, pos->seg_off, key->len, pcache_req->bio, bio_off);
+}
+
+static int cache_copy_to_req_bio(struct pcache_cache *cache, struct pcache_request *pcache_req,
+ u32 bio_off, u32 len, struct pcache_cache_pos *pos, u64 key_gen)
+{
+ struct pcache_cache_segment *cache_seg = pos->cache_seg;
+ struct pcache_segment *segment = &cache_seg->segment;
+ int ret;
+
+ spin_lock(&cache_seg->gen_lock);
+ if (key_gen < cache_seg->gen) {
+ spin_unlock(&cache_seg->gen_lock);
+ return -EINVAL;
+ }
+
+ ret = segment_copy_to_bio(segment, pos->seg_off, len, pcache_req->bio, bio_off);
+ spin_unlock(&cache_seg->gen_lock);
+
+ return ret;
+}
+
+/**
+ * miss_read_end_req - Handle the end of a miss read request.
+ * @backing_req: Pointer to the request structure.
+ * @read_ret: Return value of read.
+ *
+ * This function is called when a backing request to read data from
+ * the backing_dev is completed. If the key associated with the request
+ * is empty (a placeholder), it allocates cache space for the key,
+ * copies the data read from the bio into the cache, and updates
+ * the key's status. If the key has been overwritten by a write
+ * request during this process, it will be deleted from the cache
+ * tree and no further action will be taken.
+ */
+static void miss_read_end_req(struct pcache_backing_dev_req *backing_req, int read_ret)
+{
+ void *priv_data = backing_req->priv_data;
+ struct pcache_request *pcache_req = backing_req->req.upper_req;
+ struct pcache_cache *cache = backing_req->backing_dev->cache;
+ int ret;
+
+ if (priv_data) {
+ struct pcache_cache_key *key;
+ struct pcache_cache_subtree *cache_subtree;
+
+ key = (struct pcache_cache_key *)priv_data;
+ cache_subtree = key->cache_subtree;
+
+ /* if this key was deleted from cache_subtree by a write, key->flags should be cleared,
+ * so if cache_key_empty() return true, this key is still in cache_subtree
+ */
+ spin_lock(&cache_subtree->tree_lock);
+ if (cache_key_empty(key)) {
+ /* Check if the backing request was successful. */
+ if (read_ret) {
+ cache_key_delete(key);
+ goto unlock;
+ }
+
+ /* Allocate cache space for the key and copy data from the backing_dev. */
+ ret = cache_data_alloc(cache, key);
+ if (ret) {
+ cache_key_delete(key);
+ goto unlock;
+ }
+
+ ret = cache_copy_from_req_bio(cache, key, pcache_req, backing_req->req.bio_off);
+ if (ret) {
+ cache_seg_put(key->cache_pos.cache_seg);
+ cache_key_delete(key);
+ goto unlock;
+ }
+ key->flags &= ~PCACHE_CACHE_KEY_FLAGS_EMPTY;
+ key->flags |= PCACHE_CACHE_KEY_FLAGS_CLEAN;
+
+ /* Append the key to the cache. */
+ ret = cache_key_append(cache, key, false);
+ if (ret) {
+ cache_seg_put(key->cache_pos.cache_seg);
+ cache_key_delete(key);
+ goto unlock;
+ }
+ }
+unlock:
+ spin_unlock(&cache_subtree->tree_lock);
+ cache_key_put(key);
+ }
+}
+
+/**
+ * submit_cache_miss_req - Submit a backing request when cache data is missing
+ * @cache: The cache context that manages cache operations
+ * @backing_req: The cache request containing information about the read request
+ *
+ * This function is used to handle cases where a cache read request cannot locate
+ * the required data in the cache. When such a miss occurs during `cache_subtree_walk`,
+ * it triggers a backing read request to fetch data from the backing storage.
+ *
+ * If `pcache_req->priv_data` is set, it points to a `pcache_cache_key`, representing
+ * a new cache key to be inserted into the cache. The function calls `cache_key_insert`
+ * to attempt adding the key. On insertion failure, it releases the key reference and
+ * clears `priv_data` to avoid further processing.
+ */
+static void submit_cache_miss_req(struct pcache_cache *cache, struct pcache_backing_dev_req *backing_req)
+{
+ if (backing_req->priv_data) {
+ struct pcache_cache_key *key;
+
+ /* Attempt to insert the key into the cache if priv_data is set */
+ key = (struct pcache_cache_key *)backing_req->priv_data;
+ cache_key_insert(&cache->req_key_tree, key, true);
+ }
+ backing_dev_req_submit(backing_req, false);
+}
+
+static void cache_miss_req_free(struct pcache_backing_dev_req *backing_req)
+{
+ struct pcache_cache_key *key;
+
+ if (backing_req->priv_data) {
+ key = backing_req->priv_data;
+ backing_req->priv_data = NULL;
+ cache_key_put(key); /* for ->priv_data */
+ cache_key_put(key); /* for init ref in alloc */
+ }
+
+ backing_dev_req_end(backing_req);
+}
+
+static struct pcache_backing_dev_req *cache_miss_req_alloc(struct pcache_cache *cache,
+ struct pcache_request *parent,
+ gfp_t gfp_mask)
+{
+ struct pcache_backing_dev *backing_dev = cache->backing_dev;
+ struct pcache_backing_dev_req *backing_req;
+ struct pcache_cache_key *key = NULL;
+ struct pcache_backing_dev_req_opts req_opts = { 0 };
+
+ req_opts.type = BACKING_DEV_REQ_TYPE_REQ;
+ req_opts.gfp_mask = gfp_mask;
+ req_opts.req.upper_req = parent;
+
+ backing_req = backing_dev_req_alloc(backing_dev, &req_opts);
+ if (!backing_req)
+ return NULL;
+
+ key = cache_key_alloc(&cache->req_key_tree, gfp_mask);
+ if (!key)
+ goto free_backing_req;
+
+ cache_key_get(key);
+ backing_req->priv_data = key;
+
+ return backing_req;
+
+free_backing_req:
+ cache_miss_req_free(backing_req);
+ return NULL;
+}
+
+static void cache_miss_req_init(struct pcache_cache *cache,
+ struct pcache_backing_dev_req *backing_req,
+ struct pcache_request *parent,
+ u32 off, u32 len, bool insert_key)
+{
+ struct pcache_cache_key *key;
+ struct pcache_backing_dev_req_opts req_opts = { 0 };
+
+ req_opts.type = BACKING_DEV_REQ_TYPE_REQ;
+ req_opts.req.upper_req = parent;
+ req_opts.req.req_off = off;
+ req_opts.req.len = len;
+ req_opts.end_fn = miss_read_end_req;
+
+ backing_dev_req_init(backing_req, &req_opts);
+
+ if (insert_key) {
+ key = backing_req->priv_data;
+ key->off = parent->off + off;
+ key->len = len;
+ key->flags |= PCACHE_CACHE_KEY_FLAGS_EMPTY;
+ } else {
+ key = backing_req->priv_data;
+ backing_req->priv_data = NULL;
+ cache_key_put(key);
+ cache_key_put(key);
+ }
+}
+
+static struct pcache_backing_dev_req *get_pre_alloc_req(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_request *pcache_req = ctx->pcache_req;
+ struct pcache_backing_dev_req *backing_req;
+
+ if (ctx->pre_alloc_req) {
+ backing_req = ctx->pre_alloc_req;
+ ctx->pre_alloc_req = NULL;
+
+ return backing_req;
+ }
+
+ return cache_miss_req_alloc(cache, pcache_req, GFP_NOWAIT);
+}
+
+/*
+ * In the process of walking the cache tree to locate cached data, this
+ * function handles the situation where the requested data range lies
+ * entirely before an existing cache node (`key_tmp`). This outcome
+ * signifies that the target data is absent from the cache (cache miss).
+ *
+ * To fulfill this portion of the read request, the function creates a
+ * backing request (`backing_req`) for the missing data range represented
+ * by `key`. It then appends this request to the submission list in the
+ * `ctx`, which will later be processed to retrieve the data from backing
+ * storage. After setting up the backing request, `req_done` in `ctx` is
+ * updated to reflect the length of the handled range, and the range
+ * in `key` is adjusted by trimming off the portion that is now handled.
+ *
+ * The scenario handled here:
+ *
+ * |--------| key_tmp (existing cached range)
+ * |====| key (requested range, preceding key_tmp)
+ *
+ * Since `key` is before `key_tmp`, it signifies that the requested data
+ * range is missing in the cache (cache miss) and needs retrieval from
+ * backing storage.
+ */
+static int read_before(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_backing_dev_req *backing_req;
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+
+ /*
+ * In this scenario, `key` represents a range that precedes `key_tmp`,
+ * meaning the requested data range is missing from the cache tree
+ * and must be retrieved from the backing_dev.
+ */
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, true);
+
+ list_add(&backing_req->node, ctx->submit_req_list);
+ ctx->req_done += key->len;
+ cache_key_cutfront(key, key->len);
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * During cache_subtree_walk, this function manages a scenario where part of the
+ * requested data range overlaps with an existing cache node (`key_tmp`).
+ *
+ * |----------------| key_tmp (existing cached range)
+ * |===========| key (requested range, overlapping the tail of key_tmp)
+ */
+static int read_overlap_tail(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_backing_dev_req *backing_req;
+ u32 io_len;
+ int ret;
+
+ /*
+ * Calculate the length of the non-overlapping portion of `key`
+ * before `key_tmp`, representing the data missing in the cache.
+ */
+ io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key);
+ if (io_len) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, true);
+
+ list_add(&backing_req->node, ctx->submit_req_list);
+ ctx->req_done += io_len;
+ cache_key_cutfront(key, io_len);
+ }
+
+ /*
+ * Handle the overlapping portion by calculating the length of
+ * the remaining data in `key` that coincides with `key_tmp`.
+ */
+ io_len = cache_key_lend(key) - cache_key_lstart(key_tmp);
+ if (cache_key_empty(key_tmp)) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+ submit_cache_miss_req(cache, backing_req);
+ } else {
+ ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+ io_len, &key_tmp->cache_pos, key_tmp->seg_gen);
+ if (ret) {
+ if (ret == -EINVAL) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ ctx->ret = ret;
+ return SUBTREE_WALK_RET_ERR;
+ }
+ }
+
+ ctx->req_done += io_len;
+ cache_key_cutfront(key, io_len);
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * |----| key_tmp (existing cached range)
+ * |==========| key (requested range)
+ */
+static int read_overlap_contain(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_backing_dev_req *backing_req;
+ u32 io_len;
+ int ret;
+
+ /*
+ * Calculate the non-overlapping part of `key` before `key_tmp`
+ * to identify the missing data length.
+ */
+ io_len = cache_key_lstart(key_tmp) - cache_key_lstart(key);
+ if (io_len) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, true);
+
+ list_add(&backing_req->node, ctx->submit_req_list);
+
+ ctx->req_done += io_len;
+ cache_key_cutfront(key, io_len);
+ }
+
+ /*
+ * Handle the overlapping portion between `key` and `key_tmp`.
+ */
+ io_len = key_tmp->len;
+ if (cache_key_empty(key_tmp)) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+ submit_cache_miss_req(cache, backing_req);
+ } else {
+ ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+ io_len, &key_tmp->cache_pos, key_tmp->seg_gen);
+ if (ret) {
+ if (ret == -EINVAL) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ ctx->ret = ret;
+ return SUBTREE_WALK_RET_ERR;
+ }
+ }
+
+ ctx->req_done += io_len;
+ cache_key_cutfront(key, io_len);
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * |-----------| key_tmp (existing cached range)
+ * |====| key (requested range, fully within key_tmp)
+ *
+ * If `key_tmp` contains valid cached data, this function copies the relevant
+ * portion to the request's bio. Otherwise, it sends a backing request to
+ * fetch the required data range.
+ */
+static int read_overlap_contained(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_backing_dev_req *backing_req;
+ struct pcache_cache_pos pos;
+ int ret;
+
+ /*
+ * Check if `key_tmp` is empty, indicating a miss. If so, initiate
+ * a backing request to fetch the required data for `key`.
+ */
+ if (cache_key_empty(key_tmp)) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, false);
+ submit_cache_miss_req(cache, backing_req);
+ } else {
+ cache_pos_copy(&pos, &key_tmp->cache_pos);
+ cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp));
+
+ ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+ key->len, &pos, key_tmp->seg_gen);
+ if (ret) {
+ if (ret == -EINVAL) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ ctx->ret = ret;
+ return SUBTREE_WALK_RET_ERR;
+ }
+ }
+
+ ctx->req_done += key->len;
+ cache_key_cutfront(key, key->len);
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * |--------| key_tmp (existing cached range)
+ * |==========| key (requested range, overlapping the head of key_tmp)
+ */
+static int read_overlap_head(struct pcache_cache_key *key, struct pcache_cache_key *key_tmp,
+ struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_backing_dev_req *backing_req;
+ struct pcache_cache_pos pos;
+ u32 io_len;
+ int ret;
+
+ io_len = cache_key_lend(key_tmp) - cache_key_lstart(key);
+
+ if (cache_key_empty(key_tmp)) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, io_len, false);
+ submit_cache_miss_req(cache, backing_req);
+ } else {
+ cache_pos_copy(&pos, &key_tmp->cache_pos);
+ cache_pos_advance(&pos, cache_key_lstart(key) - cache_key_lstart(key_tmp));
+
+ ret = cache_copy_to_req_bio(ctx->cache_tree->cache, ctx->pcache_req, ctx->req_done,
+ io_len, &pos, key_tmp->seg_gen);
+ if (ret) {
+ if (ret == -EINVAL) {
+ cache_key_delete(key_tmp);
+ return SUBTREE_WALK_RET_RESEARCH;
+ }
+
+ ctx->ret = ret;
+ return SUBTREE_WALK_RET_ERR;
+ }
+ }
+
+ ctx->req_done += io_len;
+ cache_key_cutfront(key, io_len);
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/**
+ * read_walk_finally - Finalizes the cache read tree walk by submitting any
+ * remaining backing requests
+ * @ctx: Context structure holding information about the cache,
+ * read request, and submission list
+ * @ret: the return value after this walk.
+ *
+ * This function is called at the end of the `cache_subtree_walk` during a
+ * cache read operation. It completes the walk by checking if any data
+ * requested by `key` was not found in the cache tree, and if so, it sends
+ * a backing request to retrieve that data. Then, it iterates through the
+ * submission list of backing requests created during the walk, removing
+ * each request from the list and submitting it.
+ *
+ * The scenario managed here includes:
+ * - Sending a backing request for the remaining length of `key` if it was
+ * not fulfilled by existing cache entries.
+ * - Iterating through `ctx->submit_req_list` to submit each backing request
+ * enqueued during the walk.
+ *
+ * This ensures all necessary backing requests for cache misses are submitted
+ * to the backing storage to retrieve any data that could not be found in
+ * the cache.
+ */
+static int read_walk_finally(struct pcache_cache_subtree_walk_ctx *ctx, int ret)
+{
+ struct pcache_cache *cache = ctx->cache_tree->cache;
+ struct pcache_backing_dev_req *backing_req, *next_req;
+ struct pcache_cache_key *key = ctx->key;
+
+ list_for_each_entry_safe(backing_req, next_req, ctx->submit_req_list, node) {
+ list_del_init(&backing_req->node);
+ submit_cache_miss_req(ctx->cache_tree->cache, backing_req);
+ }
+
+ if (ret != SUBTREE_WALK_RET_OK)
+ return ret;
+
+ if (key->len) {
+ backing_req = get_pre_alloc_req(ctx);
+ if (!backing_req)
+ return SUBTREE_WALK_RET_NEED_REQ;
+
+ cache_miss_req_init(cache, backing_req, ctx->pcache_req, ctx->req_done, key->len, true);
+ submit_cache_miss_req(cache, backing_req);
+ ctx->req_done += key->len;
+ }
+
+ return SUBTREE_WALK_RET_OK;
+}
+
+/*
+ * This function is used within `cache_subtree_walk` to determine whether the
+ * read operation has covered the requested data length. It compares the
+ * amount of data processed (`ctx->req_done`) with the total data length
+ * specified in the original request (`ctx->pcache_req->data_len`).
+ *
+ * If `req_done` meets or exceeds the required data length, the function
+ * returns `true`, indicating the walk is complete. Otherwise, it returns `false`,
+ * signaling that additional data processing is needed to fulfill the request.
+ */
+static bool read_walk_done(struct pcache_cache_subtree_walk_ctx *ctx)
+{
+ return (ctx->req_done >= ctx->pcache_req->data_len);
+}
+
+/**
+ * cache_read - Process a read request by traversing the cache tree
+ * @cache: Cache structure holding cache trees and related configurations
+ * @pcache_req: Request structure with information about the data to read
+ *
+ * This function attempts to fulfill a read request by traversing the cache tree(s)
+ * to locate cached data for the requested range. If parts of the data are missing
+ * in the cache, backing requests are generated to retrieve the required segments.
+ *
+ * The function operates by initializing a key for the requested data range and
+ * preparing a context (`walk_ctx`) to manage the cache tree traversal. The context
+ * includes pointers to functions (e.g., `read_before`, `read_overlap_tail`) that handle
+ * specific conditions encountered during the traversal. The `walk_finally` and `walk_done`
+ * functions manage the end stages of the traversal, while the `delete_key_list` and
+ * `submit_req_list` lists track any keys to be deleted or requests to be submitted.
+ *
+ * The function first calculates the requested range and checks if it fits within the
+ * current cache tree (based on the tree's size limits). It then locks the cache tree
+ * and performs a search to locate any matching keys. If there are outdated keys,
+ * these are deleted, and the search is restarted to ensure accurate data retrieval.
+ *
+ * If the requested range spans multiple cache trees, the function moves on to the
+ * next tree once the current range has been processed. This continues until the
+ * entire requested data length has been handled.
+ */
+static int cache_read(struct pcache_cache *cache, struct pcache_request *pcache_req)
+{
+ struct pcache_cache_key key_data = { .off = pcache_req->off, .len = pcache_req->data_len };
+ struct pcache_cache_subtree *cache_subtree;
+ struct pcache_cache_key *key_tmp = NULL, *key_next;
+ struct rb_node *prev_node = NULL;
+ struct pcache_cache_key *key = &key_data;
+ struct pcache_cache_subtree_walk_ctx walk_ctx = { 0 };
+ struct pcache_backing_dev_req *backing_req, *next_req;
+ LIST_HEAD(delete_key_list);
+ LIST_HEAD(submit_req_list);
+ int ret;
+
+ walk_ctx.cache_tree = &cache->req_key_tree;
+ walk_ctx.req_done = 0;
+ walk_ctx.pcache_req = pcache_req;
+ walk_ctx.before = read_before;
+ walk_ctx.overlap_tail = read_overlap_tail;
+ walk_ctx.overlap_head = read_overlap_head;
+ walk_ctx.overlap_contain = read_overlap_contain;
+ walk_ctx.overlap_contained = read_overlap_contained;
+ walk_ctx.walk_finally = read_walk_finally;
+ walk_ctx.walk_done = read_walk_done;
+ walk_ctx.delete_key_list = &delete_key_list;
+ walk_ctx.submit_req_list = &submit_req_list;
+
+next:
+ key->off = pcache_req->off + walk_ctx.req_done;
+ key->len = pcache_req->data_len - walk_ctx.req_done;
+ if (key->len > PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK))
+ key->len = PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK);
+
+ cache_subtree = get_subtree(&cache->req_key_tree, key->off);
+ spin_lock(&cache_subtree->tree_lock);
+search:
+ prev_node = cache_subtree_search(cache_subtree, key, NULL, NULL, &delete_key_list);
+ if (!list_empty(&delete_key_list)) {
+ list_for_each_entry_safe(key_tmp, key_next, &delete_key_list, list_node) {
+ list_del_init(&key_tmp->list_node);
+ cache_key_delete(key_tmp);
+ }
+ goto search;
+ }
+
+ walk_ctx.start_node = prev_node;
+ walk_ctx.key = key;
+
+ ret = cache_subtree_walk(&walk_ctx);
+ if (ret == SUBTREE_WALK_RET_RESEARCH)
+ goto search;
+ spin_unlock(&cache_subtree->tree_lock);
+
+ if (ret == SUBTREE_WALK_RET_ERR) {
+ ret = walk_ctx.ret;
+ goto out;
+ }
+
+ if (ret == SUBTREE_WALK_RET_NEED_REQ) {
+ walk_ctx.pre_alloc_req = cache_miss_req_alloc(cache, pcache_req, GFP_NOIO);
+ pcache_dev_debug(CACHE_TO_PCACHE(cache), "allocate pre_alloc_req with GFP_NOIO");
+ }
+
+ if (walk_ctx.req_done < pcache_req->data_len)
+ goto next;
+ ret = 0;
+out:
+ if (walk_ctx.pre_alloc_req)
+ cache_miss_req_free(walk_ctx.pre_alloc_req);
+
+ list_for_each_entry_safe(backing_req, next_req, &submit_req_list, node) {
+ list_del_init(&backing_req->node);
+ backing_dev_req_end(backing_req);
+ }
+
+ return ret;
+}
+
+static int cache_write(struct pcache_cache *cache, struct pcache_request *pcache_req)
+{
+ struct pcache_cache_subtree *cache_subtree;
+ struct pcache_cache_key *key;
+ u64 offset = pcache_req->off;
+ u32 length = pcache_req->data_len;
+ u32 io_done = 0;
+ int ret;
+
+ while (true) {
+ if (io_done >= length)
+ break;
+
+ key = cache_key_alloc(&cache->req_key_tree, GFP_NOIO);
+ key->off = offset + io_done;
+ key->len = length - io_done;
+ if (key->len > PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK))
+ key->len = PCACHE_CACHE_SUBTREE_SIZE - (key->off & PCACHE_CACHE_SUBTREE_SIZE_MASK);
+
+ ret = cache_data_alloc(cache, key);
+ if (ret) {
+ cache_key_put(key);
+ goto err;
+ }
+
+ ret = cache_copy_from_req_bio(cache, key, pcache_req, io_done);
+ if (ret) {
+ cache_seg_put(key->cache_pos.cache_seg);
+ cache_key_put(key);
+ goto err;
+ }
+
+ cache_subtree = get_subtree(&cache->req_key_tree, key->off);
+ spin_lock(&cache_subtree->tree_lock);
+ cache_key_insert(&cache->req_key_tree, key, true);
+ ret = cache_key_append(cache, key, pcache_req->bio->bi_opf & REQ_FUA);
+ if (ret) {
+ cache_seg_put(key->cache_pos.cache_seg);
+ cache_key_delete(key);
+ goto unlock;
+ }
+
+ io_done += key->len;
+ spin_unlock(&cache_subtree->tree_lock);
+ }
+
+ return 0;
+unlock:
+ spin_unlock(&cache_subtree->tree_lock);
+err:
+ return ret;
+}
+
+/**
+ * cache_flush - Flush all ksets to persist any pending cache data
+ * @cache: Pointer to the cache structure
+ *
+ * This function iterates through all ksets associated with the provided `cache`
+ * and ensures that any data marked for persistence is written to media. For each
+ * kset, it acquires the kset lock, then invokes `cache_kset_close`, which handles
+ * the persistence logic for that kset.
+ *
+ * If `cache_kset_close` encounters an error, the function exits immediately with
+ * the respective error code, preventing the flush operation from proceeding to
+ * subsequent ksets.
+ */
+int cache_flush(struct pcache_cache *cache)
+{
+ struct pcache_cache_kset *kset;
+ u32 i, ret;
+
+ for (i = 0; i < cache->n_ksets; i++) {
+ kset = get_kset(cache, i);
+
+ spin_lock(&kset->kset_lock);
+ ret = cache_kset_close(cache, kset);
+ spin_unlock(&kset->kset_lock);
+
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+int pcache_cache_handle_req(struct pcache_cache *cache, struct pcache_request *pcache_req)
+{
+ struct bio *bio = pcache_req->bio;
+
+ if (unlikely(bio->bi_opf & REQ_PREFLUSH))
+ return cache_flush(cache);
+
+ if (bio_data_dir(bio) == READ)
+ return cache_read(cache, pcache_req);
+
+ return cache_write(cache, pcache_req);
+}
diff --git a/drivers/md/dm-pcache/cache_segment.c b/drivers/md/dm-pcache/cache_segment.c
new file mode 100644
index 000000000000..8f9534ae04c3
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_segment.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "cache_dev.h"
+#include "cache.h"
+#include "backing_dev.h"
+#include "dm_pcache.h"
+
+static inline struct pcache_segment_info *get_seg_info_addr(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_segment_info *seg_info_addr;
+ u32 seg_id = cache_seg->segment.seg_id;
+ void *seg_addr;
+
+ seg_addr = CACHE_DEV_SEGMENT(cache_seg->cache->cache_dev, seg_id);
+ seg_info_addr = seg_addr + PCACHE_SEG_INFO_SIZE * cache_seg->info_index;
+
+ return seg_info_addr;
+}
+
+static void cache_seg_info_write(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_segment_info *seg_info_addr;
+ struct pcache_segment_info *seg_info = &cache_seg->cache_seg_info;
+
+ mutex_lock(&cache_seg->info_lock);
+ seg_info->header.seq++;
+ seg_info->header.crc = pcache_meta_crc(&seg_info->header, sizeof(struct pcache_segment_info));
+
+ seg_info_addr = get_seg_info_addr(cache_seg);
+ memcpy_flushcache(seg_info_addr, seg_info, sizeof(struct pcache_segment_info));
+ pmem_wmb();
+
+ cache_seg->info_index = (cache_seg->info_index + 1) % PCACHE_META_INDEX_MAX;
+ mutex_unlock(&cache_seg->info_lock);
+}
+
+static int cache_seg_info_load(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_segment_info *cache_seg_info_addr_base, *cache_seg_info_addr;
+ struct pcache_cache_dev *cache_dev = cache_seg->cache->cache_dev;
+ struct dm_pcache *pcache = CACHE_DEV_TO_PCACHE(cache_dev);
+ u32 seg_id = cache_seg->segment.seg_id;
+ int ret = 0;
+
+ cache_seg_info_addr_base = CACHE_DEV_SEGMENT(cache_dev, seg_id);
+
+ mutex_lock(&cache_seg->info_lock);
+ cache_seg_info_addr = pcache_meta_find_latest(&cache_seg_info_addr_base->header,
+ sizeof(struct pcache_segment_info),
+ PCACHE_SEG_INFO_SIZE,
+ &cache_seg->cache_seg_info);
+ if (IS_ERR(cache_seg_info_addr)) {
+ ret = PTR_ERR(cache_seg_info_addr);
+ goto out;
+ } else if (!cache_seg_info_addr) {
+ ret = -EIO;
+ goto out;
+ }
+ cache_seg->info_index = cache_seg_info_addr - cache_seg_info_addr_base;
+out:
+ mutex_unlock(&cache_seg->info_lock);
+
+ if (ret)
+ pcache_dev_err(pcache, "can't read segment info of segment: %u, ret: %d\n",
+ cache_seg->segment.seg_id, ret);
+ return ret;
+}
+
+static int cache_seg_ctrl_load(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl;
+ struct pcache_cache_seg_gen cache_seg_gen, *cache_seg_gen_addr;
+ int ret = 0;
+
+ mutex_lock(&cache_seg->ctrl_lock);
+ cache_seg_gen_addr = pcache_meta_find_latest(&cache_seg_ctrl->gen->header,
+ sizeof(struct pcache_cache_seg_gen),
+ sizeof(struct pcache_cache_seg_gen),
+ &cache_seg_gen);
+ if (IS_ERR(cache_seg_gen_addr)) {
+ ret = PTR_ERR(cache_seg_gen_addr);
+ goto out;
+ }
+
+ if (!cache_seg_gen_addr) {
+ cache_seg->gen = 0;
+ cache_seg->gen_seq = 0;
+ cache_seg->gen_index = 0;
+ goto out;
+ }
+
+ cache_seg->gen = cache_seg_gen.gen;
+ cache_seg->gen_seq = cache_seg_gen.header.seq;
+ cache_seg->gen_index = (cache_seg_gen_addr - cache_seg_ctrl->gen);
+out:
+ mutex_unlock(&cache_seg->ctrl_lock);
+
+ return ret;
+}
+
+static inline struct pcache_cache_seg_gen *get_cache_seg_gen_addr(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_cache_seg_ctrl *cache_seg_ctrl = cache_seg->cache_seg_ctrl;
+
+ return (cache_seg_ctrl->gen + cache_seg->gen_index);
+}
+
+static void cache_seg_ctrl_write(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_cache_seg_gen cache_seg_gen;
+
+ mutex_lock(&cache_seg->ctrl_lock);
+ cache_seg_gen.gen = cache_seg->gen;
+ cache_seg_gen.header.seq = ++cache_seg->gen_seq;
+ cache_seg_gen.header.crc = pcache_meta_crc(&cache_seg_gen.header,
+ sizeof(struct pcache_cache_seg_gen));
+
+ memcpy_flushcache(get_cache_seg_gen_addr(cache_seg), &cache_seg_gen, sizeof(struct pcache_cache_seg_gen));
+ pmem_wmb();
+
+ cache_seg->gen_index = (cache_seg->gen_index + 1) % PCACHE_META_INDEX_MAX;
+ mutex_unlock(&cache_seg->ctrl_lock);
+}
+
+static void cache_seg_ctrl_init(struct pcache_cache_segment *cache_seg)
+{
+ cache_seg->gen = 0;
+ cache_seg->gen_seq = 0;
+ cache_seg->gen_index = 0;
+ cache_seg_ctrl_write(cache_seg);
+}
+
+static int cache_seg_meta_load(struct pcache_cache_segment *cache_seg)
+{
+ int ret;
+
+ ret = cache_seg_info_load(cache_seg);
+ if (ret)
+ goto err;
+
+ ret = cache_seg_ctrl_load(cache_seg);
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ return ret;
+}
+
+/**
+ * cache_seg_set_next_seg - Sets the ID of the next segment
+ * @cache_seg: Pointer to the cache segment structure.
+ * @seg_id: The segment ID to set as the next segment.
+ *
+ * A pcache_cache allocates multiple cache segments, which are linked together
+ * through next_seg. When loading a pcache_cache, the first cache segment can
+ * be found using cache->seg_id, which allows access to all the cache segments.
+ */
+void cache_seg_set_next_seg(struct pcache_cache_segment *cache_seg, u32 seg_id)
+{
+ cache_seg->cache_seg_info.flags |= PCACHE_SEG_INFO_FLAGS_HAS_NEXT;
+ cache_seg->cache_seg_info.next_seg = seg_id;
+ cache_seg_info_write(cache_seg);
+}
+
+int cache_seg_init(struct pcache_cache *cache, u32 seg_id, u32 cache_seg_id,
+ bool new_cache)
+{
+ struct pcache_cache_dev *cache_dev = cache->cache_dev;
+ struct pcache_cache_segment *cache_seg = &cache->segments[cache_seg_id];
+ struct pcache_segment_init_options seg_options = { 0 };
+ struct pcache_segment *segment = &cache_seg->segment;
+ int ret;
+
+ cache_seg->cache = cache;
+ cache_seg->cache_seg_id = cache_seg_id;
+ spin_lock_init(&cache_seg->gen_lock);
+ atomic_set(&cache_seg->refs, 0);
+ mutex_init(&cache_seg->info_lock);
+ mutex_init(&cache_seg->ctrl_lock);
+
+ /* init pcache_segment */
+ seg_options.type = PCACHE_SEGMENT_TYPE_CACHE_DATA;
+ seg_options.data_off = PCACHE_CACHE_SEG_CTRL_OFF + PCACHE_CACHE_SEG_CTRL_SIZE;
+ seg_options.seg_id = seg_id;
+ seg_options.seg_info = &cache_seg->cache_seg_info;
+ pcache_segment_init(cache_dev, segment, &seg_options);
+
+ cache_seg->cache_seg_ctrl = CACHE_DEV_SEGMENT(cache_dev, seg_id) + PCACHE_CACHE_SEG_CTRL_OFF;
+
+ if (new_cache) {
+ cache_dev_zero_range(cache_dev, CACHE_DEV_SEGMENT(cache_dev, seg_id),
+ PCACHE_SEG_INFO_SIZE * PCACHE_META_INDEX_MAX +
+ PCACHE_CACHE_SEG_CTRL_SIZE);
+
+ cache_seg_ctrl_init(cache_seg);
+
+ cache_seg->info_index = 0;
+ cache_seg_info_write(cache_seg);
+
+ /* clear outdated kset in segment */
+ memcpy_flushcache(segment->data, &pcache_empty_kset, sizeof(struct pcache_cache_kset_onmedia));
+ pmem_wmb();
+ } else {
+ ret = cache_seg_meta_load(cache_seg);
+ if (ret)
+ goto err;
+ }
+
+ return 0;
+err:
+ return ret;
+}
+
+/**
+ * get_cache_segment - Retrieves a free cache segment from the cache.
+ * @cache: Pointer to the cache structure.
+ *
+ * This function attempts to find a free cache segment that can be used.
+ * It locks the segment map and checks for the next available segment ID.
+ * If a free segment is found, it initializes it and returns a pointer to the
+ * cache segment structure. Returns NULL if no segments are available.
+ */
+struct pcache_cache_segment *get_cache_segment(struct pcache_cache *cache)
+{
+ struct pcache_cache_segment *cache_seg;
+ u32 seg_id;
+
+ spin_lock(&cache->seg_map_lock);
+again:
+ seg_id = find_next_zero_bit(cache->seg_map, cache->n_segs, cache->last_cache_seg);
+ if (seg_id == cache->n_segs) {
+ /* reset the hint of ->last_cache_seg and retry */
+ if (cache->last_cache_seg) {
+ cache->last_cache_seg = 0;
+ goto again;
+ }
+ cache->cache_full = true;
+ spin_unlock(&cache->seg_map_lock);
+ return NULL;
+ }
+
+ /*
+ * found an available cache_seg, mark it used in seg_map
+ * and update the search hint ->last_cache_seg
+ */
+ __set_bit(seg_id, cache->seg_map);
+ cache->last_cache_seg = seg_id;
+ spin_unlock(&cache->seg_map_lock);
+
+ cache_seg = &cache->segments[seg_id];
+ cache_seg->cache_seg_id = seg_id;
+
+ return cache_seg;
+}
+
+static void cache_seg_gen_increase(struct pcache_cache_segment *cache_seg)
+{
+ spin_lock(&cache_seg->gen_lock);
+ cache_seg->gen++;
+ spin_unlock(&cache_seg->gen_lock);
+
+ cache_seg_ctrl_write(cache_seg);
+}
+
+void cache_seg_get(struct pcache_cache_segment *cache_seg)
+{
+ atomic_inc(&cache_seg->refs);
+}
+
+static void cache_seg_invalidate(struct pcache_cache_segment *cache_seg)
+{
+ struct pcache_cache *cache;
+
+ cache = cache_seg->cache;
+ cache_seg_gen_increase(cache_seg);
+
+ spin_lock(&cache->seg_map_lock);
+ if (cache->cache_full)
+ cache->cache_full = false;
+ __clear_bit(cache_seg->cache_seg_id, cache->seg_map);
+ spin_unlock(&cache->seg_map_lock);
+
+ pcache_defer_reqs_kick(CACHE_TO_PCACHE(cache));
+ /* clean_work will clean the bad key in key_tree*/
+ queue_work(cache_get_wq(cache), &cache->clean_work);
+}
+
+void cache_seg_put(struct pcache_cache_segment *cache_seg)
+{
+ if (atomic_dec_and_test(&cache_seg->refs))
+ cache_seg_invalidate(cache_seg);
+}
diff --git a/drivers/md/dm-pcache/cache_writeback.c b/drivers/md/dm-pcache/cache_writeback.c
new file mode 100644
index 000000000000..e92fc4568074
--- /dev/null
+++ b/drivers/md/dm-pcache/cache_writeback.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/bio.h>
+
+#include "cache.h"
+#include "backing_dev.h"
+#include "cache_dev.h"
+#include "dm_pcache.h"
+
+static void writeback_ctx_end(struct pcache_cache *cache, int ret)
+{
+ if (ret && !cache->writeback_ctx.ret) {
+ pcache_dev_err(CACHE_TO_PCACHE(cache), "writeback error: %d", ret);
+ cache->writeback_ctx.ret = ret;
+ }
+
+ if (!atomic_dec_and_test(&cache->writeback_ctx.pending))
+ return;
+
+ if (!cache->writeback_ctx.ret) {
+ backing_dev_flush(cache->backing_dev);
+
+ mutex_lock(&cache->dirty_tail_lock);
+ cache_pos_advance(&cache->dirty_tail, cache->writeback_ctx.advance);
+ cache_encode_dirty_tail(cache);
+ mutex_unlock(&cache->dirty_tail_lock);
+ }
+ queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, 0);
+}
+
+static void writeback_end_req(struct pcache_backing_dev_req *backing_req, int ret)
+{
+ struct pcache_cache *cache = backing_req->priv_data;
+
+ mutex_lock(&cache->writeback_lock);
+ writeback_ctx_end(cache, ret);
+ mutex_unlock(&cache->writeback_lock);
+}
+
+static inline bool is_cache_clean(struct pcache_cache *cache, struct pcache_cache_pos *dirty_tail)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ u32 to_copy;
+ void *addr;
+ int ret;
+
+ addr = cache_pos_addr(dirty_tail);
+ kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->wb_kset_onmedia_buf;
+
+ to_copy = min(PCACHE_KSET_ONMEDIA_SIZE_MAX, PCACHE_SEG_SIZE - dirty_tail->seg_off);
+ ret = copy_mc_to_kernel(kset_onmedia, addr, to_copy);
+ if (ret) {
+ pcache_dev_err(pcache, "error to read kset: %d", ret);
+ return true;
+ }
+
+ /* Check if the magic number matches the expected value */
+ if (kset_onmedia->magic != PCACHE_KSET_MAGIC) {
+ pcache_dev_debug(pcache, "dirty_tail: %u:%u magic: %llx, not expected: %llx\n",
+ dirty_tail->cache_seg->cache_seg_id, dirty_tail->seg_off,
+ kset_onmedia->magic, PCACHE_KSET_MAGIC);
+ return true;
+ }
+
+ /* Verify the CRC checksum for data integrity */
+ if (kset_onmedia->crc != cache_kset_crc(kset_onmedia)) {
+ pcache_dev_debug(pcache, "dirty_tail: %u:%u crc: %x, not expected: %x\n",
+ dirty_tail->cache_seg->cache_seg_id, dirty_tail->seg_off,
+ cache_kset_crc(kset_onmedia), kset_onmedia->crc);
+ return true;
+ }
+
+ return false;
+}
+
+void cache_writeback_exit(struct pcache_cache *cache)
+{
+ cancel_delayed_work_sync(&cache->writeback_work);
+ backing_dev_flush(cache->backing_dev);
+ cache_tree_exit(&cache->writeback_key_tree);
+}
+
+int cache_writeback_init(struct pcache_cache *cache)
+{
+ int ret;
+
+ ret = cache_tree_init(cache, &cache->writeback_key_tree, 1);
+ if (ret)
+ goto err;
+
+ atomic_set(&cache->writeback_ctx.pending, 0);
+
+ /* Queue delayed work to start writeback handling */
+ queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, 0);
+
+ return 0;
+err:
+ return ret;
+}
+
+static void cache_key_writeback(struct pcache_cache *cache, struct pcache_cache_key *key)
+{
+ struct pcache_backing_dev_req *writeback_req;
+ struct pcache_backing_dev_req_opts writeback_req_opts = { 0 };
+ struct pcache_cache_pos *pos;
+ void *addr;
+ u32 seg_remain;
+ u64 off;
+
+ if (cache_key_clean(key))
+ return;
+
+ pos = &key->cache_pos;
+
+ seg_remain = cache_seg_remain(pos);
+ BUG_ON(seg_remain < key->len);
+
+ addr = cache_pos_addr(pos);
+ off = key->off;
+
+ writeback_req_opts.type = BACKING_DEV_REQ_TYPE_KMEM;
+ writeback_req_opts.gfp_mask = GFP_NOIO;
+ writeback_req_opts.end_fn = writeback_end_req;
+ writeback_req_opts.priv_data = cache;
+
+ writeback_req_opts.kmem.data = addr;
+ writeback_req_opts.kmem.opf = REQ_OP_WRITE;
+ writeback_req_opts.kmem.len = key->len;
+ writeback_req_opts.kmem.backing_off = off;
+
+ writeback_req = backing_dev_req_create(cache->backing_dev, &writeback_req_opts);
+
+ atomic_inc(&cache->writeback_ctx.pending);
+ backing_dev_req_submit(writeback_req, true);
+}
+
+static void cache_wb_tree_writeback(struct pcache_cache *cache, u32 advance)
+{
+ struct pcache_cache_tree *cache_tree = &cache->writeback_key_tree;
+ struct pcache_cache_subtree *cache_subtree;
+ struct rb_node *node;
+ struct pcache_cache_key *key;
+ u32 i;
+
+ cache->writeback_ctx.ret = 0;
+ cache->writeback_ctx.advance = advance;
+ atomic_set(&cache->writeback_ctx.pending, 1);
+
+ for (i = 0; i < cache_tree->n_subtrees; i++) {
+ cache_subtree = &cache_tree->subtrees[i];
+
+ node = rb_first(&cache_subtree->root);
+ while (node) {
+ key = CACHE_KEY(node);
+ node = rb_next(node);
+
+ cache_key_writeback(cache, key);
+ cache_key_delete(key);
+ }
+ }
+ writeback_ctx_end(cache, 0);
+}
+
+static int cache_kset_insert_tree(struct pcache_cache *cache, struct pcache_cache_kset_onmedia *kset_onmedia)
+{
+ struct pcache_cache_key_onmedia *key_onmedia;
+ struct pcache_cache_subtree *cache_subtree;
+ struct pcache_cache_key *key;
+ int ret;
+ u32 i;
+
+ /* Iterate through all keys in the kset and write each back to storage */
+ for (i = 0; i < kset_onmedia->key_num; i++) {
+ key_onmedia = &kset_onmedia->data[i];
+
+ key = cache_key_alloc(&cache->writeback_key_tree, GFP_NOIO);
+ ret = cache_key_decode(cache, key_onmedia, key);
+ if (ret) {
+ cache_key_put(key);
+ goto clear_tree;
+ }
+
+ cache_subtree = get_subtree(&cache->writeback_key_tree, key->off);
+ spin_lock(&cache_subtree->tree_lock);
+ cache_key_insert(&cache->writeback_key_tree, key, true);
+ spin_unlock(&cache_subtree->tree_lock);
+ }
+
+ return 0;
+clear_tree:
+ cache_tree_clear(&cache->writeback_key_tree);
+ return ret;
+}
+
+static void last_kset_writeback(struct pcache_cache *cache,
+ struct pcache_cache_kset_onmedia *last_kset_onmedia)
+{
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_segment *next_seg;
+
+ pcache_dev_debug(pcache, "last kset, next: %u\n", last_kset_onmedia->next_cache_seg_id);
+
+ next_seg = &cache->segments[last_kset_onmedia->next_cache_seg_id];
+
+ mutex_lock(&cache->dirty_tail_lock);
+ cache->dirty_tail.cache_seg = next_seg;
+ cache->dirty_tail.seg_off = 0;
+ cache_encode_dirty_tail(cache);
+ mutex_unlock(&cache->dirty_tail_lock);
+}
+
+void cache_writeback_fn(struct work_struct *work)
+{
+ struct pcache_cache *cache = container_of(work, struct pcache_cache, writeback_work.work);
+ struct dm_pcache *pcache = CACHE_TO_PCACHE(cache);
+ struct pcache_cache_pos dirty_tail;
+ struct pcache_cache_kset_onmedia *kset_onmedia;
+ u32 delay;
+ int ret;
+
+ mutex_lock(&cache->writeback_lock);
+ if (atomic_read(&cache->writeback_ctx.pending))
+ goto unlock;
+
+ if (pcache_is_stopping(pcache))
+ goto unlock;
+
+ kset_onmedia = (struct pcache_cache_kset_onmedia *)cache->wb_kset_onmedia_buf;
+
+ mutex_lock(&cache->dirty_tail_lock);
+ cache_pos_copy(&dirty_tail, &cache->dirty_tail);
+ mutex_unlock(&cache->dirty_tail_lock);
+
+ if (is_cache_clean(cache, &dirty_tail)) {
+ delay = PCACHE_CACHE_WRITEBACK_INTERVAL;
+ goto queue_work;
+ }
+
+ if (kset_onmedia->flags & PCACHE_KSET_FLAGS_LAST) {
+ last_kset_writeback(cache, kset_onmedia);
+ delay = 0;
+ goto queue_work;
+ }
+
+ ret = cache_kset_insert_tree(cache, kset_onmedia);
+ if (ret) {
+ delay = PCACHE_CACHE_WRITEBACK_INTERVAL;
+ goto queue_work;
+ }
+
+ cache_wb_tree_writeback(cache, get_kset_onmedia_size(kset_onmedia));
+ delay = 0;
+queue_work:
+ queue_delayed_work(cache_get_wq(cache), &cache->writeback_work, delay);
+unlock:
+ mutex_unlock(&cache->writeback_lock);
+}
diff --git a/drivers/md/dm-pcache/dm_pcache.c b/drivers/md/dm-pcache/dm_pcache.c
new file mode 100644
index 000000000000..e5f5936fa6f0
--- /dev/null
+++ b/drivers/md/dm-pcache/dm_pcache.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+
+#include "../dm-core.h"
+#include "cache_dev.h"
+#include "backing_dev.h"
+#include "cache.h"
+#include "dm_pcache.h"
+
+void pcache_defer_reqs_kick(struct dm_pcache *pcache)
+{
+ struct pcache_cache *cache = &pcache->cache;
+
+ spin_lock(&cache->seg_map_lock);
+ if (!cache->cache_full)
+ queue_work(pcache->task_wq, &pcache->defered_req_work);
+ spin_unlock(&cache->seg_map_lock);
+}
+
+static void defer_req(struct pcache_request *pcache_req)
+{
+ struct dm_pcache *pcache = pcache_req->pcache;
+
+ BUG_ON(!list_empty(&pcache_req->list_node));
+
+ spin_lock(&pcache->defered_req_list_lock);
+ list_add(&pcache_req->list_node, &pcache->defered_req_list);
+ pcache_defer_reqs_kick(pcache);
+ spin_unlock(&pcache->defered_req_list_lock);
+}
+
+static void defered_req_fn(struct work_struct *work)
+{
+ struct dm_pcache *pcache = container_of(work, struct dm_pcache, defered_req_work);
+ struct pcache_request *pcache_req;
+ LIST_HEAD(tmp_list);
+ int ret;
+
+ if (pcache_is_stopping(pcache))
+ return;
+
+ spin_lock(&pcache->defered_req_list_lock);
+ list_splice_init(&pcache->defered_req_list, &tmp_list);
+ spin_unlock(&pcache->defered_req_list_lock);
+
+ while (!list_empty(&tmp_list)) {
+ pcache_req = list_first_entry(&tmp_list,
+ struct pcache_request, list_node);
+ list_del_init(&pcache_req->list_node);
+ pcache_req->ret = 0;
+ ret = pcache_cache_handle_req(&pcache->cache, pcache_req);
+ if (ret == -EBUSY)
+ defer_req(pcache_req);
+ else
+ pcache_req_put(pcache_req, ret);
+ }
+}
+
+void pcache_req_get(struct pcache_request *pcache_req)
+{
+ kref_get(&pcache_req->ref);
+}
+
+static void end_req(struct kref *ref)
+{
+ struct pcache_request *pcache_req = container_of(ref, struct pcache_request, ref);
+ struct dm_pcache *pcache = pcache_req->pcache;
+ struct bio *bio = pcache_req->bio;
+ int ret = pcache_req->ret;
+
+ if (ret == -EBUSY) {
+ pcache_req_get(pcache_req);
+ defer_req(pcache_req);
+ } else {
+ bio->bi_status = errno_to_blk_status(ret);
+ bio_endio(bio);
+
+ if (atomic_dec_and_test(&pcache->inflight_reqs))
+ wake_up(&pcache->inflight_wq);
+ }
+}
+
+void pcache_req_put(struct pcache_request *pcache_req, int ret)
+{
+ /* Set the return status if it is not already set */
+ if (ret && !pcache_req->ret)
+ pcache_req->ret = ret;
+
+ kref_put(&pcache_req->ref, end_req);
+}
+
+static bool at_least_one_arg(struct dm_arg_set *as, char **error)
+{
+ if (!as->argc) {
+ *error = "Insufficient args";
+ return false;
+ }
+
+ return true;
+}
+
+static int parse_cache_dev(struct dm_pcache *pcache, struct dm_arg_set *as,
+ char **error)
+{
+ int ret;
+
+ if (!at_least_one_arg(as, error))
+ return -EINVAL;
+ ret = dm_get_device(pcache->ti, dm_shift_arg(as),
+ BLK_OPEN_READ | BLK_OPEN_WRITE,
+ &pcache->cache_dev.dm_dev);
+ if (ret) {
+ *error = "Error opening cache device";
+ return ret;
+ }
+
+ return 0;
+}
+
+static int parse_backing_dev(struct dm_pcache *pcache, struct dm_arg_set *as,
+ char **error)
+{
+ int ret;
+
+ if (!at_least_one_arg(as, error))
+ return -EINVAL;
+
+ ret = dm_get_device(pcache->ti, dm_shift_arg(as),
+ BLK_OPEN_READ | BLK_OPEN_WRITE,
+ &pcache->backing_dev.dm_dev);
+ if (ret) {
+ *error = "Error opening backing device";
+ return ret;
+ }
+
+ return 0;
+}
+
+static void pcache_init_opts(struct pcache_cache_options *opts)
+{
+ opts->cache_mode = PCACHE_CACHE_MODE_WRITEBACK;
+ opts->data_crc = false;
+}
+
+static int parse_cache_opts(struct dm_pcache *pcache, struct dm_arg_set *as,
+ char **error)
+{
+ struct pcache_cache_options *opts = &pcache->opts;
+ static const struct dm_arg _args[] = {
+ {0, 4, "Invalid number of cache option arguments"},
+ };
+ unsigned int argc;
+ const char *arg;
+ int ret;
+
+ pcache_init_opts(opts);
+ if (!as->argc)
+ return 0;
+
+ ret = dm_read_arg_group(_args, as, &argc, error);
+ if (ret)
+ return -EINVAL;
+
+ while (argc) {
+ arg = dm_shift_arg(as);
+ argc--;
+
+ if (!strcmp(arg, "cache_mode")) {
+ arg = dm_shift_arg(as);
+ if (!strcmp(arg, "writeback")) {
+ opts->cache_mode = PCACHE_CACHE_MODE_WRITEBACK;
+ } else {
+ *error = "Invalid cache mode parameter";
+ return -EINVAL;
+ }
+ argc--;
+ } else if (!strcmp(arg, "data_crc")) {
+ arg = dm_shift_arg(as);
+ if (!strcmp(arg, "true")) {
+ opts->data_crc = true;
+ } else if (!strcmp(arg, "false")) {
+ opts->data_crc = false;
+ } else {
+ *error = "Invalid data crc parameter";
+ return -EINVAL;
+ }
+ argc--;
+ } else {
+ *error = "Unrecognised cache option requested";
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int pcache_start(struct dm_pcache *pcache, char **error)
+{
+ int ret;
+
+ ret = cache_dev_start(pcache);
+ if (ret) {
+ *error = "Failed to start cache dev";
+ return ret;
+ }
+
+ ret = backing_dev_start(pcache);
+ if (ret) {
+ *error = "Failed to start backing dev";
+ goto stop_cache;
+ }
+
+ ret = pcache_cache_start(pcache);
+ if (ret) {
+ *error = "Failed to start pcache";
+ goto stop_backing;
+ }
+
+ return 0;
+stop_backing:
+ backing_dev_stop(pcache);
+stop_cache:
+ cache_dev_stop(pcache);
+
+ return ret;
+}
+
+static void pcache_destroy_args(struct dm_pcache *pcache)
+{
+ if (pcache->cache_dev.dm_dev)
+ dm_put_device(pcache->ti, pcache->cache_dev.dm_dev);
+ if (pcache->backing_dev.dm_dev)
+ dm_put_device(pcache->ti, pcache->backing_dev.dm_dev);
+}
+
+static int pcache_parse_args(struct dm_pcache *pcache, unsigned int argc, char **argv,
+ char **error)
+{
+ struct dm_arg_set as;
+ int ret;
+
+ as.argc = argc;
+ as.argv = argv;
+
+ /*
+ * Parse cache device
+ */
+ ret = parse_cache_dev(pcache, &as, error);
+ if (ret)
+ return ret;
+ /*
+ * Parse backing device
+ */
+ ret = parse_backing_dev(pcache, &as, error);
+ if (ret)
+ goto out;
+ /*
+ * Parse optional arguments
+ */
+ ret = parse_cache_opts(pcache, &as, error);
+ if (ret)
+ goto out;
+
+ return 0;
+out:
+ pcache_destroy_args(pcache);
+ return ret;
+}
+
+static int dm_pcache_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct mapped_device *md = ti->table->md;
+ struct dm_pcache *pcache;
+ int ret;
+
+ if (md->map) {
+ ti->error = "Don't support table loading for live md";
+ return -EOPNOTSUPP;
+ }
+
+ /* Allocate memory for the cache structure */
+ pcache = kzalloc(sizeof(struct dm_pcache), GFP_KERNEL);
+ if (!pcache)
+ return -ENOMEM;
+
+ pcache->task_wq = alloc_workqueue("pcache-%s-wq", WQ_UNBOUND | WQ_MEM_RECLAIM,
+ 0, md->name);
+ if (!pcache->task_wq) {
+ ret = -ENOMEM;
+ goto free_pcache;
+ }
+
+ spin_lock_init(&pcache->defered_req_list_lock);
+ INIT_LIST_HEAD(&pcache->defered_req_list);
+ INIT_WORK(&pcache->defered_req_work, defered_req_fn);
+ pcache->ti = ti;
+
+ ret = pcache_parse_args(pcache, argc, argv, &ti->error);
+ if (ret)
+ goto destroy_wq;
+
+ ret = pcache_start(pcache, &ti->error);
+ if (ret)
+ goto destroy_args;
+
+ ti->num_flush_bios = 1;
+ ti->flush_supported = true;
+ ti->per_io_data_size = sizeof(struct pcache_request);
+ ti->private = pcache;
+ atomic_set(&pcache->inflight_reqs, 0);
+ atomic_set(&pcache->state, PCACHE_STATE_RUNNING);
+ init_waitqueue_head(&pcache->inflight_wq);
+
+ return 0;
+destroy_args:
+ pcache_destroy_args(pcache);
+destroy_wq:
+ destroy_workqueue(pcache->task_wq);
+free_pcache:
+ kfree(pcache);
+
+ return ret;
+}
+
+static void defer_req_stop(struct dm_pcache *pcache)
+{
+ struct pcache_request *pcache_req;
+ LIST_HEAD(tmp_list);
+
+ flush_work(&pcache->defered_req_work);
+
+ spin_lock(&pcache->defered_req_list_lock);
+ list_splice_init(&pcache->defered_req_list, &tmp_list);
+ spin_unlock(&pcache->defered_req_list_lock);
+
+ while (!list_empty(&tmp_list)) {
+ pcache_req = list_first_entry(&tmp_list,
+ struct pcache_request, list_node);
+ list_del_init(&pcache_req->list_node);
+ pcache_req_put(pcache_req, -EIO);
+ }
+}
+
+static void dm_pcache_dtr(struct dm_target *ti)
+{
+ struct dm_pcache *pcache;
+
+ pcache = ti->private;
+ atomic_set(&pcache->state, PCACHE_STATE_STOPPING);
+ defer_req_stop(pcache);
+
+ wait_event(pcache->inflight_wq,
+ atomic_read(&pcache->inflight_reqs) == 0);
+
+ pcache_cache_stop(pcache);
+ backing_dev_stop(pcache);
+ cache_dev_stop(pcache);
+
+ pcache_destroy_args(pcache);
+ drain_workqueue(pcache->task_wq);
+ destroy_workqueue(pcache->task_wq);
+
+ kfree(pcache);
+}
+
+static int dm_pcache_map_bio(struct dm_target *ti, struct bio *bio)
+{
+ struct pcache_request *pcache_req = dm_per_bio_data(bio, sizeof(struct pcache_request));
+ struct dm_pcache *pcache = ti->private;
+ int ret;
+
+ pcache_req->pcache = pcache;
+ kref_init(&pcache_req->ref);
+ pcache_req->ret = 0;
+ pcache_req->bio = bio;
+ pcache_req->off = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
+ pcache_req->data_len = bio->bi_iter.bi_size;
+ INIT_LIST_HEAD(&pcache_req->list_node);
+ atomic_inc(&pcache->inflight_reqs);
+
+ ret = pcache_cache_handle_req(&pcache->cache, pcache_req);
+ if (ret == -EBUSY)
+ defer_req(pcache_req);
+ else
+ pcache_req_put(pcache_req, ret);
+
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void dm_pcache_status(struct dm_target *ti, status_type_t type,
+ unsigned int status_flags, char *result,
+ unsigned int maxlen)
+{
+ struct dm_pcache *pcache = ti->private;
+ struct pcache_cache_dev *cache_dev = &pcache->cache_dev;
+ struct pcache_backing_dev *backing_dev = &pcache->backing_dev;
+ struct pcache_cache *cache = &pcache->cache;
+ unsigned int sz = 0;
+
+ switch (type) {
+ case STATUSTYPE_INFO:
+ DMEMIT("%x %u %u %u %u %x %u:%u %u:%u %u:%u",
+ cache_dev->sb_flags,
+ cache_dev->seg_num,
+ cache->n_segs,
+ bitmap_weight(cache->seg_map, cache->n_segs),
+ pcache_cache_get_gc_percent(cache),
+ cache->cache_info.flags,
+ cache->key_head.cache_seg->cache_seg_id,
+ cache->key_head.seg_off,
+ cache->dirty_tail.cache_seg->cache_seg_id,
+ cache->dirty_tail.seg_off,
+ cache->key_tail.cache_seg->cache_seg_id,
+ cache->key_tail.seg_off);
+ break;
+ case STATUSTYPE_TABLE:
+ DMEMIT("%s %s 4 cache_mode writeback crc %s",
+ cache_dev->dm_dev->name,
+ backing_dev->dm_dev->name,
+ cache_data_crc_on(cache) ? "true" : "false");
+ break;
+ case STATUSTYPE_IMA:
+ *result = '\0';
+ break;
+ }
+}
+
+static int dm_pcache_message(struct dm_target *ti, unsigned int argc,
+ char **argv, char *result, unsigned int maxlen)
+{
+ struct dm_pcache *pcache = ti->private;
+ unsigned long val;
+
+ if (argc != 2)
+ goto err;
+
+ if (!strcasecmp(argv[0], "gc_percent")) {
+ if (kstrtoul(argv[1], 10, &val))
+ goto err;
+
+ return pcache_cache_set_gc_percent(&pcache->cache, val);
+ }
+err:
+ return -EINVAL;
+}
+
+static struct target_type dm_pcache_target = {
+ .name = "pcache",
+ .version = {0, 1, 0},
+ .module = THIS_MODULE,
+ .features = DM_TARGET_SINGLETON,
+ .ctr = dm_pcache_ctr,
+ .dtr = dm_pcache_dtr,
+ .map = dm_pcache_map_bio,
+ .status = dm_pcache_status,
+ .message = dm_pcache_message,
+};
+
+static int __init dm_pcache_init(void)
+{
+ int ret;
+
+ ret = pcache_backing_init();
+ if (ret)
+ goto err;
+
+ ret = pcache_cache_init();
+ if (ret)
+ goto backing_exit;
+
+ ret = dm_register_target(&dm_pcache_target);
+ if (ret)
+ goto cache_exit;
+ return 0;
+
+cache_exit:
+ pcache_cache_exit();
+backing_exit:
+ pcache_backing_exit();
+err:
+ return ret;
+}
+module_init(dm_pcache_init);
+
+static void __exit dm_pcache_exit(void)
+{
+ dm_unregister_target(&dm_pcache_target);
+ pcache_cache_exit();
+ pcache_backing_exit();
+}
+module_exit(dm_pcache_exit);
+
+MODULE_DESCRIPTION("dm-pcache Persistent Cache for block device");
+MODULE_AUTHOR("Dongsheng Yang <dongsheng.yang@linux.dev>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/md/dm-pcache/dm_pcache.h b/drivers/md/dm-pcache/dm_pcache.h
new file mode 100644
index 000000000000..b4e06be0c0b9
--- /dev/null
+++ b/drivers/md/dm-pcache/dm_pcache.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _DM_PCACHE_H
+#define _DM_PCACHE_H
+#include <linux/device-mapper.h>
+
+#include "../dm-core.h"
+
+#define CACHE_DEV_TO_PCACHE(cache_dev) (container_of(cache_dev, struct dm_pcache, cache_dev))
+#define BACKING_DEV_TO_PCACHE(backing_dev) (container_of(backing_dev, struct dm_pcache, backing_dev))
+#define CACHE_TO_PCACHE(cache) (container_of(cache, struct dm_pcache, cache))
+
+#define PCACHE_STATE_RUNNING 1
+#define PCACHE_STATE_STOPPING 2
+
+struct pcache_cache_dev;
+struct pcache_backing_dev;
+struct pcache_cache;
+struct pcache_cache_options;
+struct dm_pcache {
+ struct dm_target *ti;
+ struct pcache_cache_dev cache_dev;
+ struct pcache_backing_dev backing_dev;
+ struct pcache_cache cache;
+ struct pcache_cache_options opts;
+
+ spinlock_t defered_req_list_lock;
+ struct list_head defered_req_list;
+ struct workqueue_struct *task_wq;
+
+ struct work_struct defered_req_work;
+
+ atomic_t state;
+ atomic_t inflight_reqs;
+ wait_queue_head_t inflight_wq;
+};
+
+static inline bool pcache_is_stopping(struct dm_pcache *pcache)
+{
+ return (atomic_read(&pcache->state) == PCACHE_STATE_STOPPING);
+}
+
+#define pcache_dev_err(pcache, fmt, ...) \
+ pcache_err("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)
+#define pcache_dev_info(pcache, fmt, ...) \
+ pcache_info("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)
+#define pcache_dev_debug(pcache, fmt, ...) \
+ pcache_debug("%s " fmt, pcache->ti->table->md->name, ##__VA_ARGS__)
+
+struct pcache_request {
+ struct dm_pcache *pcache;
+ struct bio *bio;
+
+ u64 off;
+ u32 data_len;
+
+ struct kref ref;
+ int ret;
+
+ struct list_head list_node;
+};
+
+void pcache_req_get(struct pcache_request *pcache_req);
+void pcache_req_put(struct pcache_request *pcache_req, int ret);
+
+void pcache_defer_reqs_kick(struct dm_pcache *pcache);
+
+#endif /* _DM_PCACHE_H */
diff --git a/drivers/md/dm-pcache/pcache_internal.h b/drivers/md/dm-pcache/pcache_internal.h
new file mode 100644
index 000000000000..d427e534727c
--- /dev/null
+++ b/drivers/md/dm-pcache/pcache_internal.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _PCACHE_INTERNAL_H
+#define _PCACHE_INTERNAL_H
+
+#include <linux/delay.h>
+#include <linux/crc32c.h>
+
+#define pcache_err(fmt, ...) \
+ pr_err("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+#define pcache_info(fmt, ...) \
+ pr_info("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+#define pcache_debug(fmt, ...) \
+ pr_debug("dm-pcache: %s:%u " fmt, __func__, __LINE__, ##__VA_ARGS__)
+
+#define PCACHE_KB (1024ULL)
+#define PCACHE_MB (1024 * PCACHE_KB)
+
+/* Maximum number of metadata indices */
+#define PCACHE_META_INDEX_MAX 2
+
+#define PCACHE_CRC_SEED 0x3B15A
+/*
+ * struct pcache_meta_header - PCACHE metadata header structure
+ * @crc: CRC checksum for validating metadata integrity.
+ * @seq: Sequence number to track metadata updates.
+ * @version: Metadata version.
+ * @res: Reserved space for future use.
+ */
+struct pcache_meta_header {
+ __u32 crc;
+ __u8 seq;
+ __u8 version;
+ __u16 res;
+};
+
+/*
+ * pcache_meta_crc - Calculate CRC for the given metadata header.
+ * @header: Pointer to the metadata header.
+ * @meta_size: Size of the metadata structure.
+ *
+ * Returns the CRC checksum calculated by excluding the CRC field itself.
+ */
+static inline u32 pcache_meta_crc(struct pcache_meta_header *header, u32 meta_size)
+{
+ return crc32c(PCACHE_CRC_SEED, (void *)header + 4, meta_size - 4);
+}
+
+/*
+ * pcache_meta_seq_after - Check if a sequence number is more recent, accounting for overflow.
+ * @seq1: First sequence number.
+ * @seq2: Second sequence number.
+ *
+ * Determines if @seq1 is more recent than @seq2 by calculating the signed
+ * difference between them. This approach allows handling sequence number
+ * overflow correctly because the difference wraps naturally, and any value
+ * greater than zero indicates that @seq1 is "after" @seq2. This method
+ * assumes 8-bit unsigned sequence numbers, where the difference wraps
+ * around if seq1 overflows past seq2.
+ *
+ * Returns:
+ * - true if @seq1 is more recent than @seq2, indicating it comes "after"
+ * - false otherwise.
+ */
+static inline bool pcache_meta_seq_after(u8 seq1, u8 seq2)
+{
+ return (s8)(seq1 - seq2) > 0;
+}
+
+/*
+ * pcache_meta_find_latest - Find the latest valid metadata.
+ * @header: Pointer to the metadata header.
+ * @meta_size: Size of each metadata block.
+ *
+ * Finds the latest valid metadata by checking sequence numbers. If a
+ * valid entry with the highest sequence number is found, its pointer
+ * is returned. Returns NULL if no valid metadata is found.
+ */
+static inline void __must_check *pcache_meta_find_latest(struct pcache_meta_header *header,
+ u32 meta_size, u32 meta_max_size,
+ void *meta_ret)
+{
+ struct pcache_meta_header *meta, *latest = NULL;
+ u32 i, seq_latest = 0;
+ void *meta_addr;
+
+ meta = meta_ret;
+
+ for (i = 0; i < PCACHE_META_INDEX_MAX; i++) {
+ meta_addr = (void *)header + (i * meta_max_size);
+ if (copy_mc_to_kernel(meta, meta_addr, meta_size)) {
+ pcache_err("hardware memory error when copy meta");
+ return ERR_PTR(-EIO);
+ }
+
+ /* Skip if CRC check fails, which means corrupted */
+ if (meta->crc != pcache_meta_crc(meta, meta_size))
+ continue;
+
+ /* Update latest if a more recent sequence is found */
+ if (!latest || pcache_meta_seq_after(meta->seq, seq_latest)) {
+ seq_latest = meta->seq;
+ latest = (void *)header + (i * meta_max_size);
+ }
+ }
+
+ if (!latest)
+ return NULL;
+
+ if (copy_mc_to_kernel(meta_ret, latest, meta_size)) {
+ pcache_err("hardware memory error");
+ return ERR_PTR(-EIO);
+ }
+
+ return latest;
+}
+
+#endif /* _PCACHE_INTERNAL_H */
diff --git a/drivers/md/dm-pcache/segment.c b/drivers/md/dm-pcache/segment.c
new file mode 100644
index 000000000000..7e9818701445
--- /dev/null
+++ b/drivers/md/dm-pcache/segment.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/dax.h>
+
+#include "pcache_internal.h"
+#include "cache_dev.h"
+#include "segment.h"
+
+int segment_copy_to_bio(struct pcache_segment *segment,
+ u32 data_off, u32 data_len, struct bio *bio, u32 bio_off)
+{
+ struct iov_iter iter;
+ size_t copied;
+ void *src;
+
+ iov_iter_bvec(&iter, ITER_DEST, &bio->bi_io_vec[bio->bi_iter.bi_idx],
+ bio_segments(bio), bio->bi_iter.bi_size);
+ iter.iov_offset = bio->bi_iter.bi_bvec_done;
+ if (bio_off)
+ iov_iter_advance(&iter, bio_off);
+
+ src = segment->data + data_off;
+ copied = _copy_mc_to_iter(src, data_len, &iter);
+ if (copied != data_len)
+ return -EIO;
+
+ return 0;
+}
+
+int segment_copy_from_bio(struct pcache_segment *segment,
+ u32 data_off, u32 data_len, struct bio *bio, u32 bio_off)
+{
+ struct iov_iter iter;
+ size_t copied;
+ void *dst;
+
+ iov_iter_bvec(&iter, ITER_SOURCE, &bio->bi_io_vec[bio->bi_iter.bi_idx],
+ bio_segments(bio), bio->bi_iter.bi_size);
+ iter.iov_offset = bio->bi_iter.bi_bvec_done;
+ if (bio_off)
+ iov_iter_advance(&iter, bio_off);
+
+ dst = segment->data + data_off;
+ copied = _copy_from_iter_flushcache(dst, data_len, &iter);
+ if (copied != data_len)
+ return -EIO;
+ pmem_wmb();
+
+ return 0;
+}
+
+void pcache_segment_init(struct pcache_cache_dev *cache_dev, struct pcache_segment *segment,
+ struct pcache_segment_init_options *options)
+{
+ segment->seg_info = options->seg_info;
+ segment_info_set_type(segment->seg_info, options->type);
+
+ segment->cache_dev = cache_dev;
+ segment->seg_id = options->seg_id;
+ segment->data_size = PCACHE_SEG_SIZE - options->data_off;
+ segment->data = CACHE_DEV_SEGMENT(cache_dev, options->seg_id) + options->data_off;
+}
diff --git a/drivers/md/dm-pcache/segment.h b/drivers/md/dm-pcache/segment.h
new file mode 100644
index 000000000000..deca1ddcb02b
--- /dev/null
+++ b/drivers/md/dm-pcache/segment.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _PCACHE_SEGMENT_H
+#define _PCACHE_SEGMENT_H
+
+#include <linux/bio.h>
+#include <linux/bitfield.h>
+
+#include "pcache_internal.h"
+
+struct pcache_segment_info {
+ struct pcache_meta_header header;
+ __u32 flags;
+ __u32 next_seg;
+};
+
+#define PCACHE_SEG_INFO_FLAGS_HAS_NEXT BIT(0)
+
+#define PCACHE_SEG_INFO_FLAGS_TYPE_MASK GENMASK(4, 1)
+#define PCACHE_SEGMENT_TYPE_CACHE_DATA 1
+
+static inline bool segment_info_has_next(struct pcache_segment_info *seg_info)
+{
+ return (seg_info->flags & PCACHE_SEG_INFO_FLAGS_HAS_NEXT);
+}
+
+static inline void segment_info_set_type(struct pcache_segment_info *seg_info, u8 type)
+{
+ seg_info->flags &= ~PCACHE_SEG_INFO_FLAGS_TYPE_MASK;
+ seg_info->flags |= FIELD_PREP(PCACHE_SEG_INFO_FLAGS_TYPE_MASK, type);
+}
+
+static inline u8 segment_info_get_type(struct pcache_segment_info *seg_info)
+{
+ return FIELD_GET(PCACHE_SEG_INFO_FLAGS_TYPE_MASK, seg_info->flags);
+}
+
+struct pcache_segment_pos {
+ struct pcache_segment *segment; /* Segment associated with the position */
+ u32 off; /* Offset within the segment */
+};
+
+struct pcache_segment_init_options {
+ u8 type;
+ u32 seg_id;
+ u32 data_off;
+
+ struct pcache_segment_info *seg_info;
+};
+
+struct pcache_segment {
+ struct pcache_cache_dev *cache_dev;
+
+ void *data;
+ u32 data_size;
+ u32 seg_id;
+
+ struct pcache_segment_info *seg_info;
+};
+
+int segment_copy_to_bio(struct pcache_segment *segment,
+ u32 data_off, u32 data_len, struct bio *bio, u32 bio_off);
+int segment_copy_from_bio(struct pcache_segment *segment,
+ u32 data_off, u32 data_len, struct bio *bio, u32 bio_off);
+
+static inline void segment_pos_advance(struct pcache_segment_pos *seg_pos, u32 len)
+{
+ BUG_ON(seg_pos->off + len > seg_pos->segment->data_size);
+
+ seg_pos->off += len;
+}
+
+void pcache_segment_init(struct pcache_cache_dev *cache_dev, struct pcache_segment *segment,
+ struct pcache_segment_init_options *options);
+#endif /* _PCACHE_SEGMENT_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices
2025-08-12 8:24 [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Dongsheng Yang
2025-08-12 8:24 ` [PATCH v5 1/1] dm-pcache: add persistent cache target in device-mapper Dongsheng Yang
@ 2025-08-14 20:58 ` Mikulas Patocka
2025-08-15 11:39 ` Dongsheng Yang
1 sibling, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2025-08-14 20:58 UTC (permalink / raw)
To: Dongsheng Yang; +Cc: agk, snitzer, dm-devel, cengku
On Tue, 12 Aug 2025, Dongsheng Yang wrote:
> Hi Mikulas,
> This is V5 for dm-pcache, please consider merging.
Hi
I merged it to the device mapper git.
(there was a problem with the include file linux/pfn_t.h and the type
pfn_t not being present on Linux 6.17-rc1, but I fixed it)
Mikulas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices
2025-08-14 20:58 ` [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Mikulas Patocka
@ 2025-08-15 11:39 ` Dongsheng Yang
0 siblings, 0 replies; 4+ messages in thread
From: Dongsheng Yang @ 2025-08-15 11:39 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: agk, snitzer, dm-devel, cengku
在 8/15/2025 4:58 AM, Mikulas Patocka 写道:
>
> On Tue, 12 Aug 2025, Dongsheng Yang wrote:
>
>> Hi Mikulas,
>> This is V5 for dm-pcache, please consider merging.
> Hi
>
> I merged it to the device mapper git.
>
> (there was a problem with the include file linux/pfn_t.h and the type
> pfn_t not being present on Linux 6.17-rc1, but I fixed it)
Thanx for the fix.
Dongsheng
>
> Mikulas
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-08-15 11:39 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-12 8:24 [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Dongsheng Yang
2025-08-12 8:24 ` [PATCH v5 1/1] dm-pcache: add persistent cache target in device-mapper Dongsheng Yang
2025-08-14 20:58 ` [PATCH v5 00/11] dm-pcache – persistent-memory cache for block devices Mikulas Patocka
2025-08-15 11:39 ` Dongsheng Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).