[dm-devel] Potential enhancements to dm-thin v2

From: Demi Marie Obenour @ 2022-04-10 22:03 UTC
  To: Joe Thornber; +Cc: dm-devel


For quite a while, I have wanted to write a tool to manage thin volumes
that is not based on LVM.  The main thing holding me back is that the
current dm-thin interface is extremely error-prone.  The only per-thin
metadata stored by the kernel is a 24-bit thin ID, and userspace must
take great care to keep that ID in sync with its own metadata.  Failure
to do so results in data loss, data corruption, or even security
vulnerabilities.  Furthermore, having to suspend a thin volume before
one can take a snapshot of it creates a critical section during which
userspace must be very careful, as I/O or a crash can lead to deadlock.
I believe both of these problems can be solved without overly
complicating the kernel implementation.

The metadata problem can be solved by allowing userspace to (1)
associate a 256-byte binary blob with each thin volume and (2) easily
enumerate the thin volumes in a pool.  Even with 16777216 (2^24) thins,
the blobs would only use 4 GiB of space, and dm-thin v2 will support far
larger metadata volumes.  While being able to look up thins by the blob would
be awesome, I would be okay with just enumerating thins at startup and
caching the ID ⇔ blob mapping in userspace, at least if thin IDs become
64-bit so I do not have to worry about reuse.  Being able to enumerate
the thin volumes would allow me to rely solely on the metadata in the
thin pool, without having to manage any metadata in userspace.  Looking
at the existing implementation, this seems to be fairly simple: the
current B-tree code supports arbitrary value sizes already, so the blob
could be appended to 'struct disk_device_details'.  (Requiring the size
of the blob to be set at pool creation, or when the pool is empty, is
fine.)
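
A rough sketch of the layout and arithmetic above, not a proposed
on-disk format.  It assumes the existing 'struct disk_device_details'
fields are mapped_blocks, transaction_id, creation_time and
snapshotted_time (as read from the current source), and simply appends
the blob.  The names pack_details_v2 and build_cache are hypothetical,
and the enumeration input stands in for an interface that does not
exist today.

```python
import struct

# Existing on-disk fields, assumed from the current dm-thin source:
# mapped_blocks (le64), transaction_id (le64),
# creation_time (le32), snapshotted_time (le32).
DETAILS_FMT = "<QQII"
BLOB_SIZE = 256        # per-thin blob size proposed in this message
MAX_THINS = 1 << 24    # 24-bit thin IDs

# Hypothetical v2 record: the existing fields followed by the blob.
DETAILS_V2_FMT = DETAILS_FMT + f"{BLOB_SIZE}s"


def pack_details_v2(mapped_blocks, transaction_id, creation_time,
                    snapshotted_time, blob=b""):
    """Pack one hypothetical v2 record; short blobs are zero-padded."""
    if len(blob) > BLOB_SIZE:
        raise ValueError("blob exceeds the size fixed at pool creation")
    return struct.pack(DETAILS_V2_FMT, mapped_blocks, transaction_id,
                       creation_time, snapshotted_time, blob)


def build_cache(enumerated):
    """Build the ID <-> blob mapping from (thin_id, blob) pairs.

    `enumerated` stands in for the output of a future enumeration
    interface.  Lookup by blob assumes userspace keeps blobs unique.
    """
    by_id, by_blob = {}, {}
    for thin_id, blob in enumerated:
        if blob in by_blob:
            raise ValueError(
                f"thins {by_blob[blob]} and {thin_id} share a blob")
        by_id[thin_id] = blob
        by_blob[blob] = thin_id
    return by_id, by_blob


# Space used at the 24-bit limit: blobs alone, then whole records.
blob_bytes = MAX_THINS * BLOB_SIZE
record_bytes = MAX_THINS * struct.calcsize(DETAILS_V2_FMT)
print(blob_bytes // 2**30, "GiB of blobs;",
      record_bytes / 2**30, "GiB including the existing fields")
```

With 64-bit thin IDs the same cache would never need to handle a reused
key, which is the reason for asking for wider IDs.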

The suspend problem can be solved by having the kernel automatically
suspend a thin volume before taking a snapshot of it, and resuming
afterwards.  This removes a footgun from the userspace API, and should
improve reliability too, as it reduces the number of error conditions
that can hang the system.  Per discussion with Zdenek, having the kernel
do this automatically is infeasible for arbitrary device stacks, but
snapshotting a single thin volume is a common special case.
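
For reference, a minimal sketch of the sequence userspace has to drive
today, following the suspend / create_snap / resume steps in the kernel
thin-provisioning documentation.  The helper names and the device paths
in the usage note are hypothetical.  The try/finally only covers a
failed create_snap; it cannot help if the process itself dies while the
origin is suspended, which is the critical section described above.

```python
import subprocess


def snapshot_commands(pool, origin, origin_id, snap_id):
    """Return the three dmsetup invocations for an internal snapshot."""
    return [
        ["dmsetup", "suspend", origin],
        ["dmsetup", "message", pool, "0",
         f"create_snap {snap_id} {origin_id}"],
        ["dmsetup", "resume", origin],
    ]


def take_snapshot(pool, origin, origin_id, snap_id,
                  run=subprocess.check_call):
    """Suspend the origin, create the snapshot, and always resume.

    `run` is injectable so the sequence can be exercised without root
    or real devices.
    """
    suspend, create, resume = snapshot_commands(
        pool, origin, origin_id, snap_id)
    run(suspend)
    try:
        run(create)
    finally:
        # Resume even if create_snap failed, so the origin is not
        # left suspended by this code path.
        run(resume)
```

A call such as take_snapshot("/dev/mapper/pool", "/dev/mapper/thin", 0, 1)
would issue the three commands in order.  If the kernel suspended and
resumed the origin itself, only the create_snap message would remain.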
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
