From: Ram Pai <linuxram@us.ibm.com>
To: LKML <linux-kernel@vger.kernel.org>,
linux-raid@vger.kernel.org, dm-devel@redhat.com,
linux-doc@vger.kernel.org
Cc: shli@kernel.org, agk@redhat.com, snitzer@redhat.com,
corbet@lwn.net, Ram Pai <linuxram@us.ibm.com>
Subject: [RFC PATCH 16/16] DM: add documentation for dm-inplace-compress.
Date: Mon, 15 Aug 2016 10:36:53 -0700
Message-ID: <1471282613-31006-17-git-send-email-linuxram@us.ibm.com>
In-Reply-To: <1471282613-31006-1-git-send-email-linuxram@us.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
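The payload sizing rule in the "Data" part of the Description below can be
checked with a little shell arithmetic. This is only a sketch under the
assumptions stated in that text (512B sectors, 4k blocks, a 32-bit length
field at the end of the payload). The helper name payload_sectors is
hypothetical and does not appear anywhere in the patch series.

```sh
#!/bin/sh
# Illustrative only: payload_sectors is a hypothetical helper, not code
# taken from the dm-inplace-compress patches.
SECTOR=512      # sector size in bytes
BLOCK=4096      # block size in bytes
TRAILER=4       # 32-bit compressed-length field kept at the end of the payload

# Print how many 512B sectors one 4k block occupies on disk for a given
# compressed size in bytes.
payload_sectors() {
    n=$(( ($1 + TRAILER + SECTOR - 1) / SECTOR ))
    max=$(( BLOCK / SECTOR ))
    # A payload that would need all 8 sectors saves nothing, so the block
    # is stored uncompressed and still occupies 8 sectors.
    if [ "$n" -ge "$max" ]; then
        n=$max
    fi
    echo "$n"
}

payload_sectors 1500   # the example from the text: prints 3
payload_sectors 1534   # 1534 + 4 crosses a sector boundary: prints 4
```

With 1500 bytes of compressed data the payload is 3 sectors, leaving sectors
B+3 to B+7 as the hole described in the text.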
.../device-mapper/dm-inplace-compress.text | 138 ++++++++++++++++++++
1 files changed, 138 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-inplace-compress.text
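
The "less than 200M meta for 1T data" figure in the Advantages list below
follows from 5 meta bits per 4k block. A rough check, assuming "1T" means
1 TiB and ignoring the superblock; meta_bytes is a hypothetical helper name
used only for this illustration:

```sh
#!/bin/sh
# Illustrative only: meta_bytes is a hypothetical helper, not code taken
# from the dm-inplace-compress patches.
BLOCK=4096      # bytes per block
META_BITS=5     # 4 payload-size bits + 1 head/tail bit per block

# Print the size in bytes of the meta bitmap for a data area of $1 bytes.
meta_bytes() {
    blocks=$(( $1 / BLOCK ))
    echo $(( (blocks * META_BITS + 7) / 8 ))
}

one_tib=$(( 1024 * 1024 * 1024 * 1024 ))
echo "$(( $(meta_bytes "$one_tib") / 1024 / 1024 )) MiB"   # prints "160 MiB"
```

160 MiB is small enough to keep the whole bitmap in memory, which is the
point the text makes.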
diff --git a/Documentation/device-mapper/dm-inplace-compress.text b/Documentation/device-mapper/dm-inplace-compress.text
new file mode 100644
index 0000000..c31e69e
--- /dev/null
+++ b/Documentation/device-mapper/dm-inplace-compress.text
@@ -0,0 +1,138 @@
+dm-inplace-compress
+====================
+
+Device-Mapper's "inplace-compress" target provides in-place compression of
+block devices using the kernel compression API.
+
+Parameters: <device path> \
+ [ <#opt_params writethrough> ]
+ [ <#opt_params writeback> <meta_commit_delay> ]
+ [ <#opt_params compressor> <type> ]
+
+
+<writethrough>
+ Write data and metadata together.
+
+<writeback> <meta_commit_delay>
+ Write metadata to disk every 'meta_commit_delay' seconds.
+
+<device path>
+ This is the device that is used as the backend and contains the compressed
+ data. You can specify it as a path like /dev/xxx or as a device number
+ <major>:<minor>.
+
+<compressor> <type>
+ Choose the compression algorithm. The 'lzo' and '842' compressors are
+ supported.
+
+Example scripts
+===============
+
+Create an inplace-compress block device using lzo compression. Write metadata
+and data together.
+[[
+#!/bin/sh
+# Create an inplace-compress device using dmsetup
+dmsetup create comp1 --table \
+ "0 `blockdev --getsize $1` inplacecompress $1 writethrough compressor lzo"
+]]
+
+
+Create an inplace-compress block device using nx-842 hardware compression.
+Write metadata periodically, every 5 seconds.
+
+[[
+#!/bin/sh
+# Create an inplace-compress device using dmsetup
+dmsetup create comp1 --table \
+ "0 `blockdev --getsize $1` inplacecompress $1 writeback 5 compressor 842"
+]]
+
+Description
+===========
+ This is a simple DM target supporting in-place compression. It is best suited
+ for SSDs. The underlying disk must support a 512B sector size; the target
+ itself only supports a 4k sector size.
+
+ Disk layout:
+ |super|...meta...|..data...|
+
+ The storage unit is 4k (a block). Super is 1 block, which stores the meta and
+ data sizes and the compression algorithm. Meta is a bitmap. For each data
+ block, there are 5 bits of meta.
+
+ Data:
+
+ The data of a block is compressed. The compressed data is rounded up to a
+ multiple of 512B, which is the payload. On disk, the payload is stored at the
+ first logical sector of the block. Let's look at an example. Say we store data
+ to block A, which starts at sector B (B = A*8). Its original size is 4k and its
+ compressed size is 1500 bytes. The compressed data (CD) will use 3 sectors of
+ 512B each. These 3 sectors are the payload, which is stored starting at sector B.
+
+ ---------------------------------------------------------
+ ... | CD1 | CD2 | CD3 |     |     |     |     |     | ...
+ ---------------------------------------------------------
+     ^B    ^B+1  ^B+2                          ^B+7  ^B+8
+
+ For this block, we will not use sectors B+3 to B+7 (a hole). We use 4 meta
+ bits to represent the payload size. The compressed size (1500) isn't stored in
+ meta directly. Instead, we store it in the last 32 bits of the payload. In this
+ example, we store it at the end of sector B+2. If compressed size +
+ sizeof(32bits) crosses a sector boundary, the payload grows by one sector. If
+ the payload uses 8 sectors, we store the uncompressed data directly.
+
+ If the IO size is bigger than one block, we can store the data as an extent.
+ The data of the whole extent will be compressed and stored in a similar way to
+ the above. The first block of the extent is the head; all others are the tail.
+ If the extent is 1 block, that block is the head. We have 1 meta bit to indicate
+ whether a block is head or tail. If the 4 meta bits of the head block can't
+ hold the extent payload size, we borrow tail block meta bits to store it.
+ The max allowed extent size is 128k, so we don't compress/decompress too much
+ data at once.
+
+ Meta:
+ Modifying data modifies meta too. Meta is written (flushed) to disk
+ depending on the meta write policy. We support writeback and writethrough mode.
+ In writeback mode, meta is written to disk at an interval or on a FLUSH
+ request. In writethrough mode, data and meta are written to disk
+ together.
+
+ Advantages:
+
+ 1. Simple. Since we store compressed data in-place, we don't need complicated
+ disk data management.
+ 2. Efficient. For each 4k, we only need 5 bits of meta. 1T of data will use less
+ than 200M of meta, so we can load all meta into memory. The actual compressed
+ size is in the payload. So if IO doesn't need RMW (read-modify-write) and we use
+ writeback meta flush, we don't need extra IO for meta.
+
+ Disadvantages:
+
+ 1. Holes. Since we store compressed data in-place, there are a lot of holes
+ (in the above example, B+3 - B+7). Holes can impact IO, because we can't do IO
+ merging.
+
+ 2. 1:1 size. Compression doesn't change the disk size. If the disk is 1T, we can
+ only store 1T of data even with compression.
+
+ But this is for SSD only. SSD firmware generally has an FTL layer to map disk
+ sectors to flash NAND. High-end SSD firmware has a filesystem-like FTL.
+
+ 1. Holes. The disk has a lot of holes, but the SSD FTL can still store data
+ contiguously in NAND. Even if we can't merge IO in the OS layer, SSD firmware can.
+
+ 2. 1:1 size. On one side, we write compressed data to the SSD, which means less
+ data is written to the SSD. This helps SSD garbage collection, and therefore
+ write speed and lifespan. So even though this is a limitation, the target is
+ still helpful. On the other side, an advanced SSD FTL can easily do thin
+ provisioning. For example, if the NAND is 1T, we can let the SSD report it as 2T
+ and use the SSD as the compressed target. Such an SSD doesn't have the 1:1 size issue.
+
+ So even if the SSD FTL cannot map non-contiguous disk sectors to contiguous
+ NAND, the compression target can still function well.
+
+
+Author:
+ Shaohua Li <shli@fusionio.com>
+ Ram Pai <ram.n.pai@gmail.com>
--
1.7.1