* [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vasily Tarasov @ 2014-08-28 22:48 UTC
To: dm-devel
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, Philip Shilane,
    Sonam Mandal, Erez Zadok

This is a second request for comments for dm-dedup.

Updates compared to the first submission:

- code is updated to kernel 3.16
- construction parameters are now positional (as in other targets)
- documentation is extended and brought to the same format as in other targets

Dm-dedup is a device-mapper deduplication target. Every write coming to the
dm-dedup instance is deduplicated against previously written data. For
datasets that contain many duplicates scattered across the disk (e.g.,
collections of virtual machine disk images and backups), deduplication
provides significant space savings.

To quickly identify duplicates, dm-dedup maintains an index of hashes for all
written blocks. A block is a user-configurable unit of deduplication with a
recommended block size of 4KB. dm-dedup's index, along with other
deduplication metadata, resides on a separate block device, which we refer to
as a metadata device. Although the metadata device can be any block device,
e.g., an HDD or its own partition, for higher performance we recommend using
SSD devices to store metadata.

Dm-dedup is designed to support pluggable metadata backends. A metadata
backend is responsible for storing metadata: LBN-to-PBN and HASH-to-PBN
mappings, allocation maps, and reference counters. (LBN: Logical Block
Number, PBN: Physical Block Number). We have currently implemented the
"cowbtree" and "inram" backends. The cowbtree backend uses the device-mapper
persistent API to store metadata. The inram backend stores all metadata in
RAM as a hash table.

The detailed design is described here:

http://www.fsl.cs.sunysb.edu/docs/ols-dmdedup/dmdedup-ols14.pdf

Our preliminary experiments on real traces demonstrate that Dmdedup can even
exceed the performance of a disk drive running ext4. The reasons are that (1)
deduplication reduces I/O traffic to the data device, and (2) Dmdedup
effectively sequentializes random writes to the data device.

Dmdedup is developed by a joint group of researchers from Stony Brook
University, Harvey Mudd College, and EMC. See the documentation patch for
more details.

Vasily Tarasov (10):
  dm-dedup: main data structures
  dm-dedup: core deduplication logic
  dm-dedup: hash computation
  dm-dedup: implementation of the read-on-write procedure
  dm-dedup: COW B-tree backend
  dm-dedup: inram backend
  dm-dedup: Makefile changes
  dm-dedup: Kconfig changes
  dm-dedup: status function
  dm-dedup: documentation

 Documentation/device-mapper/dedup.txt | 205 +++++++
 drivers/md/Kconfig                    |   8 +
 drivers/md/Makefile                   |   2 +
 drivers/md/dm-dedup-backend.h         | 114 ++++
 drivers/md/dm-dedup-cbt.c             | 755 ++++++++++++++++++++++++++
 drivers/md/dm-dedup-cbt.h             |  44 ++
 drivers/md/dm-dedup-hash.c            | 145 +++++
 drivers/md/dm-dedup-hash.h            |  30 +
 drivers/md/dm-dedup-kvstore.h         |  51 ++
 drivers/md/dm-dedup-ram.c             | 580 ++++++++++++++++++++
 drivers/md/dm-dedup-ram.h             |  43 ++
 drivers/md/dm-dedup-rw.c              | 248 +++++++++
 drivers/md/dm-dedup-rw.h              |  19 +
 drivers/md/dm-dedup-target.c          | 946 +++++++++++++++++++++++++++++++++
 drivers/md/dm-dedup-target.h          | 100 ++++
 15 files changed, 3290 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/device-mapper/dedup.txt
 create mode 100644 drivers/md/dm-dedup-backend.h
 create mode 100644 drivers/md/dm-dedup-cbt.c
 create mode 100644 drivers/md/dm-dedup-cbt.h
 create mode 100644 drivers/md/dm-dedup-hash.c
 create mode 100644 drivers/md/dm-dedup-hash.h
 create mode 100644 drivers/md/dm-dedup-kvstore.h
 create mode 100644 drivers/md/dm-dedup-ram.c
 create mode 100644 drivers/md/dm-dedup-ram.h
 create mode 100644 drivers/md/dm-dedup-rw.c
 create mode 100644 drivers/md/dm-dedup-rw.h
 create mode 100644 drivers/md/dm-dedup-target.c
 create mode 100644 drivers/md/dm-dedup-target.h
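
The metadata that the cover letter lists (LBN-to-PBN and HASH-to-PBN mappings
plus reference counters) can be pictured with a toy in-memory model. The
Python sketch below is not taken from the patches. The class name ToyDedup,
the choice of SHA-256, and the zero-filled read of unmapped blocks are all
assumptions made for illustration; the thread does not say which hash
dm-dedup uses.

```python
import hashlib


class ToyDedup:
    """Toy model of in-line deduplication with all metadata held in dicts."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.lbn_to_pbn = {}   # logical block number -> physical block number
        self.hash_to_pbn = {}  # block hash -> physical block number
        self.refcount = {}     # physical block number -> number of LBNs using it
        self.data = []         # stand-in for the data device; list index is the PBN

    def write(self, lbn, block):
        assert len(block) == self.block_size
        digest = hashlib.sha256(block).digest()  # hash choice is an assumption
        pbn = self.hash_to_pbn.get(digest)
        if pbn is None:
            # Content not seen before: take the next physical block and index it.
            pbn = len(self.data)
            self.data.append(block)
            self.hash_to_pbn[digest] = pbn
            self.refcount[pbn] = 0
        old = self.lbn_to_pbn.get(lbn)
        if old == pbn:
            return pbn  # same content rewritten to the same LBN; nothing changes
        self.refcount[pbn] += 1
        if old is not None:
            self.refcount[old] -= 1  # the previous block loses one reference
        self.lbn_to_pbn[lbn] = pbn
        return pbn

    def read(self, lbn):
        pbn = self.lbn_to_pbn.get(lbn)
        if pbn is None:
            return bytes(self.block_size)  # assumption: unmapped reads as zeros
        return self.data[pbn]
```

Writing the same 4KB block to two different LBNs in this model stores it once
and leaves its reference counter at 2. A persistent backend would keep the
same mappings on the metadata device instead of in Python dictionaries.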
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Darrick J. Wong @ 2014-12-03 2:31 UTC
To: device-mapper development
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, Philip Shilane,
    Sonam Mandal, Erez Zadok

On Thu, Aug 28, 2014 at 06:48:28PM -0400, Vasily Tarasov wrote:
> This is a second request for comments for dm-dedup.

[..]

> Our preliminary experiments on real traces demonstrate that Dmdedup can even
> exceed the performance of a disk drive running ext4. The reasons are that (1)
> deduplication reduces I/O traffic to the data device, and (2) Dmdedup
> effectively sequentializes random writes to the data device.

Hi!

/me starts playing with the patches at:
git://git.fsl.cs.stonybrook.edu/linux-dmdedup.git#dm-dedup-devel

They seem to apply ok to 3.18-rc7, so I got to poke around long enough to have
questions/comments:

Is there a way for it to automatically garbage collect? I started rewriting
the same block tons of times[1], but then the device filled up and all the
writes stopped. If I sent the "garbage_collect" message every 15s it wouldn't
wedge like that, but if I let it hang, garbage collecting didn't un-wedge the
wac processes.

Loading with the cowbtree backend caused a crash in target_message (dm core)
with a RIP of zero when I tried to send the garbage_collect message.

It would be nice if one could send discard and (optionally) do checksum
verification on the read path. I'll look into adding those once I get a
better grasp on what the code is doing. Fortunately dm-dedup is short. :)

I suspect that this business in my_endio that uses bio_iovec to free the page
isn't going to work with the iterator rework. When I tried bulk writing 128M
of zeroes to the device, it blew up while trying to free_pages some
nonexistent page. Changing it to use bio_for_each_segment_all() and free
bvec->bv_page gets us to free the correct page, at least, but the next IO
splats.

Thanks for clearing out some of the BUG*()s.

FYI, dm-dedup might be an easier way to do data block checksumming for ext4,
hence my interest. I ran the ext4 metadata checksum test and it managed to
finish without any blowups, though xfstests was not so lucky. Amusingly the
dedupe ratio was ~53 when it finished.

--D

[1] wac.c: http://djwong.org/docs/wac.c
    $ gcc -Wall -o wac wac.c
    $ ./wac -l 65536 -n32 -m32 -y32 -z32 -f -r $DEDUPE_DEVICE
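
One way to picture the wedge reported above is a store in which an
overwritten block keeps its physical slot until a garbage-collection pass
reclaims it. That behaviour is an assumption here, inferred only from the
symptom in the report; the thread does not describe dm-dedup's allocator. The
names ToyStore and DeviceFull are hypothetical, and the Python sketch models
only slot accounting, not data or hashes.

```python
class DeviceFull(Exception):
    """Raised when no physical block is left to allocate."""


class ToyStore:
    """Slots whose refcount drops to zero stay allocated until garbage_collect()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_pbn = 0     # next never-used physical block
        self.free = []        # physical blocks reclaimed by garbage collection
        self.refcount = {}    # physical block -> reference count
        self.lbn_to_pbn = {}  # logical block -> physical block

    def _alloc(self):
        if self.free:
            return self.free.pop()
        if self.next_pbn == self.capacity:
            raise DeviceFull("no free physical blocks")
        pbn = self.next_pbn
        self.next_pbn += 1
        return pbn

    def overwrite(self, lbn):
        """Write new, unique content to lbn (always needs a fresh block)."""
        pbn = self._alloc()  # may raise before any state is changed
        self.refcount[pbn] = 1
        old = self.lbn_to_pbn.get(lbn)
        if old is not None:
            self.refcount[old] -= 1  # unreferenced now, but not yet reusable
        self.lbn_to_pbn[lbn] = pbn

    def garbage_collect(self):
        """Reclaim every block with a zero refcount; return how many were freed."""
        dead = [pbn for pbn, count in self.refcount.items() if count == 0]
        for pbn in dead:
            del self.refcount[pbn]
            self.free.append(pbn)
        return len(dead)
```

In this model a 4-block store accepts four overwrites of the same LBN, fails
on the fifth, and accepts writes again once garbage_collect() has run. An
automatic collector would amount to calling garbage_collect() from _alloc()
before giving up.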
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vivek Goyal @ 2015-01-14 19:43 UTC
To: Vasily Tarasov
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig,
    device-mapper development, Philip Shilane, Sonam Mandal, Erez Zadok

On Thu, Aug 28, 2014 at 06:48:28PM -0400, Vasily Tarasov wrote:
> This is a second request for comments for dm-dedup.

[..]

> Detailed design is described here:
>
> http://www.fsl.cs.sunysb.edu/docs/ols-dmdedup/dmdedup-ols14.pdf

[..]

Hi,

I have quickly browsed through the paper above and have some very
basic questions.

- What real-life workload is really going to benefit from this? Do you
  have any numbers for that?

  I see one example of storing multiple Linux trees in tar format, and for
  the sequential write case performance has almost halved with the CBT
  backend. And we had a dedup ratio of 1.88 (for the perfect case).

  INRAM numbers, I think, really don't count because it is not practical to
  keep all metadata in RAM. And the case of keeping all data in NVRAM is
  still a little futuristic.

  So this sounds to me like too large a performance penalty to be really
  useful on real-life workloads?

- Why did you implement in-line deduplication as opposed to out-of-line
  deduplication? Section 2 (Timeliness) in the paper just mentions
  out-of-line dedup but does not go into detail on why you chose an
  in-line one.

  I am wondering whether it would not make sense to first implement
  out-of-line dedup and punt a lot of the cost to a worker thread (which
  kicks in only when storage is idle). That way, even if we don't get a high
  dedup ratio for a workload, inserting a dedup target in the stack will be
  less painful from a performance point of view.

- You mentioned that a random workload will become sequential with dedup.
  That will be true only if there is a single writer, won't it? Have
  you run your tests with multiple writers doing random writes, and did
  you get the same kind of improvements?

  Also, on the flip side, a sequential file will become random if multiple
  writers are overwriting their sequential files (as you always allocate
  a new block upon overwrite), and that will hit performance.

- What is 4KB chunking? Is it the same as saying that the block size will be
  4KB? If yes, I am concerned that this might turn out to be a performance
  bottleneck.

Thanks
Vivek
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Akira Hayakawa @ 2015-01-15 9:08 UTC
To: device-mapper development
Cc: Vasily Tarasov, Joe Thornber, Mike Snitzer, Christoph Hellwig,
    Philip Shilane, Sonam Mandal, Erez Zadok, Vivek Goyal

Hi,

Just a comment.

If I understand correctly, dm-dedup does block-level, online deduplication
with fixed-size chunking. It first splits the incoming request into
fixed-size chunks, typically 4KB (the smaller the chunk, the more efficient
the deduplication).

My caching driver dm-writeboost also splits requests into 4KB chunks, but
the situations aren't the same. I think that if the backend (not metadata)
storage is an HDD, the splitting won't be a bottleneck, but if it's faster
storage like an SSD, it probably will. In my case, on the other hand, the
backend storage is always an HDD. That's the difference.

In your paper, you mention that the typical combination of backend/metadata
storage is HDD/SSD, but I think the backend storage nowadays can be an SSD.
Do you think SSDs deduplicate the data internally, so that dm-dedup will not
be used in that case?

As you mention in the future work, variable-length chunking can save metadata
but needs more complex data management. However, I think avoiding splitting
will make sense with an SSD backend. And because you compute a hash for each
chunk, CPU usage is already relatively high, so you don't need to worry about
the additional CPU usage.

- Akira

On Wed, 14 Jan 2015 14:43:15 -0500
Vivek Goyal <vgoyal@redhat.com> wrote:

[..]

--
Akira Hayakawa <ruby.wktk@gmail.com>
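
The splitting step discussed in this message can be sketched as plain
arithmetic on byte offsets. The function name split_into_chunks and its
(offset, length) interface are hypothetical; the Python sketch assumes only
what the message states, namely that a request is cut into fixed-size chunks
of typically 4KB.

```python
def split_into_chunks(offset, length, chunk_size=4096):
    """Yield (lbn, offset_in_chunk, nbytes) for every chunk a request touches.

    offset and length are in bytes; lbn is the index of the fixed-size chunk.
    """
    end = offset + length
    while offset < end:
        lbn = offset // chunk_size
        within = offset % chunk_size
        # Stop at the chunk boundary or at the end of the request, whichever
        # comes first.
        nbytes = min(chunk_size - within, end - offset)
        yield lbn, within, nbytes
        offset += nbytes


# A 16KB aligned request becomes four full chunks, each hashed separately.
print(list(split_into_chunks(0, 16384)))
# An unaligned 8KB request touches three chunks, two of them only partially.
print(list(split_into_chunks(1024, 8192)))
```

Each yielded tuple is one unit of hashing and metadata lookup, which is why
the per-chunk cost matters more when the backend device is fast.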
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vasily Tarasov @ 2015-01-23 16:34 UTC
To: device-mapper development
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, Philip Shilane,
    Sonam Mandal, Erez Zadok, Vivek Goyal

Akira,

I don't think modern SSDs deduplicate data internally (at least most of them
don't). So, in terms of space, dm-dedup will still be beneficial for SSDs.

We consider the scenario in which data is stored on HDDs to be more common
because HDDs are much larger and can store large datasets. Applying
deduplication to large datasets is somewhat more justified. But, as I
mentioned, some people might want to apply dedup to SSDs as well. Dm-dedup
can be used for this as well.

Vasily

On Thu, Jan 15, 2015 at 4:08 AM, Akira Hayakawa <ruby.wktk@gmail.com> wrote:

[..]

> In your paper, you mention that the typical combination of backend/metadata
> storage is HDD/SSD but I think the backend storage nowadays can be SSD. Do
> you think SSD deduplicates the data internally and so dm-dedup will not be
> used in that case?

[..]
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vasily Tarasov @ 2015-01-23 16:27 UTC
To: Vivek Goyal
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig,
    device-mapper development, Philip Shilane, Sonam Mandal, Erez Zadok

Hi Vivek,

Thanks for reading our paper! Please find my answers to the issues you
raised inline.

> Hi,
>
> I have quickly browsed through the paper above and have some very
> basic questions.
>
> - What real life workload is really going to benefit from this? Do you
>   have any numbers for that?
>
>   I see one example of storing multiple linux trees in tar format and for
>   the sequential write case performance has almost halved with the CBT
>   backend. And we had a dedup ratio of 1.88 (for perfect case).
>
>   INRAM numbers I think really don't count because it is not practical to
>   keep all metadata in RAM. And the case of keeping all data in NVRAM is
>   still little futuristic.
>
>   So this sounds like a too huge a performance penalty to me to be really
>   useful on real life workloads?

Dm-dedup is designed so that different metadata backends can be
implemented easily. We first implemented the Copy-on-Write (COW) backend
because device-mapper already provides a COW-based persistent metadata
library. That library was specifically designed for various device-mapper
targets to store metadata reliably in a common way. Using the COW library
allows us to use well-tested code that is already in the kernel instead of
increasing the code size of our submission.

You're right, however, that the COW B-tree exhibits relatively high I/O
overhead, which might not be acceptable in some environments. For such
environments, new backends with higher performance will be added in the
future. As an example, we present the DTB and INRAM backends in the paper.
The INRAM backend is so simple that we even include it in the submitted
patches. We envision it being used in cases similar to Intel's pmfs file
system (persistent memory file system). Persistent memory is not that
futuristic anymore, IMHO :)

Regarding workloads: many workloads have uneven performance profiles, so
CBT's cache can absorb peaks and then flush metadata during the lower-load
phases. In many cases, the deduplication ratio is also higher, e.g., file
systems that store hundreds of VM disk images, backups, etc. So, we believe
that for many situations the CBT backend is practical.

> - Why did you implement an inline deduplication as opposed to out-of-line
>   deduplication? Section 2 (Timeliness) in paper just mentioned
>   out-of-line dedup but does not go into more details that why did you
>   choose an in-line one.
>
>   I am wondering that will it not make sense to first implement an
>   out-of-line dedup and punt lot of cost to worker thread (which kick
>   in only when storage is idle). That way even if don't get a high dedup
>   ratio for a workload, inserting a dedup target in the stack will be less
>   painful from performance point of view.

Both in-line and off-line deduplication approaches have their own
pluses and minuses. Among the minuses of the off-line approach are
that it requires allocation of extra space to buffer non-deduplicated
writes, and re-reading the data from disk when deduplication happens (i.e.,
more I/O is used). It also complicates space usage accounting, and the user
might run out of space even though the deduplication process would discover
many duplicated blocks later.

Our final goal is to support both approaches, but for this code
submission we wanted to limit the amount of new code. In-line
deduplication is the core part, around which we can implement off-line
dedup by adding an extra thread that will reuse the same logic as
in-line deduplication.

> - You mentioned that random workload will become sequential with dedup.
>   That will be true only if there is a single writer, isn't it? Have
>   you run your tests with multiple writers doing random writes and did
>   you get same kind of improvements?
>
>   Also on the flip side a sequential file will become random if multiple
>   writers are overwriting their sequential file (as you always allocate
>   a new block upon overwrite) and that will hit performance.

Even for multiple random writers, the workload at the data device level
becomes sequential. The thing is that we allocate blocks on the data device
as requests are inserted in the I/O queue, no matter which process inserts
the request.

You're right, however, that as with any log-structured file system,
sequential allocation of data blocks in Dm-dedup leads to
fragmentation. Blocks that belong to the same file, for example,
might not be close together if multiple writers wrote these blocks at
different times.

Moreover, such fragmentation is a general problem with any
deduplication system. In fact, if an identical chunk belongs to two (or
more) files in the system, then the on-disk layout can be sequential for at
most one of those files (and possibly for none). In the future, mechanisms
for defragmentation can be implemented to mitigate this effect.

> - What is 4KB chunking? Is it same as saying that block size will be
>   4KB? If yes, I am concerned that this might turn out to be a performance
>   bottleneck.

Yes, a chunk is the conventional name for a unit of deduplication.
Dm-dedup's user can configure the chunk size according to his or her
workload and performance requirements. Larger chunks generally mean
less metadata and more sequential allocation, but a lower
deduplication ratio.

Vasily
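
The two effects described in this reply, sequential writes at the device and
fragmentation per file, can be shown with a few lines of Python. The sketch
assumes only what the reply states: physical blocks are handed out in the
order requests enter the queue, regardless of which writer issued them. The
function name allocate_in_arrival_order and the writer labels are
hypothetical, and every request is assumed to carry unique content so that
no deduplication hit occurs.

```python
def allocate_in_arrival_order(requests):
    """Map each LBN to a PBN in queue-arrival order.

    requests is a sequence of (writer, lbn) pairs; the writer is ignored,
    which is the point: allocation does not depend on who issued the write.
    """
    lbn_to_pbn = {}
    for pbn, (_writer, lbn) in enumerate(requests):
        lbn_to_pbn[lbn] = pbn
    return lbn_to_pbn


# Two writers, each writing its own file sequentially, interleaved in the queue.
queue = [("A", 0), ("B", 100), ("A", 1), ("B", 101), ("A", 2), ("B", 102)]
mapping = allocate_in_arrival_order(queue)

device_order = sorted(mapping.values())
file_a = [mapping[lbn] for lbn in (0, 1, 2)]

print(device_order)  # [0, 1, 2, 3, 4, 5]: the data device sees sequential writes
print(file_a)        # [0, 2, 4]: writer A's file is no longer contiguous
```

The same mapping that makes the write stream sequential is what scatters
writer A's blocks, which is the trade-off both sides of the exchange point
out.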
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vivek Goyal @ 2015-01-30 15:56 UTC
To: Vasily Tarasov
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, device-mapper development, Philip Shilane, Sonam Mandal, Erez Zadok

On Fri, Jan 23, 2015 at 11:27:39AM -0500, Vasily Tarasov wrote:

[..]
> > - Why did you implement an inline deduplication as opposed to out-of-line
> > deduplication? Section 2 (Timeliness) in paper just mentioned
> > out-of-line dedup but does not go into more details that why did you
> > choose an in-line one.
> >
> > I am wondering that will it not make sense to first implement an
> > out-of-line dedup and punt lot of cost to worker thread (which kick
> > in only when storage is idle). That way even if don't get a high dedup
> > ratio for a workload, inserting a dedup target in the stack will be less
> > painful from performance point of view.
>
> Both in-line and off-line deduplication approaches have their own
> pluses and minuses. Among the minuses of the off-line approach is
> that it requires allocation of extra space to buffer non-deduplicated
> writes,

Well, that extra space requirement is temporary. So you got to pay the cost
somewhere. Personally, I will be more than happy to consume more disk
space when I am writing and not take a hit and let worker threads optimize
space usage later.

> re-reading the data from disk when deduplication happens (i.e.
> more I/O used).

Worker threads are supposed to kick in when disk is idle so it might not
be as big a concern.

> It also complicates space usage accounting and user
> might run out of space though deduplication process will discover many
> duplicated blocks later.

Anyway, user needs to plan for extra space. De-dup is not exact science
and one does not know how much will be the de-dup ratio in a data set.
>
> Our final goal is to support both approaches but for this code
> submission we wanted to limit the amount of new code. In-line
> deduplication is a core part, around which we can implement off-line
> dedup by adding an extra thread that will reuse the same logic as
> in-line deduplication.

Ok. I am fine with building both if that makes sense.

I also understand that there are pros/cons to both the approaches. Just
that given the high cost of inline dedupe, I am finding it little odd
that it be implemented first as opposed to offline one.

Anyway, I will spend some time on patches now.

Thanks
Vivek
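As a rough illustration of the space-accounting trade-off debated in this exchange, the sketch below counts data blocks used by each approach for the same stream of chunk-sized writes. It is a simplified model and not taken from the patches: the function names are hypothetical, SHA-256 is an assumed hash, every write is assumed to target a distinct logical block (no overwrites), and the off-line pass is assumed to re-read every stored block once.

```python
import hashlib


def chunk_hashes(chunks):
    """Hash every chunk; stands in for the per-block hash computation."""
    return [hashlib.sha256(chunk).digest() for chunk in chunks]


def inline_blocks_used(chunks):
    """In-line: duplicates are caught before allocation, so the data
    device never holds more than one block per unique chunk."""
    return len(set(chunk_hashes(chunks)))


def offline_blocks_used(chunks):
    """Off-line: return (peak, after_pass, blocks_reread)."""
    peak = len(chunks)  # every write is stored before any dedup runs
    seen = set()
    blocks_reread = 0
    for digest in chunk_hashes(chunks):
        blocks_reread += 1  # the background pass reads the block back
        seen.add(digest)
    return peak, len(seen), blocks_reread


if __name__ == "__main__":
    writes = [b"A", b"B", b"A", b"A", b"C"]
    print(inline_blocks_used(writes))
    print(offline_blocks_used(writes))
```

For the five writes above, the in-line model uses 3 blocks throughout, while the off-line model peaks at 5 blocks, re-reads all 5, and settles at 3 after the pass. That temporary gap is the extra space a user would have to plan for; whether the re-reads matter depends on how idle the disk is when the worker runs.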
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vasily Tarasov @ 2015-02-03 16:11 UTC
To: Vivek Goyal
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, device-mapper development, Philip Shilane, Sonam Mandal, Erez Zadok

Thanks, Vivek. We'll also start working on adding off-line dedup
support to Dmdedup.

Vasily

On Fri, Jan 30, 2015 at 10:56 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Jan 23, 2015 at 11:27:39AM -0500, Vasily Tarasov wrote:
>
> [..]
>> > - Why did you implement an inline deduplication as opposed to out-of-line
>> > deduplication? Section 2 (Timeliness) in paper just mentioned
>> > out-of-line dedup but does not go into more details that why did you
>> > choose an in-line one.
>> >
>> > I am wondering that will it not make sense to first implement an
>> > out-of-line dedup and punt lot of cost to worker thread (which kick
>> > in only when storage is idle). That way even if don't get a high dedup
>> > ratio for a workload, inserting a dedup target in the stack will be less
>> > painful from performance point of view.
>>
>> Both in-line and off-line deduplication approaches have their own
>> pluses and minuses. Among the minuses of the off-line approach is
>> that it requires allocation of extra space to buffer non-deduplicated
>> writes,
>
> Well, that extra space requirement is temporary. So you got to pay the cost
> somewhere. Personally, I will be more than happy to consume more disk
> space when I am writing and not take a hit and let worker threads optimize
> space usage later.
>
>> re-reading the data from disk when deduplication happens (i.e.
>> more I/O used).
>
> Worker threads are supposed to kick in when disk is idle so it might not
> be as big a concern.
>
>> It also complicates space usage accounting and user
>> might run out of space though deduplication process will discover many
>> duplicated blocks later.
>
> Anyway, user needs to plan for extra space. De-dup is not exact science
> and one does not know how much will be the de-dup ratio in a data set.
>
>>
>> Our final goal is to support both approaches but for this code
>> submission we wanted to limit the amount of new code. In-line
>> deduplication is a core part, around which we can implement off-line
>> dedup by adding an extra thread that will reuse the same logic as
>> in-line deduplication.
>
> Ok. I am fine with building both if that makes sense.
>
> I also understand that there are pros/cons to both the approaches. Just
> that given the high cost of inline dedupe, I am finding it little odd
> that it be implemented first as opposed to offline one.
>
> Anyway, I will spend some time on patches now.
>
> Thanks
> Vivek
* Re: [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target
From: Vivek Goyal @ 2015-02-03 16:17 UTC
To: Vasily Tarasov
Cc: Joe Thornber, Mike Snitzer, Christoph Hellwig, device-mapper development, Philip Shilane, Sonam Mandal, Erez Zadok

On Tue, Feb 03, 2015 at 11:11:07AM -0500, Vasily Tarasov wrote:
> Thanks, Vivek. We'll also start working on adding off-line dedup
> support to Dmdedup.

Ok, thanks Vasily. Let us first review and improve the existing patches
for in-line dedup. Once things are in good shape and ready to be merged,
you can look at off-line dedupe. I don't want to bloat the patch series
with both in-line and off-line dedupe implementations.

Thanks
Vivek

>
> Vasily
>
> On Fri, Jan 30, 2015 at 10:56 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Jan 23, 2015 at 11:27:39AM -0500, Vasily Tarasov wrote:
> >
> > [..]
> >> > - Why did you implement an inline deduplication as opposed to out-of-line
> >> > deduplication? Section 2 (Timeliness) in paper just mentioned
> >> > out-of-line dedup but does not go into more details that why did you
> >> > choose an in-line one.
> >> >
> >> > I am wondering that will it not make sense to first implement an
> >> > out-of-line dedup and punt lot of cost to worker thread (which kick
> >> > in only when storage is idle). That way even if don't get a high dedup
> >> > ratio for a workload, inserting a dedup target in the stack will be less
> >> > painful from performance point of view.
> >>
> >> Both in-line and off-line deduplication approaches have their own
> >> pluses and minuses. Among the minuses of the off-line approach is
> >> that it requires allocation of extra space to buffer non-deduplicated
> >> writes,
> >
> > Well, that extra space requirement is temporary. So you got to pay the cost
> > somewhere. Personally, I will be more than happy to consume more disk
> > space when I am writing and not take a hit and let worker threads optimize
> > space usage later.
> >
> >> re-reading the data from disk when deduplication happens (i.e.
> >> more I/O used).
> >
> > Worker threads are supposed to kick in when disk is idle so it might not
> > be as big a concern.
> >
> >> It also complicates space usage accounting and user
> >> might run out of space though deduplication process will discover many
> >> duplicated blocks later.
> >
> > Anyway, user needs to plan for extra space. De-dup is not exact science
> > and one does not know how much will be the de-dup ratio in a data set.
> >
> >>
> >> Our final goal is to support both approaches but for this code
> >> submission we wanted to limit the amount of new code. In-line
> >> deduplication is a core part, around which we can implement off-line
> >> dedup by adding an extra thread that will reuse the same logic as
> >> in-line deduplication.
> >
> > Ok. I am fine with building both if that makes sense.
> >
> > I also understand that there are pros/cons to both the approaches. Just
> > that given the high cost of inline dedupe, I am finding it little odd
> > that it be implemented first as opposed to offline one.
> >
> > Anyway, I will spend some time on patches now.
> >
> > Thanks
> > Vivek
Thread overview: 9+ messages
2014-08-28 22:48 [PATCH RFCv2 00/10] dm-dedup: device-mapper deduplication target Vasily Tarasov
2014-12-03  2:31 ` Darrick J. Wong
2015-01-14 19:43 ` Vivek Goyal
2015-01-15  9:08 ` Akira Hayakawa
2015-01-23 16:34 ` Vasily Tarasov
2015-01-23 16:27 ` Vasily Tarasov
2015-01-30 15:56 ` Vivek Goyal
2015-02-03 16:11 ` Vasily Tarasov
2015-02-03 16:17 ` Vivek Goyal