linux-btrfs.vger.kernel.org archive mirror
* [RFC PATCH v10 00/16] Online(inband) data deduplication
@ 2014-04-10  3:48 Liu Bo
  2014-04-10  3:48 ` [PATCH v10 01/16] Btrfs: disable qgroups accounting when quota_enable is 0 Liu Bo
                   ` (19 more replies)
  0 siblings, 20 replies; 26+ messages in thread
From: Liu Bo @ 2014-04-10  3:48 UTC
  To: linux-btrfs
  Cc: Marcel Ritter, Christian Robert, alanqk, Konstantinos Skarlatos,
	David Sterba, Martin Steigerwald, Josef Bacik, Chris Mason

Hello,

This is the 10th attempt at in-band data dedupe, based on the Linux _3.14_ kernel.

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]

This patch set is also related to "Content based storage" in the project ideas[2].
It introduces inband data deduplication for btrfs, called dedup/dedupe for short.

* PATCH 1 is a speed-up improvement concerning dedup and quota.

* PATCH 2-5 are the preparation work for the dedup implementation.

* PATCH 6 shows how we implement the dedup feature.

* PATCH 7 fixes a backref walking bug with dedup.

* PATCH 8 fixes a free space bug of dedup extents on error handling.

* PATCH 9 adds the ioctl to control the dedup feature.

* PATCH 10 targets the delayed refs' scalability problem when deleting refs,
  which is uncovered by the dedup feature.

* PATCH 11-16 fix dedupe bugs, including a race, a deadlock, abnormal
  transaction aborts and a crash.

* The btrfs-progs patch (PATCH 17) offers all the details about how to control
  the dedup feature on the progs side.

I've tested this with xfstests by adding an inline dedup 'enable & on' to xfstests'
mount and scratch_mount.


***NOTE***
Known bugs:
* Mounting with the "flushoncommit" option and enabling the dedupe feature will
  end up in a _deadlock_.


TODO:
* a bit-to-bit comparison callback.

All comments are welcome!


[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v10:
- fix a typo in the subject line.
- update struct 'btrfs_ioctl_dedup_args' in the kernel side to fix
  'Inappropriate ioctl for device'.
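
A note on why a struct change can surface as 'Inappropriate ioctl for device'
(ENOTTY): the _IOWR() macro encodes the size of the argument struct into the
request number, so if userspace and the kernel disagree about that size they
compute different request numbers and the kernel does not recognise the call.
The sketch below only illustrates this. The 16-byte "unpadded" layout is an
assumption made for the comparison, not the earlier kernel definition, and all
demo_* names are hypothetical. 0x94 is BTRFS_IOCTL_MAGIC and 55 is the number
used by the progs patch in this thread.

```c
#include <linux/ioctl.h>   /* _IOWR, _IOC_SIZE, _IOC_NR */
#include <linux/types.h>   /* __u64 */

#define DEMO_BTRFS_IOCTL_MAGIC 0x94

/* Layout from the btrfs-progs patch: padded to 128 bytes. */
struct demo_dedup_args_padded {
	__u64 cmd;
	__u64 bs;
	__u64 unused[14];
};

/* Hypothetical unpadded layout, assumed here only for comparison. */
struct demo_dedup_args_unpadded {
	__u64 cmd;
	__u64 bs;
};

/* Request number as computed by a side that uses the padded struct. */
static unsigned long demo_req_padded(void)
{
	return _IOWR(DEMO_BTRFS_IOCTL_MAGIC, 55, struct demo_dedup_args_padded);
}

/* Request number as computed by a side that uses the unpadded struct. */
static unsigned long demo_req_unpadded(void)
{
	return _IOWR(DEMO_BTRFS_IOCTL_MAGIC, 55, struct demo_dedup_args_unpadded);
}

/* Returns 1 when both sides would compute the same request number. */
static int demo_requests_match(void)
{
	return demo_req_padded() == demo_req_unpadded();
}
```

With these two layouts demo_requests_match() returns 0: the magic and the
number are identical, but the size field embedded in the request differs.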

v9:
- fix a deadlock and a crash reported by users.
- fix the metadata ENOSPC problem with dedup again.

v8:
- fix the race crash of dedup ref again.
- fix the metadata ENOSPC problem with dedup.

v7:
- rebase onto the latest btrfs
- break a big patch into smaller ones to make reviewers happy.
- kill mount options of dedup and use ioctl method instead.
- fix two crashes due to the special dedup ref

For former patch sets:
v6: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27512
v5: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27257
v4: http://thread.gmane.org/gmane.comp.file-systems.btrfs/25751
v3: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25433
v2: http://comments.gmane.org/gmane.comp.file-systems.btrfs/24959

Liu Bo (16):
  Btrfs: disable qgroups accounting when quota_enable is 0
  Btrfs: introduce dedup tree and relatives
  Btrfs: introduce dedup tree operations
  Btrfs: introduce dedup state
  Btrfs: make ordered extent aware of dedup
  Btrfs: online(inband) data dedup
  Btrfs: skip dedup reference during backref walking
  Btrfs: don't return space for dedup extent
  Btrfs: add ioctl of dedup control
  Btrfs: improve the delayed refs process in rm case
  Btrfs: fix a crash of dedup ref
  Btrfs: fix deadlock of dedup work
  Btrfs: fix transactin abortion in __btrfs_free_extent
  Btrfs: fix wrong pinned bytes in __btrfs_free_extent
  Btrfs: use total_bytes instead of bytes_used for global_rsv
  Btrfs: fix dedup enospc problem

 fs/btrfs/backref.c           |   9 +
 fs/btrfs/ctree.c             |   2 +-
 fs/btrfs/ctree.h             |  86 ++++++
 fs/btrfs/delayed-ref.c       |  26 +-
 fs/btrfs/delayed-ref.h       |   3 +
 fs/btrfs/disk-io.c           |  37 +++
 fs/btrfs/extent-tree.c       | 235 +++++++++++++---
 fs/btrfs/extent_io.c         |  22 +-
 fs/btrfs/extent_io.h         |  16 ++
 fs/btrfs/file-item.c         | 244 +++++++++++++++++
 fs/btrfs/inode.c             | 635 ++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/ioctl.c             | 167 ++++++++++++
 fs/btrfs/ordered-data.c      |  44 ++-
 fs/btrfs/ordered-data.h      |  13 +-
 fs/btrfs/qgroup.c            |   3 +
 fs/btrfs/relocation.c        |   3 +
 fs/btrfs/transaction.c       |  41 +++
 fs/btrfs/transaction.h       |   1 +
 include/trace/events/btrfs.h |   3 +-
 include/uapi/linux/btrfs.h   |  12 +
 20 files changed, 1471 insertions(+), 131 deletions(-)

-- 
1.8.2.1

* [PATCH v4] Btrfs-progs: add dedup subcommand
@ 2014-04-09  7:08 Liu Bo
  2014-04-09 10:10 ` [PATCH v5] " Liu Bo
  0 siblings, 1 reply; 26+ messages in thread
From: Liu Bo @ 2014-04-09  7:08 UTC
  To: linux-btrfs
  Cc: Marcel Ritter, Christian Robert, alanqk, Konstantinos Skarlatos,
	David Sterba, Martin Steigerwald, Josef Bacik, Chris Mason

This adds deduplication subcommands, 'btrfs dedup <command> <path>',
including enable/disable/on/off.

- btrfs dedup enable
Create the dedup tree. This is the very first step when you're going to use
the dedup feature.

- btrfs dedup disable
Delete the dedup tree. After this, dedup can no longer be used unless
you enable it again.

- btrfs dedup on [-b]
Switch on the dedup feature temporarily. This is the second step of applying
dedup to writes.  Option '-b' sets the dedup blocksize.
The default blocksize is 8192 (no special reason, you may argue), and the current
limit is [4096, 128 * 1024], because 4K is the generic page size and 128K is the
upper limit of btrfs's compression.

- btrfs dedup off
Switch off the dedup feature temporarily; the dedup tree remains.
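
For illustration, the blocksize limit described above can be written as a small
range check. This is only a sketch: the demo_* names are hypothetical, and the
progs code in this patch passes the '-b' value straight to the ioctl without
checking it. Whether any constraint beyond the range applies (alignment, for
example) is not stated here, so the sketch does not assume one.

```c
#include <stdint.h>

/* Limits taken from the description above. */
#define DEMO_DEDUP_BS_MIN     4096ULL           /* generic page size */
#define DEMO_DEDUP_BS_MAX     (128ULL * 1024)   /* btrfs compression limit */
#define DEMO_DEDUP_BS_DEFAULT 8192ULL           /* default used by 'dedup on' */

/* Returns 1 if bs lies in the documented range [4096, 128 * 1024]. */
static int demo_dedup_bs_in_range(uint64_t bs)
{
	return bs >= DEMO_DEDUP_BS_MIN && bs <= DEMO_DEDUP_BS_MAX;
}
```

Note that a blocksize of 0 is outside this range; in this patch 'dedup off'
uses 0 as the "switched off" value instead.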

---------------------------------------------------------
Usage:
Step 1: btrfs dedup enable /btrfs
Step 2: btrfs dedup on /btrfs or btrfs dedup on -b 4K /btrfs
Step 3: now we have dedup, run your test.
Step 4: btrfs dedup off /btrfs
Step 5: btrfs dedup disable /btrfs
---------------------------------------------------------
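
As a rough sketch, steps 1, 2, 4 and 5 above correspond to the following
sequence of (cmd, bs) pairs handed to the dedup ioctl. The command values are
copied from the ioctl.h hunk of this patch; the demo_* names and the reduced
two-field struct are hypothetical and exist only for this example. Zeroing bs
for enable/disable is a choice made in the sketch (the patch leaves the field
unset for those commands).

```c
#include <stddef.h>
#include <stdint.h>

/* Command values copied from the ioctl.h hunk of this patch. */
#define DEMO_DEDUP_CTL_ENABLE  1
#define DEMO_DEDUP_CTL_DISABLE 2
#define DEMO_DEDUP_CTL_SET_BS  3

struct demo_dedup_req {
	uint64_t cmd;
	uint64_t bs;	/* only meaningful for SET_BS; 0 means "off" */
};

/*
 * Fill out[] with the requests for steps 1, 2, 4 and 5 and return how
 * many were written. Step 3 ("run your test") issues no request.
 */
static size_t demo_dedup_session(uint64_t bs, struct demo_dedup_req out[4])
{
	out[0] = (struct demo_dedup_req){ DEMO_DEDUP_CTL_ENABLE, 0 };	/* enable */
	out[1] = (struct demo_dedup_req){ DEMO_DEDUP_CTL_SET_BS, bs };	/* on */
	out[2] = (struct demo_dedup_req){ DEMO_DEDUP_CTL_SET_BS, 0 };	/* off */
	out[3] = (struct demo_dedup_req){ DEMO_DEDUP_CTL_DISABLE, 0 };	/* disable */
	return 4;
}
```

Both 'on' and 'off' go through the same SET_BS command; only the blocksize
value distinguishes them.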

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v4: rebase and reserve spare space in btrfs_ioctl_dedup_args struct. 
v3: add commands 'btrfs dedup on/off'
v2: add manpage


 Makefile       |   3 +-
 btrfs.c        |   1 +
 cmds-dedup.c   | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commands.h     |   2 +
 ctree.h        |   2 +
 ioctl.h        |  13 +++++
 man/btrfs.8.in |  31 +++++++++-
 7 files changed, 226 insertions(+), 4 deletions(-)
 create mode 100644 cmds-dedup.c

diff --git a/Makefile b/Makefile
index 0874a41..092f2db 100644
--- a/Makefile
+++ b/Makefile
@@ -13,7 +13,8 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
 	       cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
-	       cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o
+	       cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
+	       cmds-dedup.o
 libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
 		   uuid-tree.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index d5fc738..dfae35f 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -255,6 +255,7 @@ static const struct cmd_group btrfs_cmd_group = {
 		{ "quota", cmd_quota, NULL, &quota_cmd_group, 0 },
 		{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
 		{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
+		{ "dedup", cmd_dedup, NULL, &dedup_cmd_group, 0 },
 		{ "help", cmd_help, cmd_help_usage, NULL, 0 },
 		{ "version", cmd_version, cmd_version_usage, NULL, 0 },
 		NULL_CMD_STRUCT
diff --git a/cmds-dedup.c b/cmds-dedup.c
new file mode 100644
index 0000000..b959349
--- /dev/null
+++ b/cmds-dedup.c
@@ -0,0 +1,178 @@
+/*
+ * Copyright (C) 2013 Oracle.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <getopt.h>
+
+#include "ctree.h"
+#include "ioctl.h"
+
+#include "commands.h"
+#include "utils.h"
+
+static const char * const dedup_cmd_group_usage[] = {
+	"btrfs dedup <command> [options] <path>",
+	NULL
+};
+
+int dedup_ctl(char *path, struct btrfs_ioctl_dedup_args *args)
+{
+	int ret = 0;
+	int fd;
+	int e;
+	DIR *dirstream = NULL;
+
+	fd = open_file_or_dir(path, &dirstream);
+	if (fd < 0) {
+		fprintf(stderr, "ERROR: can't access '%s'\n", path);
+		return -EACCES;
+	}
+
+	ret = ioctl(fd, BTRFS_IOC_DEDUP_CTL, args);
+	e = errno;
+	close_file_or_dir(fd, dirstream);
+	if (ret < 0) {
+		fprintf(stderr, "ERROR: dedup command failed: %s\n",
+			strerror(e));
+		if (args->cmd == BTRFS_DEDUP_CTL_DISABLE ||
+		    args->cmd == BTRFS_DEDUP_CTL_SET_BS)
+			fprintf(stderr, "please refer to 'dmesg | tail' for more info\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static const char * const cmd_dedup_enable_usage[] = {
+	"btrfs dedup enable <path>",
+	"Enable data deduplication support for a filesystem.",
+	NULL
+};
+
+static int cmd_dedup_enable(int argc, char **argv)
+{
+	struct btrfs_ioctl_dedup_args dargs;
+
+	if (check_argc_exact(argc, 2))
+		usage(cmd_dedup_enable_usage);
+
+	dargs.cmd = BTRFS_DEDUP_CTL_ENABLE;
+
+	return dedup_ctl(argv[1], &dargs);
+}
+
+static const char * const cmd_dedup_disable_usage[] = {
+	"btrfs dedup disable <path>",
+	"Disable data deduplication support for a filesystem.",
+	NULL
+};
+
+static int cmd_dedup_disable(int argc, char **argv)
+{
+	struct btrfs_ioctl_dedup_args dargs;
+
+	if (check_argc_exact(argc, 2))
+		usage(cmd_dedup_disable_usage);
+
+	dargs.cmd = BTRFS_DEDUP_CTL_DISABLE;
+
+	return dedup_ctl(argv[1], &dargs);
+}
+
+static int dedup_set_bs(char *path, struct btrfs_ioctl_dedup_args *dargs)
+{
+	return dedup_ctl(path, dargs);
+}
+
+static const char * const cmd_dedup_on_usage[] = {
+	"btrfs dedup on [-b|--bs size] <path>",
+	"Switch on data deduplication or change the dedup blocksize.",
+	"",
+	"-b|--bs <size>  set dedup blocksize",
+	NULL
+};
+
+static struct option longopts[] = {
+	{"bs", required_argument, NULL, 'b'},
+	{0, 0, 0, 0}
+};
+
+static int cmd_dedup_on(int argc, char **argv)
+{
+	struct btrfs_ioctl_dedup_args dargs;
+	u64 bs = 8192;
+
+	optind = 1;
+	while (1) {
+		int longindex;
+
+		int c = getopt_long(argc, argv, "b:", longopts, &longindex);
+		if (c < 0)
+			break;
+
+		switch (c) {
+		case 'b':
+			bs = parse_size(optarg);
+			break;
+		default:
+			usage(cmd_dedup_on_usage);
+		}
+	}
+
+	if (check_argc_exact(argc - optind, 1))
+		usage(cmd_dedup_on_usage);
+
+	dargs.cmd = BTRFS_DEDUP_CTL_SET_BS;
+	dargs.bs = bs;
+
+	return dedup_set_bs(argv[optind], &dargs);
+}
+
+static const char * const cmd_dedup_off_usage[] = {
+	"btrfs dedup off <path>",
+	"Switch off data deduplication.",
+	NULL
+};
+
+static int cmd_dedup_off(int argc, char **argv)
+{
+	struct btrfs_ioctl_dedup_args dargs;
+
+	if (check_argc_exact(argc, 2))
+		usage(cmd_dedup_off_usage);
+
+	dargs.cmd = BTRFS_DEDUP_CTL_SET_BS;
+	dargs.bs = 0;
+
+	return dedup_set_bs(argv[1], &dargs);
+}
+
+const struct cmd_group dedup_cmd_group = {
+	dedup_cmd_group_usage, NULL, {
+		{ "enable", cmd_dedup_enable, cmd_dedup_enable_usage, NULL, 0 },
+		{ "disable", cmd_dedup_disable, cmd_dedup_disable_usage, 0, 0 },
+		{ "on", cmd_dedup_on, cmd_dedup_on_usage, NULL, 0},
+		{ "off", cmd_dedup_off, cmd_dedup_off_usage, NULL, 0},
+		{ 0, 0, 0, 0, 0 }
+	}
+};
+
+int cmd_dedup(int argc, char **argv)
+{
+	return handle_command_group(&dedup_cmd_group, argc, argv);
+}
diff --git a/commands.h b/commands.h
index b791d68..6fccc15 100644
--- a/commands.h
+++ b/commands.h
@@ -91,6 +91,7 @@ extern const struct cmd_group quota_cmd_group;
 extern const struct cmd_group qgroup_cmd_group;
 extern const struct cmd_group replace_cmd_group;
 extern const struct cmd_group rescue_cmd_group;
+extern const struct cmd_group dedup_cmd_group;
 
 extern const char * const cmd_send_usage[];
 extern const char * const cmd_receive_usage[];
@@ -119,6 +120,7 @@ int cmd_select_super(int argc, char **argv);
 int cmd_dump_super(int argc, char **argv);
 int cmd_debug_tree(int argc, char **argv);
 int cmd_rescue(int argc, char **argv);
+int cmd_dedup(int argc, char **argv);
 
 /* subvolume exported functions */
 int test_issubvolume(char *path);
diff --git a/ctree.h b/ctree.h
index 2117374..dbd86ec 100644
--- a/ctree.h
+++ b/ctree.h
@@ -470,6 +470,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF	(1ULL << 6)
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
+#define BTRFS_FEATURE_INCOMPAT_DEDUP		(1ULL << 10)
 
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
@@ -482,6 +483,7 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
+	 BTRFS_FEATURE_INCOMPAT_DEDUP |			\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
 
 /*
diff --git a/ioctl.h b/ioctl.h
index a589cd7..d812853 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -430,6 +430,16 @@ struct btrfs_ioctl_get_dev_stats {
 	__u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
 };
 
+/* deduplication control ioctl modes */
+#define BTRFS_DEDUP_CTL_ENABLE 1
+#define BTRFS_DEDUP_CTL_DISABLE 2
+#define BTRFS_DEDUP_CTL_SET_BS 3
+struct btrfs_ioctl_dedup_args {
+	__u64 cmd;
+	__u64 bs;
+	__u64 unused[14]; /* pad to 128 bytes */
+};
+
 /* BTRFS_IOC_SNAP_CREATE is no longer used by the btrfs command */
 #define BTRFS_QUOTA_CTL_ENABLE	1
 #define BTRFS_QUOTA_CTL_DISABLE	2
@@ -593,6 +603,9 @@ struct btrfs_ioctl_clone_range_args {
 				      struct btrfs_ioctl_get_dev_stats)
 #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
 				    struct btrfs_ioctl_dev_replace_args)
+#define BTRFS_IOC_DEDUP_CTL _IOWR(BTRFS_IOCTL_MAGIC, 55, \
+				  struct btrfs_ioctl_dedup_args)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index b620348..56cdf1b 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -109,13 +109,22 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBqgroup limit\fP [\fIoptions\fP] \fI<size>\fP|\fBnone\fP [\fI<qgroupid>\fP] \fI<path>\fP
 .PP
-.PP
 \fBbtrfs\fP \fBreplace start\fP [-Bfr] \fI<srcdev>\fP|\fI<devid> <targetdev> <mount_point>\fP
 .PP
 \fBbtrfs\fP \fBreplace status\fP [-1] \fI<mount_point>\fP
 .PP
 \fBbtrfs\fP \fBreplace cancel\fP \fI<mount_point>\fP
 .PP
+\fBbtrfs\fP \fBdedup enable\fP \fI<path>\fP
+.PP
+\fBbtrfs\fP \fBdedup disable\fP \fI<path>\fP
+.PP
+\fBbtrfs\fP \fBdedup on\fP [-b|--bs \fIsize\fP] \fI<path>\fP
+.PP
+\fBbtrfs\fP \fBdedup off\fP \fI<path>\fP
+.PP
+.PP
+
 \fBbtrfs\fP \fBhelp|\-\-help \fP
 .PP
 \fBbtrfs\fP \fB<command> \-\-help \fP
@@ -739,12 +748,28 @@ Print status and progress information of a running device replace operation.
 .IP "\fB-1\fP" 5
 print once instead of print continuously until the replace
 operation finishes (or is canceled)
-.RE
-.TP
 
 \fBreplace cancel\fR \fI<mount_point>\fR
 Cancel a running device replace operation.
 .RE
+.TP
+
+\fBdedup enable\fP \fI<path>\fP
+Enable data deduplication support for a filesystem.
+.TP
+
+\fBdedup disable\fP \fI<path>\fP
+Disable data deduplication support for a filesystem.
+.TP
+
+\fBdedup on\fP [-b|--bs \fIsize\fP] \fI<path>\fP
+Switch on data deduplication or change the dedup blocksize.
+.TP
+
+\fBdedup off\fP \fI<path>\fP
+Switch off data deduplication.
+.RE
+.TP
 
 .SH EXIT STATUS
 \fBbtrfs\fR returns a zero exist status if it succeeds. Non zero is returned in
-- 
1.8.2.1



Thread overview: 26+ messages
2014-04-10  3:48 [RFC PATCH v10 00/16] Online(inband) data deduplication Liu Bo
2014-04-10  3:48 ` [PATCH v10 01/16] Btrfs: disable qgroups accounting when quota_enable is 0 Liu Bo
2014-04-10  3:48 ` [PATCH v10 02/16] Btrfs: introduce dedup tree and relatives Liu Bo
2014-04-10  3:48 ` [PATCH v10 03/16] Btrfs: introduce dedup tree operations Liu Bo
2014-04-10  3:48 ` [PATCH v10 04/16] Btrfs: introduce dedup state Liu Bo
2014-04-10  3:48 ` [PATCH v10 05/16] Btrfs: make ordered extent aware of dedup Liu Bo
2014-04-10  3:48 ` [PATCH v10 06/16] Btrfs: online(inband) data dedup Liu Bo
2014-04-10  3:48 ` [PATCH v10 07/16] Btrfs: skip dedup reference during backref walking Liu Bo
2014-04-10  3:48 ` [PATCH v10 08/16] Btrfs: don't return space for dedup extent Liu Bo
2014-04-10  3:48 ` [PATCH v10 09/16] Btrfs: add ioctl of dedup control Liu Bo
2014-04-10  3:48 ` [PATCH v10 10/16] Btrfs: improve the delayed refs process in rm case Liu Bo
2014-04-10  3:48 ` [PATCH v10 11/16] Btrfs: fix a crash of dedup ref Liu Bo
2014-04-10  3:48 ` [PATCH v10 12/16] Btrfs: fix deadlock of dedup work Liu Bo
2014-04-10  3:48 ` [PATCH v10 13/16] Btrfs: fix transactin abortion in __btrfs_free_extent Liu Bo
2014-04-10  3:48 ` [PATCH v10 14/16] Btrfs: fix wrong pinned bytes " Liu Bo
2014-04-10  3:48 ` [PATCH v10 15/16] Btrfs: use total_bytes instead of bytes_used for global_rsv Liu Bo
2014-04-10  3:48 ` [PATCH v10 16/16] Btrfs: fix dedup enospc problem Liu Bo
2014-04-10  3:48 ` [PATCH v5] Btrfs-progs: add dedup subcommand Liu Bo
2014-04-10  9:08 ` [RFC PATCH v10 00/16] Online(inband) data deduplication Konstantinos Skarlatos
2014-04-10 15:44   ` Liu Bo
2014-04-10 15:55 ` Liu Bo
2014-04-11  9:28   ` Martin Steigerwald
2014-04-11  9:51     ` Liu Bo
2014-04-14  8:41 ` Test results for " Konstantinos Skarlatos
  -- strict thread matches above, loose matches on Subject: below --
2014-04-09  7:08 [PATCH v4] Btrfs-progs: add dedup subcommand Liu Bo
2014-04-09 10:10 ` [PATCH v5] " Liu Bo
2014-04-09 10:14   ` Liu Bo
