public inbox for util-linux@vger.kernel.org
 help / color / mirror / Atom feed
* fallocate: Add "--dig-holes" options and minor fixes
@ 2014-01-25 19:17 Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 1/3] fallocate: Clarify that space can also be deallocated Rodrigo Campos
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-25 19:17 UTC (permalink / raw)
  To: util-linux

Hi,

As discussed in this[1] gmane thread, I'm submitting the funtionality to dig
holes as part of the fallocate(1) command.

When using this option, the file is searched for chunks of '\0's and digs a hole
when possible. The end result should (if a hole was found) an sparse file
created in-place.

The patchset consist of 3 patches, the first one:

	[PATCH 1/3] fallocate: Clarify that space can also be deallocated

is just a fix to make clear that the fallocate tool as already is, is capable of
dellocations too (not only preallocations).

The next 2 patches are:

	[PATCH 2/3] fallocate: Hide #ifdef tricks to call fallocate in a

	Which creates a function that has the #ifdef, that will be useful when
we call fallocate from different places in the code (i.e. in the next patch)

	[PATCH 3/3] fallocate: Add "--dig-holes" option

	That just adds the option and updates the manpage.


Comments and suggestions totally welcome! :)




Thanks a lot,
Rodrigo

[1]: http://article.gmane.org/gmane.linux.utilities.util-linux-ng/8415


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] fallocate: Clarify that space can also be deallocated
  2014-01-25 19:17 fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
@ 2014-01-25 19:17 ` Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 2/3] fallocate: Hide #ifdef tricks to call fallocate in a function Rodrigo Campos
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-25 19:17 UTC (permalink / raw)
  To: util-linux; +Cc: Rodrigo Campos

The functionality is already there, with --punch-hole, but the text was for the
preallocation case only.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
 sys-utils/fallocate.1 | 11 ++++++-----
 sys-utils/fallocate.c |  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/sys-utils/fallocate.1 b/sys-utils/fallocate.1
index d462cee..efa42c1 100644
--- a/sys-utils/fallocate.1
+++ b/sys-utils/fallocate.1
@@ -1,7 +1,7 @@
 .\" -*- nroff -*-
 .TH FALLOCATE 1 "September 2011" "util-linux" "User Commands"
 .SH NAME
-fallocate \- preallocate space to a file
+fallocate \- preallocate or deallocate space to a file
 .SH SYNOPSIS
 .B fallocate
 .RB [ \-n ]
@@ -13,10 +13,11 @@ fallocate \- preallocate space to a file
 .I filename
 .SH DESCRIPTION
 .B fallocate
-is used to preallocate blocks to a file.  For filesystems which support the
-fallocate system call, this is done quickly by allocating blocks and marking
-them as uninitialized, requiring no IO to the data blocks.  This is much faster
-than creating a file by filling it with zeros.
+is used to manipulate the allocated disk space for a file, either to deallocate
+or preallocate it. For filesystems which support the fallocate system call,
+preallocation is done quickly by allocating blocks and marking them as
+uninitialized, requiring no IO to the data blocks. This is much faster than
+creating a file by filling it with zeros.
 .PP
 As of the Linux Kernel v2.6.31, the fallocate system call is supported by the
 btrfs, ext4, ocfs2, and xfs filesystems.
diff --git a/sys-utils/fallocate.c b/sys-utils/fallocate.c
index cd35b2d..55e8411 100644
--- a/sys-utils/fallocate.c
+++ b/sys-utils/fallocate.c
@@ -62,8 +62,8 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
 	fputs(USAGE_OPTIONS, out);
 	fputs(_(" -n, --keep-size     don't modify the length of the file\n"
 		" -p, --punch-hole    punch holes in the file\n"
-		" -o, --offset <num>  offset of the allocation, in bytes\n"
-		" -l, --length <num>  length of the allocation, in bytes\n"), out);
+		" -o, --offset <num>  offset of the (de)allocation, in bytes\n"
+		" -l, --length <num>  length of the (de)allocation, in bytes\n"), out);
 	fputs(USAGE_SEPARATOR, out);
 	fputs(USAGE_HELP, out);
 	fputs(USAGE_VERSION, out);
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/3] fallocate: Hide #ifdef tricks to call fallocate in a function
  2014-01-25 19:17 fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 1/3] fallocate: Clarify that space can also be deallocated Rodrigo Campos
@ 2014-01-25 19:17 ` Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 3/3] fallocate: Add "--dig-holes" option Rodrigo Campos
  2014-01-25 19:23 ` fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
  3 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-25 19:17 UTC (permalink / raw)
  To: util-linux; +Cc: Rodrigo Campos

Future patches will add more calls to fallocate(), so it will be useful to have
all these tricks inside a function.

The error message when fallocate is not supported is slightly changed: the file
name is not printed as a prefix because is not available in the context of the
function. Also, to only print one of the two possible errors (as happens when
using directly exit()), an else clause was added.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
 sys-utils/fallocate.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/sys-utils/fallocate.c b/sys-utils/fallocate.c
index 55e8411..5c66553 100644
--- a/sys-utils/fallocate.c
+++ b/sys-utils/fallocate.c
@@ -82,6 +82,30 @@ static loff_t cvtnum(char *s)
 	return x;
 }
 
+static int xfallocate(int fd, int mode, off_t offset, off_t length)
+{
+	int error;
+
+#ifdef HAVE_FALLOCATE
+	error = fallocate(fd, mode, offset, length);
+#else
+	error = syscall(SYS_fallocate, fd, mode, offset, length);
+#endif
+	/*
+	 * EOPNOTSUPP: The FALLOC_FL_KEEP_SIZE is unsupported
+	 * ENOSYS: The filesystem does not support sys_fallocate
+	 */
+	if (error < 0) {
+		if ((mode & FALLOC_FL_KEEP_SIZE) && errno == EOPNOTSUPP) {
+			fputs(_("keep size mode (-n option) unsupported\n"),
+			      stderr);
+		} else {
+			fputs(_("fallocate failed\n"), stderr);
+		}
+	}
+	return error;
+}
+
 int main(int argc, char **argv)
 {
 	char	*fname;
@@ -153,21 +177,10 @@ int main(int argc, char **argv)
 	if (fd < 0)
 		err(EXIT_FAILURE, _("cannot open %s"), fname);
 
-#ifdef HAVE_FALLOCATE
-	error = fallocate(fd, mode, offset, length);
-#else
-	error = syscall(SYS_fallocate, fd, mode, offset, length);
-#endif
-	/*
-	 * EOPNOTSUPP: The FALLOC_FL_KEEP_SIZE is unsupported
-	 * ENOSYS: The filesystem does not support sys_fallocate
-	 */
-	if (error < 0) {
-		if ((mode & FALLOC_FL_KEEP_SIZE) && errno == EOPNOTSUPP)
-			errx(EXIT_FAILURE,
-				_("keep size mode (-n option) unsupported"));
-		err(EXIT_FAILURE, _("%s: fallocate failed"), fname);
-	}
+	error = xfallocate(fd, mode, offset, length);
+
+	if (error < 0)
+		exit(EXIT_FAILURE);
 
 	if (close_fd(fd) != 0)
 		err(EXIT_FAILURE, _("write failed: %s"), fname);
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/3] fallocate: Add "--dig-holes" option
  2014-01-25 19:17 fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 1/3] fallocate: Clarify that space can also be deallocated Rodrigo Campos
  2014-01-25 19:17 ` [PATCH 2/3] fallocate: Hide #ifdef tricks to call fallocate in a function Rodrigo Campos
@ 2014-01-25 19:17 ` Rodrigo Campos
  2014-01-26 14:43   ` [PATCH v2] " Rodrigo Campos
  2014-01-25 19:23 ` fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
  3 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-25 19:17 UTC (permalink / raw)
  To: util-linux; +Cc: Rodrigo Campos

This option tries to detect chunk of '\0's and punch a hole, making the file
sparse in-place.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
 bash-completion/fallocate |   2 +-
 sys-utils/fallocate.1     |  19 +++++++-
 sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 120 insertions(+), 15 deletions(-)

diff --git a/bash-completion/fallocate b/bash-completion/fallocate
index 2c6e4cb..5fc58c0 100644
--- a/bash-completion/fallocate
+++ b/bash-completion/fallocate
@@ -15,7 +15,7 @@ _fallocate_module()
 	esac
 	case $cur in
 		-*)
-			OPTS="--keep-size --punch-hole --offset --length --help --version"
+			OPTS="--keep-size --punch-hole --detect-holes --offset --length --help --version"
 			COMPREPLY=( $(compgen -W "${OPTS[*]}" -- $cur) )
 			return 0
 			;;
diff --git a/sys-utils/fallocate.1 b/sys-utils/fallocate.1
index efa42c1..ac8e61d 100644
--- a/sys-utils/fallocate.1
+++ b/sys-utils/fallocate.1
@@ -11,6 +11,12 @@ fallocate \- preallocate or deallocate space to a file
 .B \-l
 .IR length
 .I filename
+.PP
+.B fallocate
+.RB \-d
+.RB [ \-l
+.IR length ]
+.I filename
 .SH DESCRIPTION
 .B fallocate
 is used to manipulate the allocated disk space for a file, either to deallocate
@@ -20,7 +26,8 @@ uninitialized, requiring no IO to the data blocks. This is much faster than
 creating a file by filling it with zeros.
 .PP
 As of the Linux Kernel v2.6.31, the fallocate system call is supported by the
-btrfs, ext4, ocfs2, and xfs filesystems.
+btrfs, ext4, ocfs2, and xfs filesystems. Support for options needed to run with
+\fI\-\-punch-hole\fR or \fI\-\-detect-holes\fR was added in Linux 2.6.38.
 .PP
 The exit code returned by
 .B fallocate
@@ -36,6 +43,16 @@ Do not modify the apparent length of the file.  This may effectively allocate
 blocks past EOF, which can be removed with a truncate.
 .IP "\fB\-p, \-\-punch-hole\fP"
 Punch holes in the file, the range should not exceed the length of the file.
+.IP "\fB\-d, \-\-dig-holes\fP"
+Detect and dig holes of, at least, \fIlength\fR size. If \fIlength\fR is not
+specified, it defaults to 32k. Makes the file sparse in-place, without using
+extra disk space. You can think of this as doing a "\fBcp --sparse\fP" and
+renaming the dest file as the original, without the need for extra disk space.
+.PP
+.IP
+Note that too small values for \fIlength\fR might be ignored. And too big values
+might use lot of RAM and not detect many holes. Also, when using this option, 
+\fI\-\-keep-size\fP is implied.
 .IP "\fB\-o, \-\-offset\fP \fIoffset\fP
 Specifies the beginning offset of the allocation, in bytes.
 .IP "\fB\-l, \-\-length\fP \fIlength\fP
diff --git a/sys-utils/fallocate.c b/sys-utils/fallocate.c
index 5c66553..72aaef4 100644
--- a/sys-utils/fallocate.c
+++ b/sys-utils/fallocate.c
@@ -23,6 +23,7 @@
  */
 #include <sys/stat.h>
 #include <sys/types.h>
+#include <sys/mman.h>
 #include <ctype.h>
 #include <errno.h>
 #include <fcntl.h>
@@ -31,6 +32,7 @@
 #include <unistd.h>
 #include <getopt.h>
 #include <limits.h>
+#include <string.h>
 
 #ifndef HAVE_FALLOCATE
 # include <sys/syscall.h>
@@ -62,6 +64,7 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
 	fputs(USAGE_OPTIONS, out);
 	fputs(_(" -n, --keep-size     don't modify the length of the file\n"
 		" -p, --punch-hole    punch holes in the file\n"
+		" -d, --dig-holes     detect and dig holes\n"
 		" -o, --offset <num>  offset of the (de)allocation, in bytes\n"
 		" -l, --length <num>  length of the (de)allocation, in bytes\n"), out);
 	fputs(USAGE_SEPARATOR, out);
@@ -106,6 +109,76 @@ static int xfallocate(int fd, int mode, off_t offset, off_t length)
 	return error;
 }
 
+/*
+ * Look for chunks of '\0's with size hole_size and when we find them, dig a
+ * hole on that offset with that size
+ */
+static int detect_holes(int fd, size_t hole_size)
+{
+	int ret = 0;
+	int err;
+
+	if (hole_size >= 100 * 1024 * 1024) {
+		size_t ram_mb = hole_size / 1024 / 1024;
+		printf("WARNING: %zu MB RAM will be used\n", ram_mb);
+		sleep(3);
+	}
+
+	/* Create a buffer of '\0's to compare against */
+	/* XXX: Use mmap() with MAP_PRIVATE so Linux can avoid this allocation */
+	void *zeros = mmap(NULL, hole_size, PROT_READ,
+	                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (zeros == MAP_FAILED) {
+		perror("mmap");
+		return -1;
+	}
+
+	/* buffer to read the file */
+	ssize_t buf_len = hole_size;
+	void *buf = malloc(buf_len);
+	if (buf == NULL) {
+		fputs(_("not enough memory\n"), stderr);
+		ret = -1;
+		goto out;
+	}
+
+	off_t end = lseek(fd, 0, SEEK_END);
+	if (end == -1) {
+		perror("lseek");
+		ret = -1;
+		goto out;
+	}
+
+	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
+
+		/* Try to read hole_size bytes */
+		buf_len = pread(fd, buf, hole_size, offset);
+		if (buf_len == -1) {
+			perror("pread");
+			ret = -1;
+			goto out;
+		}
+
+		/* Always use buf_len, as we may read less than hole_size bytes */
+		int not_zeros = memcmp(buf, zeros, buf_len);
+		if (not_zeros)
+			continue;
+
+		int ret = xfallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
+		                     offset, buf_len);
+		if (ret)
+			goto out;
+	}
+out:
+	err = munmap(zeros, hole_size);
+	if (err) {
+		perror("munmap");
+		ret = 1;
+	}
+	free(buf);
+	return ret;
+}
+
 int main(int argc, char **argv)
 {
 	char	*fname;
@@ -113,17 +186,19 @@ int main(int argc, char **argv)
 	int	error;
 	int	fd;
 	int	mode = 0;
+	int	dig_holes = 0;
 	loff_t	length = -2LL;
 	loff_t	offset = 0;
 
 	static const struct option longopts[] = {
-	    { "help",      0, 0, 'h' },
-	    { "version",   0, 0, 'V' },
-	    { "keep-size", 0, 0, 'n' },
+	    { "help",       0, 0, 'h' },
+	    { "version",    0, 0, 'V' },
+	    { "keep-size",  0, 0, 'n' },
 	    { "punch-hole", 0, 0, 'p' },
-	    { "offset",    1, 0, 'o' },
-	    { "length",    1, 0, 'l' },
-	    { NULL,        0, 0, 0 }
+	    { "dig-holes",  0, 0, 'd' },
+	    { "offset",     1, 0, 'o' },
+	    { "length",     1, 0, 'l' },
+	    { NULL,         0, 0, 0 }
 	};
 
 	setlocale(LC_ALL, "");
@@ -131,7 +206,7 @@ int main(int argc, char **argv)
 	textdomain(PACKAGE);
 	atexit(close_stdout);
 
-	while ((c = getopt_long(argc, argv, "hVnpl:o:", longopts, NULL)) != -1) {
+	while ((c = getopt_long(argc, argv, "hVnpdl:o:", longopts, NULL)) != -1) {
 		switch(c) {
 		case 'h':
 			usage(stdout);
@@ -145,6 +220,9 @@ int main(int argc, char **argv)
 		case 'n':
 			mode |= FALLOC_FL_KEEP_SIZE;
 			break;
+		case 'd':
+			dig_holes = 1;
+			break;
 		case 'l':
 			length = cvtnum(optarg);
 			break;
@@ -156,8 +234,13 @@ int main(int argc, char **argv)
 			break;
 		}
 	}
-
-	if (length == -2LL)
+	if (dig_holes && mode != 0)
+		errx(EXIT_FAILURE, _("Can't use -p or -n with --dig-holes"));
+	if (dig_holes && offset != 0)
+		errx(EXIT_FAILURE, _("Can't use -o with --dig-holes"));
+	if (length == -2LL && dig_holes)
+		length = 32 * 1024;
+	if (length == -2LL && !dig_holes)
 		errx(EXIT_FAILURE, _("no length argument specified"));
 	if (length <= 0)
 		errx(EXIT_FAILURE, _("invalid length value specified"));
@@ -173,16 +256,21 @@ int main(int argc, char **argv)
 		usage(stderr);
 	}
 
-	fd = open(fname, O_WRONLY|O_CREAT, 0644);
+	fd = open(fname, O_RDWR|O_CREAT, 0644);
 	if (fd < 0)
 		err(EXIT_FAILURE, _("cannot open %s"), fname);
 
-	error = xfallocate(fd, mode, offset, length);
+	if (dig_holes)
+		error = detect_holes(fd, length);
+	else
+		error = xfallocate(fd, mode, offset, length);
+
+	/* Close before checking for errors, as we might have written it */
+	if (close_fd(fd) != 0)
+		err(EXIT_FAILURE, _("write failed: %s"), fname);
 
 	if (error < 0)
 		exit(EXIT_FAILURE);
 
-	if (close_fd(fd) != 0)
-		err(EXIT_FAILURE, _("write failed: %s"), fname);
 	return EXIT_SUCCESS;
 }
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: fallocate: Add "--dig-holes" options and minor fixes
  2014-01-25 19:17 fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
                   ` (2 preceding siblings ...)
  2014-01-25 19:17 ` [PATCH 3/3] fallocate: Add "--dig-holes" option Rodrigo Campos
@ 2014-01-25 19:23 ` Rodrigo Campos
       [not found]   ` <20140201040414.GA23360@sdfg.com.ar>
  3 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-25 19:23 UTC (permalink / raw)
  To: util-linux

On Sat, Jan 25, 2014 at 07:17:25PM +0000, Rodrigo Campos wrote:
> Hi,
> 
> As discussed in this[1] gmane thread, I'm submitting the funtionality to dig
> holes as part of the fallocate(1) command.

Ohh, forgot to tell, but I tried to re-use already used strings in _("")
contructors and make "perror" just print the function name, as that is what I
observed in other util-linux tools.

If I missed something, please let me know :)




Thanks again.
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] fallocate: Add "--dig-holes" option
  2014-01-25 19:17 ` [PATCH 3/3] fallocate: Add "--dig-holes" option Rodrigo Campos
@ 2014-01-26 14:43   ` Rodrigo Campos
  2014-01-26 15:06     ` [PATCH v3] " Rodrigo Campos
  0 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-26 14:43 UTC (permalink / raw)
  To: util-linux; +Cc: Rodrigo Campos

This option tries to detect chunk of '\0's and punch a hole, making the file
sparse in-place.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
v2:
	use "ret = -1", instead of "ret = 1" if munmap() fails
---
 bash-completion/fallocate |   2 +-
 sys-utils/fallocate.1     |  19 +++++++-
 sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 120 insertions(+), 15 deletions(-)

diff --git a/bash-completion/fallocate b/bash-completion/fallocate
index 2c6e4cb..5fc58c0 100644
--- a/bash-completion/fallocate
+++ b/bash-completion/fallocate
@@ -15,7 +15,7 @@ _fallocate_module()
 	esac
 	case $cur in
 		-*)
-			OPTS="--keep-size --punch-hole --offset --length --help --version"
+			OPTS="--keep-size --punch-hole --detect-holes --offset --length --help --version"
 			COMPREPLY=( $(compgen -W "${OPTS[*]}" -- $cur) )
 			return 0
 			;;
diff --git a/sys-utils/fallocate.1 b/sys-utils/fallocate.1
index efa42c1..ac8e61d 100644
--- a/sys-utils/fallocate.1
+++ b/sys-utils/fallocate.1
@@ -11,6 +11,12 @@ fallocate \- preallocate or deallocate space to a file
 .B \-l
 .IR length
 .I filename
+.PP
+.B fallocate
+.RB \-d
+.RB [ \-l
+.IR length ]
+.I filename
 .SH DESCRIPTION
 .B fallocate
 is used to manipulate the allocated disk space for a file, either to deallocate
@@ -20,7 +26,8 @@ uninitialized, requiring no IO to the data blocks. This is much faster than
 creating a file by filling it with zeros.
 .PP
 As of the Linux Kernel v2.6.31, the fallocate system call is supported by the
-btrfs, ext4, ocfs2, and xfs filesystems.
+btrfs, ext4, ocfs2, and xfs filesystems. Support for options needed to run with
+\fI\-\-punch-hole\fR or \fI\-\-detect-holes\fR was added in Linux 2.6.38.
 .PP
 The exit code returned by
 .B fallocate
@@ -36,6 +43,16 @@ Do not modify the apparent length of the file.  This may effectively allocate
 blocks past EOF, which can be removed with a truncate.
 .IP "\fB\-p, \-\-punch-hole\fP"
 Punch holes in the file, the range should not exceed the length of the file.
+.IP "\fB\-d, \-\-dig-holes\fP"
+Detect and dig holes of, at least, \fIlength\fR size. If \fIlength\fR is not
+specified, it defaults to 32k. Makes the file sparse in-place, without using
+extra disk space. You can think of this as doing a "\fBcp --sparse\fP" and
+renaming the dest file as the original, without the need for extra disk space.
+.PP
+.IP
+Note that too small values for \fIlength\fR might be ignored. And too big values
+might use lot of RAM and not detect many holes. Also, when using this option, 
+\fI\-\-keep-size\fP is implied.
 .IP "\fB\-o, \-\-offset\fP \fIoffset\fP
 Specifies the beginning offset of the allocation, in bytes.
 .IP "\fB\-l, \-\-length\fP \fIlength\fP
diff --git a/sys-utils/fallocate.c b/sys-utils/fallocate.c
index 5c66553..bb3fef3 100644
--- a/sys-utils/fallocate.c
+++ b/sys-utils/fallocate.c
@@ -23,6 +23,7 @@
  */
 #include <sys/stat.h>
 #include <sys/types.h>
+#include <sys/mman.h>
 #include <ctype.h>
 #include <errno.h>
 #include <fcntl.h>
@@ -31,6 +32,7 @@
 #include <unistd.h>
 #include <getopt.h>
 #include <limits.h>
+#include <string.h>
 
 #ifndef HAVE_FALLOCATE
 # include <sys/syscall.h>
@@ -62,6 +64,7 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
 	fputs(USAGE_OPTIONS, out);
 	fputs(_(" -n, --keep-size     don't modify the length of the file\n"
 		" -p, --punch-hole    punch holes in the file\n"
+		" -d, --dig-holes     detect and dig holes\n"
 		" -o, --offset <num>  offset of the (de)allocation, in bytes\n"
 		" -l, --length <num>  length of the (de)allocation, in bytes\n"), out);
 	fputs(USAGE_SEPARATOR, out);
@@ -106,6 +109,76 @@ static int xfallocate(int fd, int mode, off_t offset, off_t length)
 	return error;
 }
 
+/*
+ * Look for chunks of '\0's with size hole_size and when we find them, dig a
+ * hole on that offset with that size
+ */
+static int detect_holes(int fd, size_t hole_size)
+{
+	int ret = 0;
+	int err;
+
+	if (hole_size >= 100 * 1024 * 1024) {
+		size_t ram_mb = hole_size / 1024 / 1024;
+		printf("WARNING: %zu MB RAM will be used\n", ram_mb);
+		sleep(3);
+	}
+
+	/* Create a buffer of '\0's to compare against */
+	/* XXX: Use mmap() with MAP_PRIVATE so Linux can avoid this allocation */
+	void *zeros = mmap(NULL, hole_size, PROT_READ,
+	                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (zeros == MAP_FAILED) {
+		perror("mmap");
+		return -1;
+	}
+
+	/* buffer to read the file */
+	ssize_t buf_len = hole_size;
+	void *buf = malloc(buf_len);
+	if (buf == NULL) {
+		fputs(_("not enough memory\n"), stderr);
+		ret = -1;
+		goto out;
+	}
+
+	off_t end = lseek(fd, 0, SEEK_END);
+	if (end == -1) {
+		perror("lseek");
+		ret = -1;
+		goto out;
+	}
+
+	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
+
+		/* Try to read hole_size bytes */
+		buf_len = pread(fd, buf, hole_size, offset);
+		if (buf_len == -1) {
+			perror("pread");
+			ret = -1;
+			goto out;
+		}
+
+		/* Always use buf_len, as we may read less than hole_size bytes */
+		int not_zeros = memcmp(buf, zeros, buf_len);
+		if (not_zeros)
+			continue;
+
+		int ret = xfallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
+		                     offset, buf_len);
+		if (ret)
+			goto out;
+	}
+out:
+	err = munmap(zeros, hole_size);
+	if (err) {
+		perror("munmap");
+		ret = -1;
+	}
+	free(buf);
+	return ret;
+}
+
 int main(int argc, char **argv)
 {
 	char	*fname;
@@ -113,17 +186,19 @@ int main(int argc, char **argv)
 	int	error;
 	int	fd;
 	int	mode = 0;
+	int	dig_holes = 0;
 	loff_t	length = -2LL;
 	loff_t	offset = 0;
 
 	static const struct option longopts[] = {
-	    { "help",      0, 0, 'h' },
-	    { "version",   0, 0, 'V' },
-	    { "keep-size", 0, 0, 'n' },
+	    { "help",       0, 0, 'h' },
+	    { "version",    0, 0, 'V' },
+	    { "keep-size",  0, 0, 'n' },
 	    { "punch-hole", 0, 0, 'p' },
-	    { "offset",    1, 0, 'o' },
-	    { "length",    1, 0, 'l' },
-	    { NULL,        0, 0, 0 }
+	    { "dig-holes",  0, 0, 'd' },
+	    { "offset",     1, 0, 'o' },
+	    { "length",     1, 0, 'l' },
+	    { NULL,         0, 0, 0 }
 	};
 
 	setlocale(LC_ALL, "");
@@ -131,7 +206,7 @@ int main(int argc, char **argv)
 	textdomain(PACKAGE);
 	atexit(close_stdout);
 
-	while ((c = getopt_long(argc, argv, "hVnpl:o:", longopts, NULL)) != -1) {
+	while ((c = getopt_long(argc, argv, "hVnpdl:o:", longopts, NULL)) != -1) {
 		switch(c) {
 		case 'h':
 			usage(stdout);
@@ -145,6 +220,9 @@ int main(int argc, char **argv)
 		case 'n':
 			mode |= FALLOC_FL_KEEP_SIZE;
 			break;
+		case 'd':
+			dig_holes = 1;
+			break;
 		case 'l':
 			length = cvtnum(optarg);
 			break;
@@ -156,8 +234,13 @@ int main(int argc, char **argv)
 			break;
 		}
 	}
-
-	if (length == -2LL)
+	if (dig_holes && mode != 0)
+		errx(EXIT_FAILURE, _("Can't use -p or -n with --dig-holes"));
+	if (dig_holes && offset != 0)
+		errx(EXIT_FAILURE, _("Can't use -o with --dig-holes"));
+	if (length == -2LL && dig_holes)
+		length = 32 * 1024;
+	if (length == -2LL && !dig_holes)
 		errx(EXIT_FAILURE, _("no length argument specified"));
 	if (length <= 0)
 		errx(EXIT_FAILURE, _("invalid length value specified"));
@@ -173,16 +256,21 @@ int main(int argc, char **argv)
 		usage(stderr);
 	}
 
-	fd = open(fname, O_WRONLY|O_CREAT, 0644);
+	fd = open(fname, O_RDWR|O_CREAT, 0644);
 	if (fd < 0)
 		err(EXIT_FAILURE, _("cannot open %s"), fname);
 
-	error = xfallocate(fd, mode, offset, length);
+	if (dig_holes)
+		error = detect_holes(fd, length);
+	else
+		error = xfallocate(fd, mode, offset, length);
+
+	/* Close before checking for errors, as we might have written it */
+	if (close_fd(fd) != 0)
+		err(EXIT_FAILURE, _("write failed: %s"), fname);
 
 	if (error < 0)
 		exit(EXIT_FAILURE);
 
-	if (close_fd(fd) != 0)
-		err(EXIT_FAILURE, _("write failed: %s"), fname);
 	return EXIT_SUCCESS;
 }
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v3] fallocate: Add "--dig-holes" option
  2014-01-26 14:43   ` [PATCH v2] " Rodrigo Campos
@ 2014-01-26 15:06     ` Rodrigo Campos
  2014-02-14 10:47       ` Karel Zak
  0 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-01-26 15:06 UTC (permalink / raw)
  To: util-linux; +Cc: Rodrigo Campos

This option tries to detect chunk of '\0's and punch a hole, making the file
sparse in-place.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
v3:
	Use "--dig-holes" on bash-completion too (when I renamed, I forgot to
	update the bash completion file)

v2:
        use "ret = -1", instead of "ret = 1" if munmap() fails
---
 bash-completion/fallocate |   2 +-
 sys-utils/fallocate.1     |  19 +++++++-
 sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 120 insertions(+), 15 deletions(-)

diff --git a/bash-completion/fallocate b/bash-completion/fallocate
index 2c6e4cb..ce5f4f1 100644
--- a/bash-completion/fallocate
+++ b/bash-completion/fallocate
@@ -15,7 +15,7 @@ _fallocate_module()
 	esac
 	case $cur in
 		-*)
-			OPTS="--keep-size --punch-hole --offset --length --help --version"
+			OPTS="--keep-size --punch-hole --dig-holes --offset --length --help --version"
 			COMPREPLY=( $(compgen -W "${OPTS[*]}" -- $cur) )
 			return 0
 			;;
diff --git a/sys-utils/fallocate.1 b/sys-utils/fallocate.1
index efa42c1..ac8e61d 100644
--- a/sys-utils/fallocate.1
+++ b/sys-utils/fallocate.1
@@ -11,6 +11,12 @@ fallocate \- preallocate or deallocate space to a file
 .B \-l
 .IR length
 .I filename
+.PP
+.B fallocate
+.RB \-d
+.RB [ \-l
+.IR length ]
+.I filename
 .SH DESCRIPTION
 .B fallocate
 is used to manipulate the allocated disk space for a file, either to deallocate
@@ -20,7 +26,8 @@ uninitialized, requiring no IO to the data blocks. This is much faster than
 creating a file by filling it with zeros.
 .PP
 As of the Linux Kernel v2.6.31, the fallocate system call is supported by the
-btrfs, ext4, ocfs2, and xfs filesystems.
+btrfs, ext4, ocfs2, and xfs filesystems. Support for options needed to run with
+\fI\-\-punch-hole\fR or \fI\-\-detect-holes\fR was added in Linux 2.6.38.
 .PP
 The exit code returned by
 .B fallocate
@@ -36,6 +43,16 @@ Do not modify the apparent length of the file.  This may effectively allocate
 blocks past EOF, which can be removed with a truncate.
 .IP "\fB\-p, \-\-punch-hole\fP"
 Punch holes in the file, the range should not exceed the length of the file.
+.IP "\fB\-d, \-\-dig-holes\fP"
+Detect and dig holes of, at least, \fIlength\fR size. If \fIlength\fR is not
+specified, it defaults to 32k. Makes the file sparse in-place, without using
+extra disk space. You can think of this as doing a "\fBcp --sparse\fP" and
+renaming the dest file as the original, without the need for extra disk space.
+.PP
+.IP
+Note that too small values for \fIlength\fR might be ignored. And too big values
+might use lot of RAM and not detect many holes. Also, when using this option, 
+\fI\-\-keep-size\fP is implied.
 .IP "\fB\-o, \-\-offset\fP \fIoffset\fP
 Specifies the beginning offset of the allocation, in bytes.
 .IP "\fB\-l, \-\-length\fP \fIlength\fP
diff --git a/sys-utils/fallocate.c b/sys-utils/fallocate.c
index 5c66553..bb3fef3 100644
--- a/sys-utils/fallocate.c
+++ b/sys-utils/fallocate.c
@@ -23,6 +23,7 @@
  */
 #include <sys/stat.h>
 #include <sys/types.h>
+#include <sys/mman.h>
 #include <ctype.h>
 #include <errno.h>
 #include <fcntl.h>
@@ -31,6 +32,7 @@
 #include <unistd.h>
 #include <getopt.h>
 #include <limits.h>
+#include <string.h>
 
 #ifndef HAVE_FALLOCATE
 # include <sys/syscall.h>
@@ -62,6 +64,7 @@ static void __attribute__((__noreturn__)) usage(FILE *out)
 	fputs(USAGE_OPTIONS, out);
 	fputs(_(" -n, --keep-size     don't modify the length of the file\n"
 		" -p, --punch-hole    punch holes in the file\n"
+		" -d, --dig-holes     detect and dig holes\n"
 		" -o, --offset <num>  offset of the (de)allocation, in bytes\n"
 		" -l, --length <num>  length of the (de)allocation, in bytes\n"), out);
 	fputs(USAGE_SEPARATOR, out);
@@ -106,6 +109,76 @@ static int xfallocate(int fd, int mode, off_t offset, off_t length)
 	return error;
 }
 
+/*
+ * Look for chunks of '\0's with size hole_size and when we find them, dig a
+ * hole on that offset with that size
+ */
+static int detect_holes(int fd, size_t hole_size)
+{
+	int ret = 0;
+	int err;
+
+	if (hole_size >= 100 * 1024 * 1024) {
+		size_t ram_mb = hole_size / 1024 / 1024;
+		printf("WARNING: %zu MB RAM will be used\n", ram_mb);
+		sleep(3);
+	}
+
+	/* Create a buffer of '\0's to compare against */
+	/* XXX: Use mmap() with MAP_PRIVATE so Linux can avoid this allocation */
+	void *zeros = mmap(NULL, hole_size, PROT_READ,
+	                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (zeros == MAP_FAILED) {
+		perror("mmap");
+		return -1;
+	}
+
+	/* buffer to read the file */
+	ssize_t buf_len = hole_size;
+	void *buf = malloc(buf_len);
+	if (buf == NULL) {
+		fputs(_("not enough memory\n"), stderr);
+		ret = -1;
+		goto out;
+	}
+
+	off_t end = lseek(fd, 0, SEEK_END);
+	if (end == -1) {
+		perror("lseek");
+		ret = -1;
+		goto out;
+	}
+
+	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
+
+		/* Try to read hole_size bytes */
+		buf_len = pread(fd, buf, hole_size, offset);
+		if (buf_len == -1) {
+			perror("pread");
+			ret = -1;
+			goto out;
+		}
+
+		/* Always use buf_len, as we may read less than hole_size bytes */
+		int not_zeros = memcmp(buf, zeros, buf_len);
+		if (not_zeros)
+			continue;
+
+		int ret = xfallocate(fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
+		                     offset, buf_len);
+		if (ret)
+			goto out;
+	}
+out:
+	err = munmap(zeros, hole_size);
+	if (err) {
+		perror("munmap");
+		ret = -1;
+	}
+	free(buf);
+	return ret;
+}
+
 int main(int argc, char **argv)
 {
 	char	*fname;
@@ -113,17 +186,19 @@ int main(int argc, char **argv)
 	int	error;
 	int	fd;
 	int	mode = 0;
+	int	dig_holes = 0;
 	loff_t	length = -2LL;
 	loff_t	offset = 0;
 
 	static const struct option longopts[] = {
-	    { "help",      0, 0, 'h' },
-	    { "version",   0, 0, 'V' },
-	    { "keep-size", 0, 0, 'n' },
+	    { "help",       0, 0, 'h' },
+	    { "version",    0, 0, 'V' },
+	    { "keep-size",  0, 0, 'n' },
 	    { "punch-hole", 0, 0, 'p' },
-	    { "offset",    1, 0, 'o' },
-	    { "length",    1, 0, 'l' },
-	    { NULL,        0, 0, 0 }
+	    { "dig-holes",  0, 0, 'd' },
+	    { "offset",     1, 0, 'o' },
+	    { "length",     1, 0, 'l' },
+	    { NULL,         0, 0, 0 }
 	};
 
 	setlocale(LC_ALL, "");
@@ -131,7 +206,7 @@ int main(int argc, char **argv)
 	textdomain(PACKAGE);
 	atexit(close_stdout);
 
-	while ((c = getopt_long(argc, argv, "hVnpl:o:", longopts, NULL)) != -1) {
+	while ((c = getopt_long(argc, argv, "hVnpdl:o:", longopts, NULL)) != -1) {
 		switch(c) {
 		case 'h':
 			usage(stdout);
@@ -145,6 +220,9 @@ int main(int argc, char **argv)
 		case 'n':
 			mode |= FALLOC_FL_KEEP_SIZE;
 			break;
+		case 'd':
+			dig_holes = 1;
+			break;
 		case 'l':
 			length = cvtnum(optarg);
 			break;
@@ -156,8 +234,13 @@ int main(int argc, char **argv)
 			break;
 		}
 	}
-
-	if (length == -2LL)
+	if (dig_holes && mode != 0)
+		errx(EXIT_FAILURE, _("Can't use -p or -n with --dig-holes"));
+	if (dig_holes && offset != 0)
+		errx(EXIT_FAILURE, _("Can't use -o with --dig-holes"));
+	if (length == -2LL && dig_holes)
+		length = 32 * 1024;
+	if (length == -2LL && !dig_holes)
 		errx(EXIT_FAILURE, _("no length argument specified"));
 	if (length <= 0)
 		errx(EXIT_FAILURE, _("invalid length value specified"));
@@ -173,16 +256,21 @@ int main(int argc, char **argv)
 		usage(stderr);
 	}
 
-	fd = open(fname, O_WRONLY|O_CREAT, 0644);
+	fd = open(fname, O_RDWR|O_CREAT, 0644);
 	if (fd < 0)
 		err(EXIT_FAILURE, _("cannot open %s"), fname);
 
-	error = xfallocate(fd, mode, offset, length);
+	if (dig_holes)
+		error = detect_holes(fd, length);
+	else
+		error = xfallocate(fd, mode, offset, length);
+
+	/* Close before checking for errors, as we might have written it */
+	if (close_fd(fd) != 0)
+		err(EXIT_FAILURE, _("write failed: %s"), fname);
 
 	if (error < 0)
 		exit(EXIT_FAILURE);
 
-	if (close_fd(fd) != 0)
-		err(EXIT_FAILURE, _("write failed: %s"), fname);
 	return EXIT_SUCCESS;
 }
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: fallocate: Add "--dig-holes" options and minor fixes
       [not found]   ` <20140201040414.GA23360@sdfg.com.ar>
@ 2014-02-11 11:31     ` Rodrigo Campos
  2014-02-11 18:34       ` Karel Zak
  0 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-11 11:31 UTC (permalink / raw)
  To: util-linux, kzak

On Sat, Feb 01, 2014 at 04:04:14AM +0000, Rodrigo Campos wrote:
> On Sat, Jan 25, 2014 at 07:23:03PM +0000, Rodrigo Campos wrote:
> > On Sat, Jan 25, 2014 at 07:17:25PM +0000, Rodrigo Campos wrote:
> > > Hi,
> > > 
> > > As discussed in this[1] gmane thread, I'm submitting the funtionality to dig
> > > holes as part of the fallocate(1) command.
> > 
> > Ohh, forgot to tell, but I tried to re-use already used strings in _("")
> > contructors and make "perror" just print the function name, as that is what I
> > observed in other util-linux tools.
> > 
> > If I missed something, please let me know :)
> 
> Ping ? :)

Re-ping ? Any comments on this ?





Thanks,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fallocate: Add "--dig-holes" options and minor fixes
  2014-02-11 11:31     ` Rodrigo Campos
@ 2014-02-11 18:34       ` Karel Zak
  2014-02-11 18:43         ` Rodrigo Campos
  0 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2014-02-11 18:34 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Tue, Feb 11, 2014 at 11:31:39AM +0000, Rodrigo Campos wrote:
> On Sat, Feb 01, 2014 at 04:04:14AM +0000, Rodrigo Campos wrote:
> > On Sat, Jan 25, 2014 at 07:23:03PM +0000, Rodrigo Campos wrote:
> > > On Sat, Jan 25, 2014 at 07:17:25PM +0000, Rodrigo Campos wrote:
> > > > Hi,
> > > > 
> > > > As discussed in this[1] gmane thread, I'm submitting the funtionality to dig
> > > > holes as part of the fallocate(1) command.
> > > 
> > > Ohh, forgot to tell, but I tried to re-use already used strings in _("")
> > > contructors and make "perror" just print the function name, as that is what I
> > > observed in other util-linux tools.
> > > 
> > > If I missed something, please let me know :)
> > 
> > Ping ? :)
> 
> Re-ping ? Any comments on this ?

 Be patient, the patches seem good, I'm going to merge it this or next
 week. (I was busy with something else last weeks and now I have
 vacation.)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fallocate: Add "--dig-holes" options and minor fixes
  2014-02-11 18:34       ` Karel Zak
@ 2014-02-11 18:43         ` Rodrigo Campos
  0 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-11 18:43 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Tue, Feb 11, 2014 at 07:34:08PM +0100, Karel Zak wrote:
> On Tue, Feb 11, 2014 at 11:31:39AM +0000, Rodrigo Campos wrote:
> > On Sat, Feb 01, 2014 at 04:04:14AM +0000, Rodrigo Campos wrote:
> > > On Sat, Jan 25, 2014 at 07:23:03PM +0000, Rodrigo Campos wrote:
> > > > On Sat, Jan 25, 2014 at 07:17:25PM +0000, Rodrigo Campos wrote:
> > > > > Hi,
> > > > > 
> > > > > As discussed in this[1] gmane thread, I'm submitting the funtionality to dig
> > > > > holes as part of the fallocate(1) command.
> > > > 
> > > > Ohh, forgot to tell, but I tried to re-use already used strings in _("")
> > > > contructors and make "perror" just print the function name, as that is what I
> > > > observed in other util-linux tools.
> > > > 
> > > > If I missed something, please let me know :)
> > > 
> > > Ping ? :)
> > 
> > Re-ping ? Any comments on this ?
> 
>  Be patient, the patches seem good, I'm going to merge it this or next

Sure, no problem! :)

>  week. (I was busy with something else last weeks and now I have
>  vacation.)

Please, have a rest :)






Thanks a lot,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-01-26 15:06     ` [PATCH v3] " Rodrigo Campos
@ 2014-02-14 10:47       ` Karel Zak
  2014-02-14 11:34         ` Karel Zak
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Karel Zak @ 2014-02-14 10:47 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
>  bash-completion/fallocate |   2 +-
>  sys-utils/fallocate.1     |  19 +++++++-
>  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 120 insertions(+), 15 deletions(-)

 Applied with some changes and I believe that the code still need some
 improvements, see below

> + */
> +static int detect_holes(int fd, size_t hole_size)
> +{
> +	int ret = 0;
> +	int err;
> +
> +	if (hole_size >= 100 * 1024 * 1024) {
> +		size_t ram_mb = hole_size / 1024 / 1024;
> +		printf("WARNING: %zu MB RAM will be used\n", ram_mb);
> +		sleep(3);
> +	}

 I have removed this thing... 
 
 I don't like all the detection algorithm. Do we really need to allocate 
 all hole size and the buffer? 

 IMHO it would be enough to:

 * add posix_fadvise(... POSIX_FADV_SEQUENTIAL | POSIX_FADV_NOREUSE)

 * read the file in small chunks -- for example BUFSIZ and compare
   this with small empty static buffer.

 .. it's kernel business to read from FS/device in optimal way and I
 don't think that context switches are so critical issue when all the
 thing is about I/O.

 I didn't test it, so maybe I'm wrong, but the current code where we
 eat RAM seems too crazy. Comments?

> +	/* Create a buffer of '\0's to compare against */
> +	/* XXX: Use mmap() with MAP_PRIVATE so Linux can avoid this allocation */
> +	void *zeros = mmap(NULL, hole_size, PROT_READ,
> +	                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> +	if (zeros == MAP_FAILED) {
> +		perror("mmap");

 we use err.h stuff, and it's usually good enough to exit after error
 than waste time with memory deallocation and another clean ups. The
 kernel is smart enough to clean up after exit().

> +	off_t end = lseek(fd, 0, SEEK_END);
    ^^^^^^^^^^^^^^
> +	if (end == -1) {
> +		perror("lseek");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
         ^^^^^^^^^^^

 Yes it's expected by C standards, but it sucks. Don't use it, it's
 reader's nightmare.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 10:47       ` Karel Zak
@ 2014-02-14 11:34         ` Karel Zak
  2014-02-14 13:16           ` Rodrigo Campos
  2014-02-14 13:07         ` Rodrigo Campos
  2014-02-17 10:32         ` Karel Zak
  2 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2014-02-14 11:34 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
>  I don't like all the detection algorithm. Do we really need to allocate 
>  all hole size and the buffer? 
> 
>  IMHO it would be enough to:
> 
>  * add posix_fadvise(... POSIX_FADV_SEQUENTIAL | POSIX_FADV_NOREUSE)
> 
>  * read the file in small chunks -- for example BUFSIZ and compare
>    this with small empty static buffer.
> 
>  .. it's kernel business to read from FS/device in optimal way and I
>  don't think that context switches are so critical issue when all the
>  thing is about I/O.
> 
>  I didn't test it, so maybe I'm wrong, but the current code where we
>  eat RAM seems too crazy. Comments?

 .. and another issue, the current code assumes that the holes are
 aligned to hole_size that we read from the file. If the hole starts
 within the buffer and continues in the next chunk than it's ignored.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 10:47       ` Karel Zak
  2014-02-14 11:34         ` Karel Zak
@ 2014-02-14 13:07         ` Rodrigo Campos
  2014-02-14 13:30           ` Karel Zak
  2014-02-17 10:32         ` Karel Zak
  2 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-14 13:07 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
> >  bash-completion/fallocate |   2 +-
> >  sys-utils/fallocate.1     |  19 +++++++-
> >  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
> >  3 files changed, 120 insertions(+), 15 deletions(-)
> 
>  Applied with some changes and I believe that the code still need some
>  improvements, see below

Thanks :)

> 
> > + */
> > +static int detect_holes(int fd, size_t hole_size)
> > +{
> > +	int ret = 0;
> > +	int err;
> > +
> > +	if (hole_size >= 100 * 1024 * 1024) {
> > +		size_t ram_mb = hole_size / 1024 / 1024;
> > +		printf("WARNING: %zu MB RAM will be used\n", ram_mb);
> > +		sleep(3);
> > +	}
> 
>  I have removed this thing... 
>  
>  I don't like all the detection algorithm. Do we really need to allocate 
>  all hole size and the buffer? 

No, not really. I just did it because it's easy and I thought you probably don't
use big sizes (as it will detect less holes) and probably some "small" size is
enough in practice, even for big files, and that size in RAM is probably not
significant.

> 
>  IMHO it would be enough to:
> 
>  * add posix_fadvise(... POSIX_FADV_SEQUENTIAL | POSIX_FADV_NOREUSE)
> 
>  * read the file in small chunks -- for example BUFSIZ and compare
>    this with small empty static buffer.

That's totally possible too, sure :)

> 
>  .. it's kernel business to read from FS/device in optimal way and I
>  don't think that context switches are so critical issue when all the
>  thing is about I/O.

context switches ? I'm not sure I follow you there...

> 
>  I didn't test it, so maybe I'm wrong, but the current code where we
>  eat RAM seems too crazy. Comments?

The default size is 32kb, so that's what you eat of RAM by default. And I don't
expect sizes bigger than, let's say 10MB but probably less, make sense... That
was my reasoning when I decided to allocate the whole buffer.

But it can be changed, of course :)

> 
> > +	/* Create a buffer of '\0's to compare against */
> > +	/* XXX: Use mmap() with MAP_PRIVATE so Linux can avoid this allocation */
> > +	void *zeros = mmap(NULL, hole_size, PROT_READ,
> > +	                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> > +	if (zeros == MAP_FAILED) {
> > +		perror("mmap");
> 
>  we use err.h stuff, and it's usually good enough to exit after error
>  than waste time with memory deallocation and another clean ups. The
>  kernel is smart enough to clean up after exit().

Sorry, I didn't know this was prefered. And you already changed it, thanks!

> 
> > +	off_t end = lseek(fd, 0, SEEK_END);
>     ^^^^^^^^^^^^^^
> > +	if (end == -1) {
> > +		perror("lseek");
> > +		ret = -1;
> > +		goto out;
> > +	}
> > +
> > +	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
>          ^^^^^^^^^^^
> 
>  Yes it's expected by C standards, but it sucks. Don't use it, it's
>  reader's nightmare.

Ohh, I didn't know. Can I ask why ?

And what type should I use ? loff_t ? Why is this better ?

(just curious, I want to understand :))



Thanks a lot,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 11:34         ` Karel Zak
@ 2014-02-14 13:16           ` Rodrigo Campos
  0 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-14 13:16 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Fri, Feb 14, 2014 at 12:34:32PM +0100, Karel Zak wrote:
> On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> >  I don't like all the detection algorithm. Do we really need to allocate 
> >  all hole size and the buffer? 
> > 
> >  IMHO it would be enough to:
> > 
> >  * add posix_fadvise(... POSIX_FADV_SEQUENTIAL | POSIX_FADV_NOREUSE)
> > 
> >  * read the file in small chunks -- for example BUFSIZ and compare
> >    this with small empty static buffer.
> > 
> >  .. it's kernel business to read from FS/device in optimal way and I
> >  don't think that context switches are so critical issue when all the
> >  thing is about I/O.
> > 
> >  I didn't test it, so maybe I'm wrong, but the current code where we
> >  eat RAM seems too crazy. Comments?
> 
>  .. and another issue, the current code assumes that the holes are
>  aligned to hole_size that we read from the file. If the hole starts
>  within the buffer and continues in the next chunk than it's ignored.

Yes, that was for simplicity too. And also, if the hole size is "small" compared
to the size of the chunks of '\0's, you will create a hole for most of the space
anyways.

There are, of course, pathological cases. But I'm not sure they happen in real
life, I think they are probably more theoretical and in practice only little
space that was '\0's where not holed. And if they do happen, we can improve it
later.

What do you think ?



Thanks a lot,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 13:07         ` Rodrigo Campos
@ 2014-02-14 13:30           ` Karel Zak
  2014-02-14 13:35             ` Rodrigo Campos
  0 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2014-02-14 13:30 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Fri, Feb 14, 2014 at 01:07:53PM +0000, Rodrigo Campos wrote:
> On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> > > +	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
> >          ^^^^^^^^^^^
> > 
> >  Yes it's expected by C standards, but it sucks. Don't use it, it's
> >  reader's nightmare.
> 
> Ohh, I didn't know. Can I ask why ?
> 
> And what type should I use ? loff_t ? Why is this better ?
> 
> (just curious, I want to understand :))

 Ah, my mistake.. I have no talked about "off_t", but about variables
 declarations. It's always better to have declarations at the begin of
 the code block or function. 

    for (int i = 0; ... )

 is C++ism that for unknown crazy reason has been accepted by C
 compilers/standards (C99). 

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 13:30           ` Karel Zak
@ 2014-02-14 13:35             ` Rodrigo Campos
  0 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-14 13:35 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Fri, Feb 14, 2014 at 02:30:59PM +0100, Karel Zak wrote:
> On Fri, Feb 14, 2014 at 01:07:53PM +0000, Rodrigo Campos wrote:
> > On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> > > > +	for (off_t offset = 0; offset + hole_size <= end; offset += buf_len) {
> > >          ^^^^^^^^^^^
> > > 
> > >  Yes it's expected by C standards, but it sucks. Don't use it, it's
> > >  reader's nightmare.
> > 
> > Ohh, I didn't know. Can I ask why ?
> > 
> > And what type should I use ? loff_t ? Why is this better ?
> > 
> > (just curious, I want to understand :))
> 
>  Ah, my mistake.. I have no talked about "off_t", but about variables
>  declarations. It's always better to have declarations at the begin of
>  the code block or function. 
> 
>     for (int i = 0; ... )
> 
>  is C++ism that for unknown crazy reason has been accepted by C
>  compilers/standards (C99). 

Ohh, sorry, my bad. I see you already fixed it also, thanks! :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-14 10:47       ` Karel Zak
  2014-02-14 11:34         ` Karel Zak
  2014-02-14 13:07         ` Rodrigo Campos
@ 2014-02-17 10:32         ` Karel Zak
  2014-02-17 14:15           ` Rodrigo Campos
  2 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2014-02-17 10:32 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
> >  bash-completion/fallocate |   2 +-
> >  sys-utils/fallocate.1     |  19 +++++++-
> >  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
> >  3 files changed, 120 insertions(+), 15 deletions(-)
> 
>  Applied with some changes and I believe that the code still need some
>  improvements, see below


 OK, I did some changes to the code:

 The minimal hole size is based on filesystem blocksize (st_blksize) and 
 --length is no more used for this thing. I think it's better to use
 buffer size which "just works" than expect any input from users. It
 also dramatically simplify the code and the command semantic.

 The original 32K is too large, I guess that we prefer optimal sparse
 files rather than --dig-hole speed. (And it isn't so slow with 4KiB
 chunks.)

 Now the --offset and --length options are used as usually to specify
 an area with in the file.


$ ls -lash yyy
1.1G -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:09 yyy
^^^^

$ time ./fallocate --dig-holes yyy
real    0m0.393s
user    0m0.045s
sys     0m0.345s

$ ls -lash yyy
28K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:10 yyy
^^^

  .. so from 1GiB to 28K in 0.4 second.


    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-17 10:32         ` Karel Zak
@ 2014-02-17 14:15           ` Rodrigo Campos
  2014-02-17 14:49             ` Karel Zak
  0 siblings, 1 reply; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-17 14:15 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Mon, Feb 17, 2014 at 11:32:48AM +0100, Karel Zak wrote:
> On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> > On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
> > >  bash-completion/fallocate |   2 +-
> > >  sys-utils/fallocate.1     |  19 +++++++-
> > >  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
> > >  3 files changed, 120 insertions(+), 15 deletions(-)
> > 
> >  Applied with some changes and I believe that the code still need some
> >  improvements, see below
> 
> 
>  OK, I did some changes to the code:
> 
>  The minimal hole size is based on filesystem blocksize (st_blksize) and 
>  --length is no more used for this thing. I think it's better to use

This was not updated in the manpage, where it shows the run modes. It still
says on the top "fallocate -d [-l length] filename"

>  buffer size which "just works" than expect any input from users. It
>  also dramatically simplify the code and the command semantic.

I looked at the code and this is probably too personal :)

> 
>  The original 32K is too large, I guess that we prefer optimal sparse
>  files rather than --dig-hole speed. (And it isn't so slow with 4KiB
>  chunks.)
> 
>  Now the --offset and --length options are used as usually to specify
>  an area with in the file.
> 
> 
> $ ls -lash yyy
> 1.1G -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:09 yyy
> ^^^^
> 
> $ time ./fallocate --dig-holes yyy
> real    0m0.393s
> user    0m0.045s
> sys     0m0.345s
> 
> $ ls -lash yyy
> 28K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:10 yyy
> ^^^

How did you create the file ? With fallocate ? Is it all zeros ? If it's all
zeros, 28K used seems like a bug maybe ?

Any comparisson to see how much it took before (to see the improvement or the
slowdown) seems interesting. But, in any case, it doesn't seem *too* slow
(although 1GB is kind of a small file), and maybe is even faster :)


I can't test it in my new laptop because it fails to compile (autogen and
configure run fine). It throws:

	rm -f ca.gmo && : -c --statistics --verbose -o ca.gmo ca.po
	mv: cannot stat ‘t-ca.gmo’: No such file or directory
	make[3]: *** [ca.gmo] Error 1


is this a known issue ?





Thanks a lot,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-17 14:15           ` Rodrigo Campos
@ 2014-02-17 14:49             ` Karel Zak
  2014-02-17 15:42               ` Rodrigo Campos
  0 siblings, 1 reply; 20+ messages in thread
From: Karel Zak @ 2014-02-17 14:49 UTC (permalink / raw)
  To: Rodrigo Campos; +Cc: util-linux

On Mon, Feb 17, 2014 at 02:15:29PM +0000, Rodrigo Campos wrote:
> On Mon, Feb 17, 2014 at 11:32:48AM +0100, Karel Zak wrote:
> > On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> > > On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
> > > >  bash-completion/fallocate |   2 +-
> > > >  sys-utils/fallocate.1     |  19 +++++++-
> > > >  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
> > > >  3 files changed, 120 insertions(+), 15 deletions(-)
> > > 
> > >  Applied with some changes and I believe that the code still need some
> > >  improvements, see below
> > 
> > 
> >  OK, I did some changes to the code:
> > 
> >  The minimal hole size is based on filesystem blocksize (st_blksize) and 
> >  --length is no more used for this thing. I think it's better to use
> 
> This was not updated in the manpage, where it shows the run modes. It still
> says on the top "fallocate -d [-l length] filename"

 Oh.. synopsis. Thanks, fixed now.

> > 28K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:10 yyy
> > ^^^
> 
> How did you create the file ? With fallocate ? Is it all zeros ? If it's all
> zeros, 28K used seems like a bug maybe ?

 It's file with 6 holes with different sizes (1024, 10240, 102400 ....etc).

> Any comparisson to see how much it took before (to see the improvement or the

old version:

        $ ls -lash yyy
        1.1G -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 15:39 yyy

        $ time ./fallocate --dig-holes
        yyy

        real    0m0.525s
        user    0m0.057s
        sys     0m0.463s

        $ ls -lash yyy
        164K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 15:39 yyy

 The 164K is because the small holes are ignored due to 32K buffer size.

 The speed improvement is probably because new code calls fallocate()
 only when end of the hole is detected. The another possibility is
 posix_fadvise(), but I have doubts that kernel readahead is so good.

> I can't test it in my new laptop because it fails to compile (autogen and
> configure run fine). It throws:
> 
> 	rm -f ca.gmo && : -c --statistics --verbose -o ca.gmo ca.po
> 	mv: cannot stat ‘t-ca.gmo’: No such file or directory
> 	make[3]: *** [ca.gmo] Error 1
> 
> 
> is this a known issue ?

 hmm... try git clean -xdf; ./autogen.sh ....

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] fallocate: Add "--dig-holes" option
  2014-02-17 14:49             ` Karel Zak
@ 2014-02-17 15:42               ` Rodrigo Campos
  0 siblings, 0 replies; 20+ messages in thread
From: Rodrigo Campos @ 2014-02-17 15:42 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On Mon, Feb 17, 2014 at 03:49:07PM +0100, Karel Zak wrote:
> On Mon, Feb 17, 2014 at 02:15:29PM +0000, Rodrigo Campos wrote:
> > On Mon, Feb 17, 2014 at 11:32:48AM +0100, Karel Zak wrote:
> > > On Fri, Feb 14, 2014 at 11:47:56AM +0100, Karel Zak wrote:
> > > > On Sun, Jan 26, 2014 at 03:06:50PM +0000, Rodrigo Campos wrote:
> > > > >  bash-completion/fallocate |   2 +-
> > > > >  sys-utils/fallocate.1     |  19 +++++++-
> > > > >  sys-utils/fallocate.c     | 114 ++++++++++++++++++++++++++++++++++++++++------
> > > > >  3 files changed, 120 insertions(+), 15 deletions(-)
> > > > 
> > > >  Applied with some changes and I believe that the code still need some
> > > >  improvements, see below
> > > 
> > > 
> > >  OK, I did some changes to the code:
> > > 
> > >  The minimal hole size is based on filesystem blocksize (st_blksize) and 
> > >  --length is no more used for this thing. I think it's better to use
> > 
> > This was not updated in the manpage, where it shows the run modes. It still
> > says on the top "fallocate -d [-l length] filename"
> 
>  Oh.. synopsis. Thanks, fixed now.

Np :)

> 
> > > 28K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 11:10 yyy
> > > ^^^
> > 
> > How did you create the file ? With fallocate ? Is it all zeros ? If it's all
> > zeros, 28K used seems like a bug maybe ?
> 
>  It's file with 6 holes with different sizes (1024, 10240, 102400 ....etc).

Ok, 28K may be okay then :)

> 
> > Any comparisson to see how much it took before (to see the improvement or the
> 
> old version:
> 
>         $ ls -lash yyy
>         1.1G -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 15:39 yyy
> 
>         $ time ./fallocate --dig-holes
>         yyy
> 
>         real    0m0.525s
>         user    0m0.057s
>         sys     0m0.463s
> 
>         $ ls -lash yyy
>         164K -rw-rw-r-- 1 kzak kzak 1.0G Feb 17 15:39 yyy
> 
>  The 164K is because the small holes are ignored due to 32K buffer size.
> 
>  The speed improvement is probably because new code calls fallocate()
>  only when end of the hole is detected. The another possibility is
>  posix_fadvise(), but I have doubts that kernel readahead is so good.

Yeah, is probably avoiding the syscall IMHO too.

> > I can't test it in my new laptop because it fails to compile (autogen and
> > configure run fine). It throws:
> > 
> > 	rm -f ca.gmo && : -c --statistics --verbose -o ca.gmo ca.po
> > 	mv: cannot stat ‘t-ca.gmo’: No such file or directory
> > 	make[3]: *** [ca.gmo] Error 1
> > 
> > 
> > is this a known issue ?
> 
>  hmm... try git clean -xdf; ./autogen.sh ....

It's pretty much a fresh clone. I tried that and didn't help, also tried a
completely new clone, and fails the same way (same error). I'm using a Debian
testing/unstable in a 64bit machine now... (but doesn't seem 64bit related)

Will try to investigate later, when I have some spare time




Thanks,
Rodrigo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-02-17 15:42 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-25 19:17 fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
2014-01-25 19:17 ` [PATCH 1/3] fallocate: Clarify that space can also be deallocated Rodrigo Campos
2014-01-25 19:17 ` [PATCH 2/3] fallocate: Hide #ifdef tricks to call fallocate in a function Rodrigo Campos
2014-01-25 19:17 ` [PATCH 3/3] fallocate: Add "--dig-holes" option Rodrigo Campos
2014-01-26 14:43   ` [PATCH v2] " Rodrigo Campos
2014-01-26 15:06     ` [PATCH v3] " Rodrigo Campos
2014-02-14 10:47       ` Karel Zak
2014-02-14 11:34         ` Karel Zak
2014-02-14 13:16           ` Rodrigo Campos
2014-02-14 13:07         ` Rodrigo Campos
2014-02-14 13:30           ` Karel Zak
2014-02-14 13:35             ` Rodrigo Campos
2014-02-17 10:32         ` Karel Zak
2014-02-17 14:15           ` Rodrigo Campos
2014-02-17 14:49             ` Karel Zak
2014-02-17 15:42               ` Rodrigo Campos
2014-01-25 19:23 ` fallocate: Add "--dig-holes" options and minor fixes Rodrigo Campos
     [not found]   ` <20140201040414.GA23360@sdfg.com.ar>
2014-02-11 11:31     ` Rodrigo Campos
2014-02-11 18:34       ` Karel Zak
2014-02-11 18:43         ` Rodrigo Campos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox