git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC v3 0/4] Improving performance of git clean
@ 2015-04-18 20:41 Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 1/4] setup: add gentle version of read_gitfile Erik Elfström
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Erik Elfström @ 2015-04-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

I've marked this RFC since there are known problems here.

v2 of the patch can be found here:
http://thread.gmane.org/gmane.comp.version-control.git/267023/focus=267023

Changes in v3:
* Created setup.c:read_gitfile_gently to use for submodule
  probing
* Cleanup of some tests by use of test_commit helper
* Added more tests of cleaning in the presence of submodules
* Reversed expectation of test for cleaning nested bare repos.
  They are now expected to be cleaned. Added one more case.
* Fixed bug where submodules could be cleaned by using new
  read_gitfile_gently for additional submodule check in
  clean.c:is_git_repository
* Attempt to change behavior of patch implementation to clean
  bare repositories (only partially successful)
* Reworded commit message of the performance fix commit

Known Problems:
* Unsure about the setup.c:read_gitfile refactor, feels a bit
  messy?
* Potentially a missing sanity check of git file size in
  setup.c:read_gitfile_gently_or_non_gently
* We still get a behavioral change for empty bare repositories
  placed in a ".git" directory. Currently we clean empty bare
  repos in a .git folder but not non-empty one. After this
  patch we won't clean either. How serious is this? Is there
  an easy fix (preferebly to clean all bare repositories)?
* Still have issues in the performance tests, see comments
  from Thomas Gummerer on v2

Thanks to Junio C Hamano and Jeff King for spotting fundamental
problems in v2 and suggesting a solution.

Erik Elfström (4):
  setup: add gentle version of read_gitfile
  t7300: add tests to document behavior of clean and nested git
  p7300: add performance tests for clean
  clean: improve performance when removing lots of directories

 builtin/clean.c       |  25 ++++++++--
 cache.h               |   1 +
 setup.c               |  94 ++++++++++++++++++++++++++++---------
 t/perf/p7300-clean.sh |  37 +++++++++++++++
 t/t7300-clean.sh      | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 257 insertions(+), 25 deletions(-)
 create mode 100755 t/perf/p7300-clean.sh

-- 
2.4.0.rc2.5.g2871d5e

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH/RFC v3 1/4] setup: add gentle version of read_gitfile
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
@ 2015-04-18 20:41 ` Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 2/4] t7300: add tests to document behavior of clean and nested git Erik Elfström
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Erik Elfström @ 2015-04-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

read_gitfile will die on most error cases. This makes it unsuitable
for speculative calls. Extract the core logic and provide a gentle
version that returns NULL on failure.

The first usecase of the new gentle version will be to probe for
submodules during git clean.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
If this is going to be used for speculative probing should there
be a sanity check before:
	buf = xmalloc(st.st_size + 1);
	len = read_in_full(fd, buf, st.st_size);
Something like:
	if (st.st_size > PATH_MAX*2) {
		error = N;
		goto cleanup_return;
	{
What do you think?

 cache.h |  1 +
 setup.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 3d3244b..9d9199d 100644
--- a/cache.h
+++ b/cache.h
@@ -432,6 +432,7 @@ extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_git_work_tree(void);
 extern const char *read_gitfile(const char *path);
+extern const char *read_gitfile_gently(const char *path);
 extern const char *resolve_gitdir(const char *suspect);
 extern void set_git_work_tree(const char *tree);
 
diff --git a/setup.c b/setup.c
index 979b13f..a33b293 100644
--- a/setup.c
+++ b/setup.c
@@ -332,38 +332,46 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok)
 	return 0;
 }
 
-/*
- * Try to read the location of the git directory from the .git file,
- * return path to git directory if found.
- */
-const char *read_gitfile(const char *path)
+static const char *read_gitfile_gently_or_non_gently(const char *path, int gently)
 {
-	char *buf;
+	int error = 0;
+	char *buf = NULL;
 	char *dir;
 	const char *slash;
 	struct stat st;
 	int fd;
 	ssize_t len;
-
-	if (stat(path, &st))
-		return NULL;
-	if (!S_ISREG(st.st_mode))
-		return NULL;
+	if (stat(path, &st)) {
+		error = 1;
+		goto cleanup_return;
+	}
+	if (!S_ISREG(st.st_mode)) {
+		error = 2;
+		goto cleanup_return;
+	}
 	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		die_errno("Error opening '%s'", path);
+	if (fd < 0) {
+		error = 3;
+		goto cleanup_return;
+	}
 	buf = xmalloc(st.st_size + 1);
 	len = read_in_full(fd, buf, st.st_size);
 	close(fd);
-	if (len != st.st_size)
-		die("Error reading %s", path);
+	if (len != st.st_size) {
+		error = 4;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
-	if (!starts_with(buf, "gitdir: "))
-		die("Invalid gitfile format: %s", path);
+	if (!starts_with(buf, "gitdir: ")) {
+		error = 5;
+		goto cleanup_return;
+	}
 	while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
 		len--;
-	if (len < 9)
-		die("No path in gitfile: %s", path);
+	if (len < 9) {
+		error = 6;
+		goto cleanup_return;
+	}
 	buf[len] = '\0';
 	dir = buf + 8;
 
@@ -378,14 +386,58 @@ const char *read_gitfile(const char *path)
 		buf = dir;
 	}
 
-	if (!is_git_directory(dir))
-		die("Not a git repository: %s", dir);
+	if (!is_git_directory(dir)) {
+		error = 7;
+		goto cleanup_return;
+	}
 	path = real_path(dir);
 
+cleanup_return:
 	free(buf);
+
+	if (error) {
+		if (gently)
+			return NULL;
+
+		switch (error) {
+		case 1: // failed to stat
+		case 2: // not regular file
+			return NULL;
+		case 3:
+			die_errno("Error opening '%s'", path);
+		case 4:
+			die("Error reading %s", path);
+		case 5:
+			die("Invalid gitfile format: %s", path);
+		case 6:
+			die("No path in gitfile: %s", path);
+		case 7:
+			die("Not a git repository: %s", dir);
+		default:
+			assert(0);
+		}
+	}
+
 	return path;
 }
 
+/*
+ * Try to read the location of the git directory from the .git file,
+ * return path to git directory if found, die on (most) failures.
+ */
+const char *read_gitfile(const char *path)
+{
+	return read_gitfile_gently_or_non_gently(path, 0);
+}
+
+/*
+ * Same as read_gitfile but return NULL on failure.
+ */
+const char *read_gitfile_gently(const char *path)
+{
+	return read_gitfile_gently_or_non_gently(path, 1);
+}
+
 static const char *setup_explicit_git_dir(const char *gitdirenv,
 					  struct strbuf *cwd,
 					  int *nongit_ok)
-- 
2.4.0.rc2.5.g2871d5e

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH/RFC v3 2/4] t7300: add tests to document behavior of clean and nested git
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 1/4] setup: add gentle version of read_gitfile Erik Elfström
@ 2015-04-18 20:41 ` Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 3/4] p7300: add performance tests for clean Erik Elfström
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Erik Elfström @ 2015-04-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/t7300-clean.sh | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 99be5d9..4b9a72a 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,6 +455,133 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
+test_expect_failure 'should clean things that almost look like git but are not' '
+	rm -fr almost_git almost_bare_git almost_submodule &&
+	mkdir -p almost_git/.git/objects &&
+	mkdir -p almost_git/.git/refs &&
+	cat >almost_git/.git/HEAD <<-\EOF &&
+	garbage
+	EOF
+	cp -r almost_git/.git/ almost_bare_git &&
+	mkdir almost_submodule/ &&
+	cat >almost_submodule/.git <<-\EOF &&
+	garbage
+	EOF
+	test_when_finished "rm -rf almost_*" &&
+	## This will fail due to die("Invalid gitfile format: %s", path); in
+	## setup.c:read_gitfile.
+	git clean -f -d &&
+	test_path_is_missing almost_git &&
+	test_path_is_missing almost_bare_git &&
+	test_path_is_missing almost_submodule
+'
+
+test_expect_success 'should not clean submodules' '
+	rm -fr repo to_clean sub1 sub2 &&
+	mkdir repo to_clean &&
+	(
+		cd repo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git submodule add ./repo/.git sub1 &&
+	git commit -m "sub1" &&
+	git branch before_sub2 &&
+	git submodule add ./repo/.git sub2 &&
+	git commit -m "sub2" &&
+	git checkout before_sub2 &&
+	>to_clean/should_clean.this &&
+	git clean -f -d &&
+	test_path_is_file repo/.git/index &&
+	test_path_is_file repo/hello.world &&
+	test_path_is_file sub1/.git &&
+	test_path_is_file sub1/hello.world &&
+	test_path_is_file sub2/.git &&
+	test_path_is_file sub2/hello.world &&
+	test_path_is_missing to_clean
+'
+
+test_expect_failure 'nested (empty) git should be kept' '
+	rm -fr foo bar &&
+	git init foo &&
+	mkdir bar &&
+	>bar/goodbye.people &&
+	git clean -f -d &&
+	test_path_is_file foo/.git/HEAD &&
+	test_path_is_missing bar
+'
+
+test_expect_success 'nested bare repositories should be cleaned' '
+	rm -fr bare1 bare2 subdir &&
+	git init --bare bare1 &&
+	git clone --local --bare . bare2 &&
+	mkdir subdir &&
+	cp -r bare2 subdir/bare3 &&
+	git clean -f -d &&
+	test_path_is_missing bare1 &&
+	test_path_is_missing bare2 &&
+	test_path_is_missing subdir
+'
+
+test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git init --bare strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_failure 'nested (non-empty) bare repositories should be cleaned even when in .git' '
+	rm -fr strange_bare &&
+	mkdir strange_bare &&
+	git clone --local --bare . strange_bare/.git &&
+	git clean -f -d &&
+	test_path_is_missing strange_bare
+'
+
+test_expect_success 'giving path in nested git work tree will remove it' '
+	rm -fr foo &&
+	mkdir foo &&
+	(
+		cd foo &&
+		git init &&
+		mkdir -p bar/baz &&
+		test_commit msg bar/baz/hello.world
+	) &&
+	git clean -f -d foo/bar/baz &&
+	test_path_is_file foo/.git/HEAD &&
+	test_path_is_dir foo/bar/ &&
+	test_path_is_missing foo/bar/baz
+'
+
+test_expect_success 'giving path to nested .git will not remove it' '
+	rm -fr foo &&
+	mkdir foo bar &&
+	(
+		cd foo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d foo/.git &&
+	test_path_is_file foo/.git/HEAD &&
+	test_path_is_dir foo/.git/refs &&
+	test_path_is_dir foo/.git/objects &&
+	test_path_is_dir bar/
+'
+
+test_expect_success 'giving path to nested .git/ will remove contents' '
+	rm -fr foo bar &&
+	mkdir foo bar &&
+	(
+		cd foo &&
+		git init &&
+		test_commit msg hello.world
+	) &&
+	git clean -f -d foo/.git/ &&
+	test_path_is_dir foo/.git &&
+	test_dir_is_empty foo/.git
+'
+
 test_expect_success 'force removal of nested git work tree' '
 	rm -fr foo bar baz &&
 	mkdir -p foo bar baz/boo &&
-- 
2.4.0.rc2.5.g2871d5e

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH/RFC v3 3/4] p7300: add performance tests for clean
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 1/4] setup: add gentle version of read_gitfile Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 2/4] t7300: add tests to document behavior of clean and nested git Erik Elfström
@ 2015-04-18 20:41 ` Erik Elfström
  2015-04-18 20:41 ` [PATCH/RFC v3 4/4] clean: improve performance when removing lots of directories Erik Elfström
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Erik Elfström @ 2015-04-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 t/perf/p7300-clean.sh | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100755 t/perf/p7300-clean.sh

diff --git a/t/perf/p7300-clean.sh b/t/perf/p7300-clean.sh
new file mode 100755
index 0000000..af50d5d
--- /dev/null
+++ b/t/perf/p7300-clean.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description="Test git-clean performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+test_expect_success 'setup untracked directory with many sub dirs' '
+	rm -rf 500_sub_dirs 50000_sub_dirs clean_test_dir &&
+	mkdir 500_sub_dirs 50000_sub_dirs clean_test_dir &&
+	for i in $(test_seq 1 500)
+	do
+		mkdir 500_sub_dirs/dir$i || return $?
+	done &&
+	for i in $(test_seq 1 100)
+	do
+		cp -r 500_sub_dirs 50000_sub_dirs/dir$i || return $?
+	done
+'
+
+test_perf 'clean many untracked sub dirs, check for nested git' '
+	rm -rf clean_test_dir/50000_sub_dirs_cpy &&
+	cp -r 50000_sub_dirs clean_test_dir/50000_sub_dirs_cpy &&
+	git clean -q -f -d  clean_test_dir/ &&
+	test_dir_is_empty clean_test_dir
+'
+
+test_perf 'clean many untracked sub dirs, ignore nested git' '
+	rm -rf clean_test_dir/50000_sub_dirs_cpy &&
+	cp -r 50000_sub_dirs clean_test_dir/50000_sub_dirs_cpy &&
+	git clean -q -f -f -d  clean_test_dir/ &&
+	test_dir_is_empty clean_test_dir
+'
+
+test_done
-- 
2.4.0.rc2.5.g2871d5e

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH/RFC v3 4/4] clean: improve performance when removing lots of directories
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
                   ` (2 preceding siblings ...)
  2015-04-18 20:41 ` [PATCH/RFC v3 3/4] p7300: add performance tests for clean Erik Elfström
@ 2015-04-18 20:41 ` Erik Elfström
  2015-04-19  1:14 ` [PATCH/RFC v3 0/4] Improving performance of git clean Junio C Hamano
  2015-04-20 22:14 ` Thomas Gummerer
  5 siblings, 0 replies; 14+ messages in thread
From: Erik Elfström @ 2015-04-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Erik Elfström

"git clean" uses resolve_gitlink_ref() to check for the presence of
nested git repositories, but it has the drawback of creating a
ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list causes a massive
performance hit for large number of directories.

Modify clean.c:remove_dirs to use setup.c:is_git_directory and
setup.c:read_gitfile_gently instead.

Both these functions will open files and parse contents when they find
something that looks like a git repository. This is ok from a
performance standpoint since finding repository candidates should be
comparatively rare.

Using is_git_directory and read_gitfile_gently should give a more
standardized check for what is and what isn't a git repository but
also gives a slight behavioral change. We will now detect and respect
empty nested git repositories (only init run) and empty bare
repositories that have been placed in a ".git" directory. We will also
no longer die when cleaning a file named ".git" with garbage content
(it will be cleaned instead). Update t7300 to reflect this.

The time to clean an untracked directory containing 100000 sub
directories went from 61s to 1.7s after this change.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
---
 builtin/clean.c  | 25 +++++++++++++++++++++----
 t/t7300-clean.sh |  8 +++-----
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..5cda3c5 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -10,7 +10,6 @@
 #include "cache.h"
 #include "dir.h"
 #include "parse-options.h"
-#include "refs.h"
 #include "string-list.h"
 #include "quote.h"
 #include "column.h"
@@ -148,6 +147,26 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+/*
+ * Return 1 if the given path is the root of a git repository or
+ * submodule else 0. Will not return 1 for bare repositories with the
+ * exception of creating a bare repository in "foo/.git" and calling
+ * is_git_repository("foo").
+ */
+static int is_git_repository(struct strbuf *path)
+{
+	int ret = 0;
+	size_t orig_path_len = path->len;
+	assert(orig_path_len != 0);
+	if (path->buf[orig_path_len - 1] != '/')
+		strbuf_addch(path, '/');
+	strbuf_addstr(path, ".git");
+	if (read_gitfile_gently(path->buf) || is_git_directory(path->buf))
+		ret = 1;
+	strbuf_setlen(path, orig_path_len);
+	return ret;
+}
+
 static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 		int dry_run, int quiet, int *dir_gone)
 {
@@ -155,13 +174,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	struct strbuf quoted = STRBUF_INIT;
 	struct dirent *e;
 	int res = 0, ret = 0, gone = 1, original_len = path->len, len;
-	unsigned char submodule_head[20];
 	struct string_list dels = STRING_LIST_INIT_DUP;
 
 	*dir_gone = 1;
 
-	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
-			!resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
+	if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && is_git_repository(path)) {
 		if (!quiet) {
 			quote_path_relative(path->buf, prefix, &quoted);
 			printf(dry_run ?  _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 4b9a72a..1bbb8ef 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -455,7 +455,7 @@ test_expect_success 'nested git work tree' '
 	! test -d bar
 '
 
-test_expect_failure 'should clean things that almost look like git but are not' '
+test_expect_success 'should clean things that almost look like git but are not' '
 	rm -fr almost_git almost_bare_git almost_submodule &&
 	mkdir -p almost_git/.git/objects &&
 	mkdir -p almost_git/.git/refs &&
@@ -468,8 +468,6 @@ test_expect_failure 'should clean things that almost look like git but are not'
 	garbage
 	EOF
 	test_when_finished "rm -rf almost_*" &&
-	## This will fail due to die("Invalid gitfile format: %s", path); in
-	## setup.c:read_gitfile.
 	git clean -f -d &&
 	test_path_is_missing almost_git &&
 	test_path_is_missing almost_bare_git &&
@@ -501,7 +499,7 @@ test_expect_success 'should not clean submodules' '
 	test_path_is_missing to_clean
 '
 
-test_expect_failure 'nested (empty) git should be kept' '
+test_expect_success 'nested (empty) git should be kept' '
 	rm -fr foo bar &&
 	git init foo &&
 	mkdir bar &&
@@ -523,7 +521,7 @@ test_expect_success 'nested bare repositories should be cleaned' '
 	test_path_is_missing subdir
 '
 
-test_expect_success 'nested (empty) bare repositories should be cleaned even when in .git' '
+test_expect_failure 'nested (empty) bare repositories should be cleaned even when in .git' '
 	rm -fr strange_bare &&
 	mkdir strange_bare &&
 	git init --bare strange_bare/.git &&
-- 
2.4.0.rc2.5.g2871d5e

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
                   ` (3 preceding siblings ...)
  2015-04-18 20:41 ` [PATCH/RFC v3 4/4] clean: improve performance when removing lots of directories Erik Elfström
@ 2015-04-19  1:14 ` Junio C Hamano
  2015-04-21 18:17   ` erik elfström
  2015-04-20 22:14 ` Thomas Gummerer
  5 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2015-04-19  1:14 UTC (permalink / raw)
  To: Erik Elfström; +Cc: git

Erik Elfström <erik.elfstrom@gmail.com> writes:

> Known Problems:
> * Unsure about the setup.c:read_gitfile refactor, feels a bit
>   messy?

The interface indeed feels somewhat messy.  I suspect that a better
interface might be more like setup_git_directory_gently() that is a
gentler version of setup_git_directory().  The function takes an
optional pointer to an int, and it behaves not-gently-at-all when
the optional argument is NULL.  The location the optional pointer
points at, when it is not NULL, can be used to return the error
state to the caller.  That function pair only uses the optional
argument to convey one bit of information (i.e. "are we in any git
repository or not?") back to the caller, but the interface could be
used to tell the caller a lot more if we wanted to.

If you model read_gitfile_gently() after that pattern, I would
expect that

 - The extra pattern would be "int *error";
 - The implementation of read_gitfile() would be

       #define read_gitfile(path) read_gitfile_gently((path), NULL)

   and the _gently() version will die when "error" parameter is set
   to NULL and finds any error.

 - The caller of the gentle variant can use the error code to
   determine what went wrong, not just the fact that it failed.  I
   do not think your caller does not have an immediate need to tell
   between "invalid gitfile format" and "No path in gitfile", but
   such an interface leaves that possibility open.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
                   ` (4 preceding siblings ...)
  2015-04-19  1:14 ` [PATCH/RFC v3 0/4] Improving performance of git clean Junio C Hamano
@ 2015-04-20 22:14 ` Thomas Gummerer
  2015-04-21 18:21   ` erik elfström
  5 siblings, 1 reply; 14+ messages in thread
From: Thomas Gummerer @ 2015-04-20 22:14 UTC (permalink / raw)
  To: Erik,
	=?iso-8859-1?Q?Elfstr=F6m_=3Cerik=2Eelfstrom=40gmail=2Ecom=3E?=; +Cc: git

On 04/18, Erik Elfström wrote:
> * Still have issues in the performance tests, see comments
>   from Thomas Gummerer on v2

I've looked at the "modern" style tests again, and I don't the code
churn is worth it just for using them for the performance tests.  If
anyone wants to take a look at the code, it's at
github.com/tgummerer/git tg/perf-lib.

I think adding the test_perf_setup_cleanup command would make more
sense in this case.  If you want I can send a patch for that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-19  1:14 ` [PATCH/RFC v3 0/4] Improving performance of git clean Junio C Hamano
@ 2015-04-21 18:17   ` erik elfström
  0 siblings, 0 replies; 14+ messages in thread
From: erik elfström @ 2015-04-21 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

On Sun, Apr 19, 2015 at 3:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Elfström <erik.elfstrom@gmail.com> writes:
>
>> Known Problems:
>> * Unsure about the setup.c:read_gitfile refactor, feels a bit
>>   messy?
>
> The interface indeed feels somewhat messy.  I suspect that a better
> interface might be more like setup_git_directory_gently() that is a
> gentler version of setup_git_directory().  The function takes an
> optional pointer to an int, and it behaves not-gently-at-all when
> the optional argument is NULL.  The location the optional pointer
> points at, when it is not NULL, can be used to return the error
> state to the caller.  That function pair only uses the optional
> argument to convey one bit of information (i.e. "are we in any git
> repository or not?") back to the caller, but the interface could be
> used to tell the caller a lot more if we wanted to.
>
> If you model read_gitfile_gently() after that pattern, I would
> expect that
>
>  - The extra pattern would be "int *error";
>  - The implementation of read_gitfile() would be
>
>        #define read_gitfile(path) read_gitfile_gently((path), NULL)
>
>    and the _gently() version will die when "error" parameter is set
>    to NULL and finds any error.
>
>  - The caller of the gentle variant can use the error code to
>    determine what went wrong, not just the fact that it failed.  I
>    do not think your caller does not have an immediate need to tell
>    between "invalid gitfile format" and "No path in gitfile", but
>    such an interface leaves that possibility open.
>
> Thanks.
>

Ok, I'll try that approach, thanks. (and apologies for the late reply)

/Erik

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-20 22:14 ` Thomas Gummerer
@ 2015-04-21 18:21   ` erik elfström
  2015-04-21 19:02     ` Junio C Hamano
  2015-04-21 21:24     ` Jeff King
  0 siblings, 2 replies; 14+ messages in thread
From: erik elfström @ 2015-04-21 18:21 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git List

Ok, thanks for looking into this.

I have no well founded opinions on the implementation but I do
think the performance tests would be more meaningful if the
setup/cleanup code could be removed from the timed section.
If the community agrees on an implementation I would be happy
to convert the new tests, either directly in this series or as a follow
up if that is preferred.

/Erik

On Tue, Apr 21, 2015 at 12:14 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> On 04/18, Erik Elfström wrote:
>> * Still have issues in the performance tests, see comments
>>   from Thomas Gummerer on v2
>
> I've looked at the "modern" style tests again, and I don't the code
> churn is worth it just for using them for the performance tests.  If
> anyone wants to take a look at the code, it's at
> github.com/tgummerer/git tg/perf-lib.
>
> I think adding the test_perf_setup_cleanup command would make more
> sense in this case.  If you want I can send a patch for that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-21 18:21   ` erik elfström
@ 2015-04-21 19:02     ` Junio C Hamano
  2015-04-21 21:24     ` Jeff King
  1 sibling, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2015-04-21 19:02 UTC (permalink / raw)
  To: erik elfström; +Cc: Thomas Gummerer, Git List

erik elfström <erik.elfstrom@gmail.com> writes:

> Ok, thanks for looking into this.
>
> I have no well founded opinions on the implementation but I do
> think the performance tests would be more meaningful if the
> setup/cleanup code could be removed from the timed section.
> If the community agrees on an implementation I would be happy
> to convert the new tests, either directly in this series or as a follow
> up if that is preferred.

Let's not delay the fix and do the perf thing as a follow-up series,
possibly an even independent one.

In other words, let's keep the topics small.

>
> /Erik
>
> On Tue, Apr 21, 2015 at 12:14 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> On 04/18, Erik Elfström wrote:
>>> * Still have issues in the performance tests, see comments
>>>   from Thomas Gummerer on v2
>>
>> I've looked at the "modern" style tests again, and I don't the code
>> churn is worth it just for using them for the performance tests.  If
>> anyone wants to take a look at the code, it's at
>> github.com/tgummerer/git tg/perf-lib.
>>
>> I think adding the test_perf_setup_cleanup command would make more
>> sense in this case.  If you want I can send a patch for that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-21 18:21   ` erik elfström
  2015-04-21 19:02     ` Junio C Hamano
@ 2015-04-21 21:24     ` Jeff King
  2015-04-22 19:30       ` erik elfström
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff King @ 2015-04-21 21:24 UTC (permalink / raw)
  To: erik elfström; +Cc: Thomas Gummerer, Git List

On Tue, Apr 21, 2015 at 08:21:37PM +0200, erik elfström wrote:

> Ok, thanks for looking into this.
> 
> I have no well founded opinions on the implementation but I do
> think the performance tests would be more meaningful if the
> setup/cleanup code could be removed from the timed section.
> If the community agrees on an implementation I would be happy
> to convert the new tests, either directly in this series or as a follow
> up if that is preferred.

If I understand correctly, the reason that you need per-run setup is
that your "git clean" command actually cleans things, and you need to
restore the original state for each time-trial. Can you instead use "git
clean -n" to do a dry-run? I think what you are timing is really the
"figure out what to clean" step, and not the cleaning itself.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-21 21:24     ` Jeff King
@ 2015-04-22 19:30       ` erik elfström
  2015-04-22 19:46         ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: erik elfström @ 2015-04-22 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Gummerer, Git List

On Tue, Apr 21, 2015 at 11:24 PM, Jeff King <peff@peff.net> wrote:
>
> If I understand correctly, the reason that you need per-run setup is
> that your "git clean" command actually cleans things, and you need to
> restore the original state for each time-trial. Can you instead use "git
> clean -n" to do a dry-run? I think what you are timing is really the
> "figure out what to clean" step, and not the cleaning itself.
>
> -Peff


Yes, that is the problem. A dry run will spot this particular performance
issue but maybe we lose some value as a general performance test if
we only do "half" the clean? Admittedly we clearly lose some value in
the current state as well due to the copying taking more time than the
cleaning. I could go either way here.

/Erik

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-22 19:30       ` erik elfström
@ 2015-04-22 19:46         ` Jeff King
  2015-04-22 19:53           ` erik elfström
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2015-04-22 19:46 UTC (permalink / raw)
  To: erik elfström; +Cc: Thomas Gummerer, Git List

On Wed, Apr 22, 2015 at 09:30:20PM +0200, erik elfström wrote:

> On Tue, Apr 21, 2015 at 11:24 PM, Jeff King <peff@peff.net> wrote:
> >
> > If I understand correctly, the reason that you need per-run setup is
> > that your "git clean" command actually cleans things, and you need to
> > restore the original state for each time-trial. Can you instead use "git
> > clean -n" to do a dry-run? I think what you are timing is really the
> > "figure out what to clean" step, and not the cleaning itself.
> 
> Yes, that is the problem. A dry run will spot this particular performance
> issue but maybe we lose some value as a general performance test if
> we only do "half" the clean? Admittedly we clearly lose some value in
> the current state as well due to the copying taking more time than the
> cleaning. I could go either way here.

I guess it is a matter of opinion. I think testing only the "find out
what to clean" half separately is actually beneficial, because it helps
us isolate any slowdown. If we want to add a test for the other half, we
can, but I do not actually think it is currently that interesting (it is
just calling unlink() in a loop).

So even leaving the practical matters aside, I do not think it is a bad
thing to split it up. When you add in the fact that it is practically
much easier to test the first half, it seems to me that testing just
that is a good first step.

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH/RFC v3 0/4] Improving performance of git clean
  2015-04-22 19:46         ` Jeff King
@ 2015-04-22 19:53           ` erik elfström
  0 siblings, 0 replies; 14+ messages in thread
From: erik elfström @ 2015-04-22 19:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Gummerer, Git List

On Wed, Apr 22, 2015 at 9:46 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Apr 22, 2015 at 09:30:20PM +0200, erik elfström wrote:
>>
>> Yes, that is the problem. A dry run will spot this particular performance
>> issue but maybe we lose some value as a general performance test if
>> we only do "half" the clean? Admittedly we clearly lose some value in
>> the current state as well due to the copying taking more time than the
>> cleaning. I could go either way here.
>
> I guess it is a matter of opinion. I think testing only the "find out
> what to clean" half separately is actually beneficial, because it helps
> us isolate any slowdown. If we want to add a test for the other half, we
> can, but I do not actually think it is currently that interesting (it is
> just calling unlink() in a loop).
>
> So even leaving the practical matters aside, I do not think it is a bad
> thing to split it up. When you add in the fact that it is practically
> much easier to test the first half, it seems to me that testing just
> that is a good first step.
>
> -Peff

Sounds reasonable to me. I'll make this change in v4, thanks!

(Sorry for the duplicate email Jeff, I'm bad at this mailing list thing...)

/Erik

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-04-22 19:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-18 20:41 [PATCH/RFC v3 0/4] Improving performance of git clean Erik Elfström
2015-04-18 20:41 ` [PATCH/RFC v3 1/4] setup: add gentle version of read_gitfile Erik Elfström
2015-04-18 20:41 ` [PATCH/RFC v3 2/4] t7300: add tests to document behavior of clean and nested git Erik Elfström
2015-04-18 20:41 ` [PATCH/RFC v3 3/4] p7300: add performance tests for clean Erik Elfström
2015-04-18 20:41 ` [PATCH/RFC v3 4/4] clean: improve performance when removing lots of directories Erik Elfström
2015-04-19  1:14 ` [PATCH/RFC v3 0/4] Improving performance of git clean Junio C Hamano
2015-04-21 18:17   ` erik elfström
2015-04-20 22:14 ` Thomas Gummerer
2015-04-21 18:21   ` erik elfström
2015-04-21 19:02     ` Junio C Hamano
2015-04-21 21:24     ` Jeff King
2015-04-22 19:30       ` erik elfström
2015-04-22 19:46         ` Jeff King
2015-04-22 19:53           ` erik elfström

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).