* [PATCH RESEND] Introduce core.keepHardLinks
From: Johannes Schindelin @ 2009-01-02 18:07 UTC (permalink / raw)
To: git, gitster
When a tracked file was hard linked, we used to break the hard link
whenever Git writes to that file.
In some situations, this behavior is less-than-desirable, especially
given the fact that some popular editors do not do that, such as
(in alphabetical order) emacs and vi.
So teach Git not to break hard links when the config variable
core.keepHardLinks is set to true. For backwards compatibility, this
variable defaults to false.
>From a safety viewpoint, nothing really changes, as to keep hard links,
Git will now open the files it updates with O_TRUNC instead of deleting
them first and then opening them with O_EXCL.
To keep the implementation simple, mode changes will still break the
hard links.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Only the commit message was modified since last time, to
provide a better justification for the patch.
Documentation/config.txt | 4 +++
cache.h | 1 +
config.c | 4 +++
entry.c | 7 ++++-
environment.c | 1 +
t/t0056-hardlinked.sh | 58 ++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 73 insertions(+), 2 deletions(-)
create mode 100644 t/t0056-hardlinked.sh
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 7408bb2..086d8a4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -209,6 +209,10 @@ core.symlinks::
file. Useful on filesystems like FAT that do not support
symbolic links. True by default.
+core.keepHardLinks::
+ If true, do not break hard links by deleting and recreating the
+ files. Off by default.
+
core.gitProxy::
A "proxy command" to execute (as 'command host port') instead
of establishing direct connection to the remote server when
diff --git a/cache.h b/cache.h
index 231c06d..a9c5812 100644
--- a/cache.h
+++ b/cache.h
@@ -538,6 +538,7 @@ enum rebase_setup_type {
extern enum branch_track git_branch_track;
extern enum rebase_setup_type autorebase;
+extern int keep_hard_links;
#define GIT_REPO_VERSION 0
extern int repository_format_version;
diff --git a/config.c b/config.c
index 790405a..8ff2b4b 100644
--- a/config.c
+++ b/config.c
@@ -492,6 +492,10 @@ static int git_default_core_config(const char *var, const char *value)
if (!strcmp(var, "core.preloadindex")) {
core_preload_index = git_config_bool(var, value);
+ }
+
+ if (!strcmp(var, "core.keephardlinks")) {
+ keep_hard_links = git_config_bool(var, value);
return 0;
}
diff --git a/entry.c b/entry.c
index aa2ee46..dfddf83 100644
--- a/entry.c
+++ b/entry.c
@@ -82,7 +82,8 @@ static void remove_subtree(const char *path)
static int create_file(const char *path, unsigned int mode)
{
mode = (mode & 0100) ? 0777 : 0666;
- return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
+ return open(path, O_WRONLY | O_CREAT |
+ (keep_hard_links ? O_TRUNC : O_EXCL), mode);
}
static void *read_blob_entry(struct cache_entry *ce, const char *path, unsigned long *size)
@@ -225,7 +226,9 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
if (!state->force)
return error("%s is a directory", path);
remove_subtree(path);
- } else if (unlink(path))
+ } else if ((!keep_hard_links || !S_ISREG(st.st_mode) ||
+ st.st_mode != ce->ce_mode) &&
+ unlink(path))
return error("unable to unlink old '%s' (%s)", path, strerror(errno));
} else if (state->not_new)
return 0;
diff --git a/environment.c b/environment.c
index e278bce..fc91809 100644
--- a/environment.c
+++ b/environment.c
@@ -42,6 +42,7 @@ enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
enum branch_track git_branch_track = BRANCH_TRACK_REMOTE;
enum rebase_setup_type autorebase = AUTOREBASE_NEVER;
+int keep_hard_links = 0;
/* Parallel index stat data preload? */
int core_preload_index = 0;
diff --git a/t/t0056-hardlinked.sh b/t/t0056-hardlinked.sh
new file mode 100644
index 0000000..934c2bc
--- /dev/null
+++ b/t/t0056-hardlinked.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='read-tree and checkout respect hardlinked files'
+
+. ./test-lib.sh
+
+cat > file << EOF
+1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. Qb3 dxc4 7. Qxc4 c6
+8. e4 Nbd7 9. Rd1 Nb6 10. Qc5 Bg4 11. Bg5 Na4 12. Qa3 Nxc3 13. bxc3 Nxe4
+14. Bxe7 Qb6 15. Bc4 Nxc3 16. Bc5 Rfe8+ 17. Kf1 Be6 18. Bxb6 Bxc4+ 19. Kg1 Ne2+
+20. Kf1 Nxd4+ 21. Kg1 Ne2+ 22. Kf1 Nc3+ 23. Kg1 axb6 24. Qb4 Ra4 25. Qxb6 Nxd1
+26. h3 Rxa2 27. Kh2 Nxf2 28. Re1 Rxe1 29. Qd8+ Bf8 30. Nxe1 Bd5 31. Nf3 Ne4
+32. Qb8 b5 33. h4 h5 34. Ne5 Kg7 35. Kg1 Bc5+ 36. Kf1 Ng3+ 37. Ke1 Bb4+
+38. Kd1 Bb3+ 39. Kc1 Ne2+ 40. Kb1 Nc3+
+EOF
+
+ln file link || {
+ say "Could not hard link; skipping test"
+ test_done
+ exit
+}
+
+test_expect_success setup '
+
+ git config core.keepHardLinks true &&
+ test_cmp file link &&
+ cp file old &&
+ git add file &&
+ test_tick &&
+ git commit -m initial &&
+ echo "41. Kc1 Rc2#" >> file &&
+ git add file &&
+ test_tick &&
+ git commit -m 2nd &&
+ test_cmp file link &&
+ ! test_cmp file old
+
+'
+
+test_expect_success 'checking a file out does not break the hard link' '
+
+ git checkout HEAD^ -- file &&
+ test_cmp file link &&
+ test_cmp file old
+
+'
+
+test_expect_success 'read-tree -u -m does not break the hard link' '
+
+ git reset --hard &&
+ test_cmp file link &&
+ git read-tree -u -m HEAD^ &&
+ test_cmp file link &&
+ test_cmp file old
+
+'
+
+test_done
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [CLEANUP PATCH RESEND] git wrapper: Make while loop more reader-friendly
From: Johannes Schindelin @ 2009-01-02 18:07 UTC (permalink / raw)
To: git, gitster
It is not a good practice to prefer performance over readability in
something as performance uncritical as finding the trailing slash
of argv[0].
So avoid head-scratching by making the loop user-readable, and not
hyper-performance-optimized.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
git.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/git.c b/git.c
index 940a498..e0d9071 100644
--- a/git.c
+++ b/git.c
@@ -428,9 +428,8 @@ int main(int argc, const char **argv)
* name, and the dirname as the default exec_path
* if we don't have anything better.
*/
- do
- --slash;
- while (cmd <= slash && !is_dir_sep(*slash));
+ while (cmd <= slash && !is_dir_sep(*slash))
+ slash--;
if (cmd <= slash) {
*slash++ = 0;
git_set_argv0_path(cmd);
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [TOY PATCH] diffcore-rename: replace basename_same() heuristics by levenshtein
From: Johannes Schindelin @ 2009-01-02 18:07 UTC (permalink / raw)
To: git
When the environment variable GIT_USE_LEVENSHTEIN is set, the rename
score is reduced by the Damerau-Levenshtein distance. This should
lead to a little more intuitive rename scores.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
I have this in my personal tree, but I did not really use it.
Maybe others want to play with it.
diffcore-rename.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 168a95b..1f4b371 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -5,6 +5,7 @@
#include "diff.h"
#include "diffcore.h"
#include "hash.h"
+#include "levenshtein.h"
/* Table of rename/copy destinations */
@@ -187,7 +188,9 @@ static int estimate_similarity(struct diff_filespec *src,
if (!dst->size)
score = 0; /* should not happen */
else
- score = (int)(src_copied * MAX_SCORE / max_size);
+ score = (int)(src_copied * MAX_SCORE / max_size)
+ - (getenv("GIT_USE_LEVENSHTEIN") ?
+ levenshtein(src->path, dst->path, 1, 1, 1, 1) : 0);
return score;
}
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [PATCH] bisect view: call gitk if Cygwin's SESSIONNAME variable is set
From: Johannes Schindelin @ 2009-01-02 18:08 UTC (permalink / raw)
To: git, gitster
It seems that Cygwin sets the variable SESSIONNAME when an interactive
desktop session is running, and does not set it when you log in via ssh.
So we can use this variable to determine whether to run gitk or git log
in git bisect view.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
git-bisect.sh | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/git-bisect.sh b/git-bisect.sh
index 17a35f6..85db4ba 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -508,7 +508,7 @@ bisect_visualize() {
if test $# = 0
then
- case "${DISPLAY+set}${MSYSTEM+set}${SECURITYSESSIONID+set}" in
+ case "${DISPLAY+set}${SESSIONNAME+set}${MSYSTEM+set}${SECURITYSESSIONID+set}" in
'') set git log ;;
set*) set gitk ;;
esac
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [RESEND PATCH] git add: do not add files from a submodule
From: Johannes Schindelin @ 2009-01-02 18:08 UTC (permalink / raw)
To: git, gitster
It comes quite as a surprise to an unsuspecting Git user that calling
"git add submodule/file" (which is a mistake, alright) _removes_
the submodule in the index, and adds the file. Instead, complain loudly.
While at it, be nice when the user said "git add submodule/" which is
most likely the consequence of tab-completion, and stage the submodule,
instead of trying to add the contents of that directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Okay, so I let this slip for way too long. AFAIR I waited for
a few patches to go into 'next', and it was in an -rc cycle.
Happily, the patches seem to be in, and we're out of -rc.
Looking over the patch again, I do not find anything obviously
wrong, but certainly somebody will.
builtin-add.c | 28 ++++++++++++++++++++++++++++
t/t7400-submodule-basic.sh | 25 +++++++++++++++++++++++++
2 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/builtin-add.c b/builtin-add.c
index 719de8b..ac98c83 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -68,6 +68,33 @@ static void prune_directory(struct dir_struct *dir, const char **pathspec, int p
free(seen);
}
+static void treat_gitlinks(const char **pathspec)
+{
+ int i;
+
+ if (!pathspec || !*pathspec)
+ return;
+
+ for (i = 0; i < active_nr; i++) {
+ struct cache_entry *ce = active_cache[i];
+ if (S_ISGITLINK(ce->ce_mode)) {
+ int len = ce_namelen(ce), j;
+ for (j = 0; pathspec[j]; j++) {
+ int len2 = strlen(pathspec[j]);
+ if (len2 <= len || pathspec[j][len] != '/' ||
+ memcmp(ce->name, pathspec[j], len))
+ continue;
+ if (len2 == len + 1)
+ /* strip trailing slash */
+ pathspec[j] = xstrndup(ce->name, len);
+ else
+ die ("Path '%s' is in submodule '%.*s'",
+ pathspec[j], len, ce->name);
+ }
+ }
+ }
+}
+
static void fill_directory(struct dir_struct *dir, const char **pathspec,
int ignored_too)
{
@@ -261,6 +288,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
if (read_cache() < 0)
die("index file corrupt");
+ treat_gitlinks(pathspec);
if (add_new_files)
/* This picks up the paths that are not tracked */
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index be73f7b..2ec7ac6 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -209,4 +209,29 @@ test_expect_success 'update --init' '
'
+test_expect_success 'do not add files from a submodule' '
+
+ git reset --hard &&
+ test_must_fail git add init/a
+
+'
+
+test_expect_success 'gracefully add submodule with a trailing slash' '
+
+ git reset --hard &&
+ git commit -m "commit subproject" init &&
+ (cd init &&
+ echo b > a) &&
+ git add init/ &&
+ git diff --exit-code --cached init &&
+ commit=$(cd init &&
+ git commit -m update a >/dev/null &&
+ git rev-parse HEAD) &&
+ git add init/ &&
+ test_must_fail git diff --exit-code --cached init &&
+ test $commit = $(git ls-files --stage |
+ sed -n "s/^160000 \([^ ]*\).*/\1/p")
+
+'
+
test_done
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [PATCH] bundle: allow rev-list options to exclude annotated tags
From: Johannes Schindelin @ 2009-01-02 18:08 UTC (permalink / raw)
To: git, gitster
With options such as "--all --since=2.weeks.ago", annotated tags used to
be included, when they should have been excluded. The reason is that we
heavily abuse the revision walker to determine what needs to be included
or excluded. And the revision walker does not show tags at all (and
therefore never marks tags as uninteresting).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
bundle.c | 32 ++++++++++++++++++++++++++++++++
t/t5704-bundle.sh | 33 +++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+), 0 deletions(-)
create mode 100755 t/t5704-bundle.sh
diff --git a/bundle.c b/bundle.c
index daecd8e..4977962 100644
--- a/bundle.c
+++ b/bundle.c
@@ -167,6 +167,32 @@ int list_bundle_refs(struct bundle_header *header, int argc, const char **argv)
return list_refs(&header->references, argc, argv);
}
+static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
+{
+ unsigned long size;
+ enum object_type type;
+ char *buf, *line, *lineend;
+ unsigned long date;
+
+ if (revs->max_age == -1 && revs->min_age == -1)
+ return 1;
+
+ buf = read_sha1_file(tag->sha1, &type, &size);
+ if (!buf)
+ return 1;
+ line = memmem(buf, size, "\ntagger ", 8);
+ if (!line++)
+ return 1;
+ lineend = memchr(line, buf + size - line, '\n');
+ line = memchr(line, lineend ? lineend - line : buf + size - line, '>');
+ if (!line++)
+ return 1;
+ date = strtoul(line, NULL, 10);
+ free(buf);
+ return (revs->max_age == -1 || revs->max_age < date) &&
+ (revs->min_age == -1 || revs->min_age > date);
+}
+
int create_bundle(struct bundle_header *header, const char *path,
int argc, const char **argv)
{
@@ -255,6 +281,12 @@ int create_bundle(struct bundle_header *header, const char *path,
flag = 0;
display_ref = (flag & REF_ISSYMREF) ? e->name : ref;
+ if (e->item->type == OBJ_TAG &&
+ !is_tag_in_date_range(e->item, &revs)) {
+ e->item->flags |= UNINTERESTING;
+ continue;
+ }
+
/*
* Make sure the refs we wrote out is correct; --max-count and
* other limiting options could have prevented all the tips
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
new file mode 100755
index 0000000..a8f4419
--- /dev/null
+++ b/t/t5704-bundle.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+test_description='some bundle related tests'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+
+ : > file &&
+ git add file &&
+ test_tick &&
+ git commit -m initial &&
+ test_tick &&
+ git tag -m tag tag &&
+ : > file2 &&
+ git add file2 &&
+ : > file3 &&
+ test_tick &&
+ git commit -m second &&
+ git add file3 &&
+ test_tick &&
+ git commit -m third
+
+'
+
+test_expect_success 'tags can be excluded by rev-list options' '
+
+ git bundle create bundle --all --since=7.Apr.2005.15:16:00.-0700 &&
+ git ls-remote bundle > output &&
+ ! grep tag output
+
+'
+
+test_done
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* [CLEANUP PATCH] show <tag>: reuse pp_user_info() instead of duplicating code
From: Johannes Schindelin @ 2009-01-02 18:08 UTC (permalink / raw)
To: git, gitster
We used to extract the tagger information "by hand" in "git show <tag>",
but the function pp_user_info() already does that. Even better:
it respects the commit_format and date_format specified by the user.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Just a little cleanup, as I tripped over that part of Git's source.
builtin-log.c | 21 ++++++---------------
1 files changed, 6 insertions(+), 15 deletions(-)
diff --git a/builtin-log.c b/builtin-log.c
index 99d1137..bc4e1e9 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -249,22 +249,13 @@ int cmd_whatchanged(int argc, const char **argv, const char *prefix)
static void show_tagger(char *buf, int len, struct rev_info *rev)
{
- char *email_end, *p;
- unsigned long date;
- int tz;
+ struct strbuf out = STRBUF_INIT;
- email_end = memchr(buf, '>', len);
- if (!email_end)
- return;
- p = ++email_end;
- while (isspace(*p))
- p++;
- date = strtoul(p, &p, 10);
- while (isspace(*p))
- p++;
- tz = (int)strtol(p, NULL, 10);
- printf("Tagger: %.*s\nDate: %s\n", (int)(email_end - buf), buf,
- show_date(date, tz, rev->date_mode));
+ pp_user_info("Tagger", rev->commit_format, &out, buf, rev->date_mode,
+ git_log_output_encoding ?
+ git_log_output_encoding: git_commit_encoding);
+ printf("%s\n", out.buf);
+ strbuf_release(&out);
}
static int show_object(const unsigned char *sha1, int show_tag_object,
--
1.6.1.rc3.224.g95ac9
^ permalink raw reply related
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 18:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901011151440.5086@localhost.localdomain>
Hi,
On Thu, 1 Jan 2009, Linus Torvalds wrote:
> On Thu, 1 Jan 2009, Linus Torvalds wrote:
> >
> > So could we have some actual real data on it?
>
> .. and some testing. I tried to get some limited data for the kernel
> myself, by doing
>
> git log --patience -p v2.6.28.. > ~/patience
>
> but I just got a core-dump instead.
>
> Pinpointing it to a specific commit shows a smaller failure case:
>
> git show -p --patience 05d564fe00c05bf8ff93948057ca1acb5bc68e10
>
> which might help you debug this.
Thanks. I am on it. valgrind finds an earlier place in
xdl_change_compact() which I think is rather more sensible, but at the
same time a bit worrisome, too, as I did not expect any errors _that_
late in the game (I did not touch that code).
BTW the "-p" is not necessary with "show", indeed, you cannot even switch
it off.
Ciao,
Dscho
^ permalink raw reply
* Re: [CLEANUP PATCH RESEND] git wrapper: Make while loop more reader-friendly
From: Boyd Stephen Smith Jr. @ 2009-01-02 18:28 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
In-Reply-To: <alpine.DEB.1.00.0901021240270.27818@racer>
[-- Attachment #1: Type: text/plain, Size: 969 bytes --]
On Friday 2009 January 02 12:07:52 you wrote:
> It is not a good practice to prefer performance over readability in
> something as performance uncritical as finding the trailing slash
> of argv[0].
>
> So avoid head-scratching by making the loop user-readable, and not
> hyper-performance-optimized.
> - do
> - --slash;
> - while (cmd <= slash && !is_dir_sep(*slash));
> + while (cmd <= slash && !is_dir_sep(*slash))
> + slash--;
What confused people? The predecrement or the do/while? Should people that
don't understand one of those be hacking on git?
That said, I'm not opposed to the patch. It is easier on the eyes, though I
prefer the one-liner:
for (; cmd <= slash && !is_dir_sep(*slash); --slash);
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 18:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901020833000.5086@localhost.localdomain>
Hi,
On Fri, 2 Jan 2009, Linus Torvalds wrote:
> On Fri, 2 Jan 2009, Clemens Buchacher wrote:
> >
> > Only two choices, and I still get it wrong. The diffs should be
> > labeled the other way around, of course.
>
> Yes, this one is a real patience diff change, but it's also the same one
> that I've seen in the google fanboi findings. What google did _not_ show
> was any real-life examples, or anybody doing any critical analysis.
FWIW it's the test case in the commit introducing the --patience option.
> So I was hoping for something else than a single "in this case patience
> diff works really well". I was hoping to see what it does in real life.
I will dig out a real-world example where I _know_ patience diff would
have helped. (I remember that I rewrote a pretty large diff which was
sent on this list, only to understand what it actually does, and I am
pretty certain this is a good real-world showcase.)
But yes, I agree, the thing does not matter _all_ that much in reality.
The case where I expect a real improvement is when you modify a function
and insert a function just before it, and Myers' algorithm matches mainly
empty lines and lines ending in curly brackets.
In other words: something I tried to tackle with ee95ec5d(xdl_merge():
introduce XDL_MERGE_ZEALOUS_ALNUM) for merges.
The typical look of such a diff is something like this:
-<... some function header ...>
+<... a completely different function header ...>
{
- <... variables ...>
+ <... other variables ...>
for (i = 0; i < 10; i++) {
- <... some code ...>
+ <... some code doing something completely different ...>
}
return 0;
}
+<... the function header which was removed earlier ...>
+{
<... refactored _and also reindented_ code ...>
> And I haven't seen _any_ real critical analysis of it. Anywhere.
Neither have I. Let alone something close to documentation.
For example, when the "patience diff algorithm" is explained, it looks
more like a longest common sequence algorithm when the input is already
sorted in the first item.
Further, there is no rigorous analysis of the runtime (I figured that the
original runtime is O(nm) where "n" is the number of lines and "m" is the
length of the maximal ordered sequence of common unique lines, and my
implementation can only improve that to O(n log(m))).
This could be improved, I think, for the most common case where you have
pretty long common _continuous_ sequences of unique lines, i.e. large
ranges of lines that are identical.
The runtime is especially horrible in the light of the runtime of Myers'
algorithm, which uses O(n d), where d is the edit distance, i.e. the
number of lines added + number of lines removed. (Note: in the real
world, there are substantial speed ups for consecutive edits, i.e. line
ranges where there are no common lines at all.)
Also, I am less than thrilled by the job the fanbois did coming up with
"convincing" evidence: exactly as you pointed out, there are _no_
real-world examples where it helps.
And the worst part: one can only _guess_ what motivated patience diff. I
imagine it came from the observation that function headers are unique, and
that you usually want to preserve as much context around them.
Ciao,
Dscho
^ permalink raw reply
* Re: [CLEANUP PATCH RESEND] git wrapper: Make while loop more reader-friendly
From: Johannes Schindelin @ 2009-01-02 18:49 UTC (permalink / raw)
To: Boyd Stephen Smith Jr.; +Cc: git, gitster
In-Reply-To: <200901021228.07599.bss@iguanasuicide.net>
Hi,
On Fri, 2 Jan 2009, Boyd Stephen Smith Jr. wrote:
> On Friday 2009 January 02 12:07:52 you wrote:
> > It is not a good practice to prefer performance over readability in
> > something as performance uncritical as finding the trailing slash
> > of argv[0].
> >
> > So avoid head-scratching by making the loop user-readable, and not
> > hyper-performance-optimized.
>
> > - do
> > - --slash;
> > - while (cmd <= slash && !is_dir_sep(*slash));
> > + while (cmd <= slash && !is_dir_sep(*slash))
> > + slash--;
>
> What confused people? The predecrement or the do/while? Should people that
> don't understand one of those be hacking on git?
>
> That said, I'm not opposed to the patch. It is easier on the eyes, though I
> prefer the one-liner:
> for (; cmd <= slash && !is_dir_sep(*slash); --slash);
As I mentioned in the commit message: readability is something to be
cherished and worshipped.
For your pleasure, I will not go into details about the motions my bowels
went through when I looked at those three lines. Or your single line, for
that matter.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-02 18:49 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901021914420.30769@pacific.mpi-cbg.de>
On Fri, 2 Jan 2009, Johannes Schindelin wrote:
>
> BTW the "-p" is not necessary with "show", indeed, you cannot even switch
> it off.
I was just switching back-and-forth between "git log" and "git show" so
the -p came from just that, and is not necessary.
And you _can_ suppress the patch generation - use "-s".
Linus
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-02 18:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Pierre Habouzit, davidel, Francis Galiegue,
Git ML
In-Reply-To: <alpine.LFD.2.00.0901011747010.5086@localhost.localdomain>
* Linus Torvalds [Thu, 01 Jan 2009 17:56:13 -0800]:
> On Thu, 1 Jan 2009, Adeodato Simó wrote:
> > For me, the cases where I find patience output to be of substantial
> > higher readability are those involving a rewrite of several consecutive
> > paragraphs (i.e., lines of code separated by blank lines). Compare:
> I don't think that's a "patience diff" issue.
Ah, I see.
> That's simply an issue of merging consecutive diff fragments together if
> they are close-by, and that's independent of the actual diff algorithm
> itself.
> > I'll note that in this particular case, `git diff` yielded the very same
> > results with or without --patience. I don't know why that is, Johannes?
> > I'll also note that /usr/bin/diff produces (in this case) something
> > closer to patience than to git.
> See above - I really don't think this has anything to do with "patience vs
> non-patience". It's more akin to the things we do for our merge conflict
> markers: if we have two merge conflicts next to each other, with just a
> couple of lines in between, we coalesce the merge conflicts into one
> larger one instead.
> We don't do that for regular diffs - they're always kept minimal (ok, not
> really minimal, but as close to minimal as the algorithm finds them).
> See commit f407f14deaa14ebddd0d27238523ced8eca74393 for the git merge
> conflict merging. We _could_ do similar things for regular diffs. It's
> sometimes useful, sometimes not.
Independently of patience diff, then, I'd very much support changes to
improve the diff I pasted.
--
Adeodato Simó dato at net.com.org.es
Debian Developer adeodato at debian.org
Debugging is twice as hard as writing the code in the first place. Therefore,
if you write the code as cleverly as possible, you are, by definition, not
smart enough to debug it.
-- Brian W. Kernighan
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 18:51 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.DEB.1.00.0901021914420.30769@pacific.mpi-cbg.de>
On Fri, Jan 02, 2009 at 07:17:34PM +0100, Johannes Schindelin wrote:
> BTW the "-p" is not necessary with "show", indeed, you cannot even switch
> it off.
Half true:
$ git grep -A1 '"-s"' diff.c
diff.c: else if (!strcmp(arg, "-s"))
diff.c- options->output_format |= DIFF_FORMAT_NO_OUTPUT;
-Peff
^ permalink raw reply
* Re: why still no empty directory support in git
From: Johannes Schindelin @ 2009-01-02 18:55 UTC (permalink / raw)
To: Jeff King; +Cc: Asheesh Laroia, Git Mailing List
In-Reply-To: <20090101200651.GB6536@coredump.intra.peff.net>
Hi,
On Thu, 1 Jan 2009, Jeff King wrote:
> On Tue, Dec 30, 2008 at 03:58:46AM -0500, Asheesh Laroia wrote:
>
> > So, let's say I take your suggestion.
> >
> > $ touch ~/Maildir/new/.exists
> > $ git add ~/Maildir/new/.exists && git commit -m "La di da"
> >
> > Now a spec-compliant Maildir user agent will attempt to deliver this new
> > "email message" of zero bytes into the mail spool and assign it a message
> > UID. Doing so will remove it from Maildir/new.
>
> No. The maildir spec says:
>
> A unique name can be anything that doesn't contain a colon (or slash)
> and doesn't start with a dot.
> -- http://cr.yp.to/proto/maildir.html
>
> where a "unique name" is the filename used for a message. In practice,
> every maildir implementation I have seen ignores files starting with a
> dot. Do you have one that doesn't?
For the record, I am using Git to manage my mails, and never had any
problems after installing a hook which marks new empty directories with
.gitignore.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH v2 1/3] rebase: learn to rebase root commit
From: Johannes Schindelin @ 2009-01-02 18:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Rast, git
In-Reply-To: <7v4p0iivwh.fsf@gitster.siamese.dyndns.org>
Hi,
On Thu, 1 Jan 2009, Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
>
> > Teach git-rebase a new option --root, which instructs it to rebase the
> > entire history leading up to <branch>.
> >
> > The main use-case is with git-svn: suppose you start hacking (perhaps
> > offline) on a new project, but later notice you want to commit this
> > work to SVN. You will have to rebase the entire history, including
> > the root commit, on a (possibly empty) commit coming from git-svn, to
> > establish a history connection. This previously had to be done by
> > cherry-picking the root commit manually.
>
> I like what this series tries to do. Using the --root option is probably
> a more natural way to do what people often do with the "add graft and
> filter-branch the whole history once" procedure.
>
> But it somewhat feels sad if the "main" use-case for this is to start your
> project in git and then migrate away by feeding your history to subversion
> ;-).
FWIW I had a single case where I could have used something like this
myself, in my whole life. It was when I started to write
git-edit-patch-series.sh in its own repository, only to realize at the end
that I should have started it in a topic branch in my git.git tree.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Linus Torvalds @ 2009-01-02 19:03 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901021918100.30769@pacific.mpi-cbg.de>
On Fri, 2 Jan 2009, Johannes Schindelin wrote:
>
> FWIW it's the test case in the commit introducing the --patience option.
Well, it's also the test-case in the very first hit on google for
"patience diff" (with the quotes).
In fact, it's the _only_ one I ever found ;)
> And the worst part: one can only _guess_ what motivated patience diff. I
> imagine it came from the observation that function headers are unique, and
> that you usually want to preserve as much context around them.
Well, I do like the notion of giving more weight to unique lines - I think
it makes sense. That said, I suspect it would make almost as much sense to
give more weight simply to _longer_ lines, and I suspect the standard
Myers' algorithm could possibly be simply extended to take line size into
account when calculating the weights.
Because the problem with diffs for C doesn't really tend to be as much
about non-unique lines as about just _trivial_ lines that are mostly empty
or contain just braces etc. Those are quite arguably almost totally
worthless for equality testing.
And btw, don't get me wrong - I don't think there is anything wrong with
the patience diff. I think it's a worthy thing to try, and I'm not at all
arguing against it. However, I do think that the people arguing for it
often do so based on less-than-very-logical arguments, and it's entirely
possible that other approaches are better (eg the "weight by size" thing
rather than "weight by uniqueness").
The thing about unique lines is that there are no guarantees at all that
they even exist, so a uniqueness-based thing will always have to fall back
on anything else. That, to me, implies that the whole notion is somewhat
mis-designed: it's clearly not a generic concept.
(In contrast, taking the length of the matching lines into account would
not have that kind of bad special case)
Linus
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 19:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021045290.5086@localhost.localdomain>
Hi,
On Fri, 2 Jan 2009, Linus Torvalds wrote:
> On Fri, 2 Jan 2009, Johannes Schindelin wrote:
> >
> > BTW the "-p" is not necessary with "show", indeed, you cannot even
> > switch it off.
>
> I was just switching back-and-forth between "git log" and "git show" so
> the -p came from just that, and is not necessary.
>
> And you _can_ suppress the patch generation - use "-s".
Ah, another thing learnt.
Thanks,
Dscho
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-02 19:22 UTC (permalink / raw)
To: Linus Torvalds
Cc: Clemens Buchacher, Adeodato Simó, Pierre Habouzit, davidel,
Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021050450.5086@localhost.localdomain>
Hi,
On Fri, 2 Jan 2009, Linus Torvalds wrote:
> On Fri, 2 Jan 2009, Johannes Schindelin wrote:
>
> > And the worst part: one can only _guess_ what motivated patience diff.
> > I imagine it came from the observation that function headers are
> > unique, and that you usually want to preserve as much context around
> > them.
>
> Well, I do like the notion of giving more weight to unique lines - I think
> it makes sense. That said, I suspect it would make almost as much sense to
> give more weight simply to _longer_ lines, and I suspect the standard
> Myers' algorithm could possibly be simply extended to take line size into
> account when calculating the weights.
I think that it makes more sense with the common unique lines, because
they are _unambiguously_ common. That is, if possible, we would like to
keep them as common lines.
BTW this also opens the door for more automatic code movement detection.
As for the longer lines: what exactly would you want to put "weight" on?
The edit distance is the number of plusses and minusses, and this is the
thing that actually is pretty critical for the performance: the larger the
distance, the longer it takes. So if you want to put a different "weight"
on a line, i.e. something else than a "1", you are risking a substantially
worse performance.
And I am still not convinced that longer lines should get more weight. A
line starting with "exit:" can be much more important than 8 tabs followed
by a curly bracket.
> Because the problem with diffs for C doesn't really tend to be as much
> about non-unique lines as about just _trivial_ lines that are mostly empty
> or contain just braces etc. Those are quite arguably almost totally
> worthless for equality testing.
Oh, but then we get very C specific. Let's not go there.
> And btw, don't get me wrong - I don't think there is anything wrong with
> the patience diff. I think it's a worthy thing to try, and I'm not at
> all arguing against it.
I never took you to be opposed. I myself mainly implemented it because I
wanted to play with it, and have something more useful than what I found
in the whole wide web.
> However, I do think that the people arguing for it often do so based on
> less-than-very-logical arguments, and it's entirely possible that other
> approaches are better (eg the "weight by size" thing rather than "weight
> by uniqueness").
Actually, I think it is very possible that merging hunks if there are not
enough alnums between them could make a whole lot of sense.
But IIRC it was shot down and replaced by the not-so-specific-to-C
algorithm to merge hunks when the output is shorter than having separate
hunks.
> The thing about unique lines is that there are no guarantees at all that
> they even exist, so a uniqueness-based thing will always have to fall
> back on anything else. That, to me, implies that the whole notion is
> somewhat mis-designed: it's clearly not a generic concept.
>
> (In contrast, taking the length of the matching lines into account would
> not have that kind of bad special case)
However, we know that humans like to start from the unique features they
see, and continue from there.
And while it is quite possible that it is the wrong approach (see all that
security theater at the airport, and some nice rational analyses about the
cost/benefit equations there), imitating humans is still the thing that
will convince most humans.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH 2/2] gitweb: support hiding projects from user-visible lists
From: Jakub Narebski @ 2009-01-02 19:33 UTC (permalink / raw)
To: Matt McCutchen; +Cc: git
In-Reply-To: <1230082831.2971.45.camel@localhost>
On Wed, 2008-12-24, Matt McCutchen wrote:
> On Sat, 2008-12-13 at 14:02 -0800, Jakub Narebski wrote:
> >
> > Cannot you do this with new $export_auth_hook gitweb configuration
> > variable, added by Alexander Gavrilov in
> > dd7f5f1 (gitweb: Add a per-repository authorization hook.)
> > It is used in check_export_ok subroutine, and is is checked also when
> > getting list of project from file
> >
> > From gitweb/INSTALL
[...]
> > For example, if you use mod_perl to run the script, and have dumb
> > http protocol authentication configured for your repositories, you
> > can use the following hook to allow access only if the user is
> > authorized to read the files:
[...]
> $export_auth_hook would work, and it would have the nice (but not
> essential) feature of including private projects in the list shown to
> suitably authenticated users. The only problem is that my Web host
> doesn't support mod_perl. Is there a practical way to accomplish the
> same thing as the above example in a CGI script? I would like to avoid
> reimplementing Apache authentication-checking functionality if at all
> possible.
I know it is written that the example code is for mod_perl, but I
don't think it is mod_perl specific; have you checked if it works
for you? I assume that you use Apache, and have Apache Perl bindings
installed...
--
Jakub Narebski
Poland
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 19:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.LFD.2.00.0901021050450.5086@localhost.localdomain>
On Fri, Jan 02, 2009 at 11:03:07AM -0800, Linus Torvalds wrote:
> Well, it's also the test-case in the very first hit on google for
> "patience diff" (with the quotes).
>
> In fact, it's the _only_ one I ever found ;)
If you just want to see the results on some real-world cases (and don't
care about measuring performance), try installing bzr and using their
patiencediff test program as a GIT_EXTERNAL_DIFF.
On Debian, it's:
$ sudo apt-get install bzr
$ cat >$HOME/patience <<'EOF'
#!/bin/sh
exec python /usr/share/pyshared/bzrlib/patiencediff.py "$2" "$5"
EOF
$ chmod 755 patience
$ GIT_EXTERNAL_DIFF=$HOME/patience git diff
Other distributions presumably install patiencediff.py somewhere
similar (or you can maybe even pull it from the source, but presumably
you have to install some other bzr modules, too).
-Peff
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 19:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090102193904.GB9129@coredump.intra.peff.net>
On Fri, Jan 02, 2009 at 02:39:04PM -0500, Jeff King wrote:
> If you just want to see the results on some real-world cases (and don't
> care about measuring performance), try installing bzr and using their
> patiencediff test program as a GIT_EXTERNAL_DIFF.
>
> On Debian, it's:
>
> $ sudo apt-get install bzr
> $ cat >$HOME/patience <<'EOF'
> #!/bin/sh
> exec python /usr/share/pyshared/bzrlib/patiencediff.py "$2" "$5"
> EOF
> $ chmod 755 patience
> $ GIT_EXTERNAL_DIFF=$HOME/patience git diff
For added fun, try this:
-- >8 --
#!/bin/sh
canonical() {
grep '^[ +-]' | egrep -v '^(---|\+\+\+)'
}
git rev-list --no-merges HEAD | while read rev; do
git diff-tree -p $rev^ $rev | canonical >git.out
GIT_EXTERNAL_DIFF=$HOME/patience git diff $rev^ $rev | canonical >bzr.out
echo "`diff -U0 git.out bzr.out | wc -l` $rev"
done
-- 8< --
I'm running it on git.git now. It looks like both algorithms return the
same patch for most cases. Some of the differences are interesting,
though.
For example, f83b9ba209's commit message indicates that it moves the
"--format-patch" paragraph. Which is what "git diff" shows. Patience
diff shows it as moving other text _around_ that paragraph.
-Peff
^ permalink raw reply
* "git reset --hard" == "git checkout HEAD" == "git reset --hard HEAD" ???
From: chris @ 2009-01-02 19:57 UTC (permalink / raw)
To: git
Does "git reset --hard" == "git checkout HEAD" == "git reset --hard HEAD" ???
It seems we have 2 ways to blow away work we haven't checked in yet then right?
chris
^ permalink raw reply
* Re: [CLEANUP PATCH RESEND] git wrapper: Make while loop more reader-friendly
From: Boyd Stephen Smith Jr. @ 2009-01-02 20:04 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, gitster
In-Reply-To: <alpine.DEB.1.00.0901021947170.30769@pacific.mpi-cbg.de>
[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]
On Friday 2009 January 02 12:49:53 Johannes Schindelin wrote:
> > > - do
> > > - --slash;
> > > - while (cmd <= slash && !is_dir_sep(*slash));
> > > + while (cmd <= slash && !is_dir_sep(*slash))
> > > + slash--;
> >
> > I prefer the one-liner:
> > for (; cmd <= slash && !is_dir_sep(*slash); --slash);
>
> As I mentioned in the commit message: readability is something to be
> cherished and worshipped.
It's also subjective. I think my one-line is more readable than your two
lines which is only slightly more readable than the original 3 lines. Or is
there some objective readability metric that of which I'm just not aware?
I also think that the lack of braces around your body on a separate line makes
it harder to read and easier to break, but I understand that is the git
coding style.
> For your pleasure, I will not go into details about the motions my bowels
> went through when I looked at those three lines. Or your single line, for
> that matter.
Please do, although privately if you like. I really don't see the problem the
patch is trying to fix.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply
* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Jeff King @ 2009-01-02 20:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Clemens Buchacher, Adeodato Simó,
Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090102195053.GA10876@coredump.intra.peff.net>
On Fri, Jan 02, 2009 at 02:50:53PM -0500, Jeff King wrote:
> For example, f83b9ba209's commit message indicates that it moves the
> "--format-patch" paragraph. Which is what "git diff" shows. Patience
> diff shows it as moving other text _around_ that paragraph.
Here's another interesting one: d592b315. The commit removes dashes
from git commands in test scripts. Git says:
echo "tag-one-line" >expect &&
- git-tag -l | grep "^tag-one-line" >actual &&
+ git tag -l | grep "^tag-one-line" >actual &&
test_cmp expect actual &&
- git-tag -n0 -l | grep "^tag-one-line" >actual &&
+ git tag -n0 -l | grep "^tag-one-line" >actual &&
test_cmp expect actual &&
- git-tag -n0 -l tag-one-line >actual &&
+ git tag -n0 -l tag-one-line >actual &&
test_cmp expect actual &&
whereas patience says:
echo "tag-one-line" >expect &&
- git-tag -l | grep "^tag-one-line" >actual &&
- test_cmp expect actual &&
- git-tag -n0 -l | grep "^tag-one-line" >actual &&
- test_cmp expect actual &&
- git-tag -n0 -l tag-one-line >actual &&
+ git tag -l | grep "^tag-one-line" >actual &&
+ test_cmp expect actual &&
+ git tag -n0 -l | grep "^tag-one-line" >actual &&
+ test_cmp expect actual &&
+ git tag -n0 -l tag-one-line >actual &&
test_cmp expect actual &&
which is exactly what patience is advertised to do: it's treating the
non-unique lines as uninteresting markers. But in this case they _are_
interesting, and I think the git output is more readable. And this is a
case where your "weight lines by length instead of uniqueness"
suggestion would perform better, I think.
-Peff
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox