* [PATCHv4 00/12] git notes
@ 2009-08-27 1:43 Johan Herland
2009-08-27 1:43 ` [PATCHv4 01/12] Introduce commit notes Johan Herland
` (11 more replies)
0 siblings, 12 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Another iteration of the 'git notes' feature. Rebased on top of 'next':
- Patches 1-7 are unchanged from (patches 1-5 + 7-8 of) the last iteration.
- Patch 8 introduces the new notes lookup code that offers both handling of
fanout subtrees, and other performance improvements.
- Patch 9 adds a selftest that verifies correct parsing of notes trees with
various fanouts.
- Patch 10 adds simple memory pooling to parts of the data structure from
patch 8. This improves performance slightly.
- Patches 11-12 adds the '%N' format specifier for pretty-printing commit
notes (as suggested by Dscho in the previous notes thread).
Some performance numbers from the notes lookup code:
(these numbers are from my Core 2 Quad, 4 GB RAM)
Test scenario I: Running t3302-notes-index-expensive
Timing numbers from the 'time_notes 100' following 'create_repo 10000'
no-notes notes
before 16.22s 23.74s
after 16.31s 22.16s
after+mempool 16.24s 22.03s
Comments: This is a worst case scenario for pretty much any notes lookup
algorithm: Looking up all 10000 notes with no fanout (fanout level 0).
The new implementation does marginally better than the old.
Test scenario II: Repo with 100,000 commits, 1 note per commit.
Timing 100 repetitions of 'git log -n 10 refs/heads/master >/dev/null'
without notes fanout level 0 fanout level 1 fanout level 2
before 0.20s 32.44s N/A N/A
after 0.19s 16.66s 0.85s 0.61s
after+mempool 0.19s 16.20s 0.83s 0.57s
Comments: This hopefully gives a better simulation of a common use case
(displaying only a handful of commits, and their notes). In the (relative)
worst case (fanout level 0), the new code almost twice as fast as the old one.
As we add fanout, the runtime plummets (since we only need to unpack a handful
of subtrees).
In practice, this means that with even a modest 2/38 or 2/2/36 fanout in the
notes tree (fanout level 1 and 2, respectively), the 'git log' user experience
goes from unbearable to barely noticeable in a repo with hundreds of thousands
of notes.
Have fun! :)
...Johan
Johan Herland (7):
Teach "-m <msg>" and "-F <file>" to "git notes edit"
fast-import: Add support for importing commit notes
t3302-notes-index-expensive: Speed up create_repo()
Teach the notes lookup code to parse notes trees with various fanout schemes
Selftests verifying semantics when loading notes trees with various fanouts
notes.c: Implement simple memory pooling of leaf nodes
Add flags to get_commit_notes() to control the format of the note string
Johannes Schindelin (5):
Introduce commit notes
Add a script to edit/inspect notes
Speed up git notes lookup
Add an expensive test for git-notes
Add '%N'-format for pretty-printing commit notes
.gitignore | 1 +
Documentation/config.txt | 13 ++
Documentation/git-fast-import.txt | 45 +++++-
Documentation/git-notes.txt | 60 +++++++
Documentation/pretty-formats.txt | 1 +
Makefile | 3 +
cache.h | 4 +
command-list.txt | 1 +
commit.c | 1 +
config.c | 5 +
environment.c | 1 +
fast-import.c | 88 +++++++++-
git-notes.sh | 121 +++++++++++++
notes.c | 338 +++++++++++++++++++++++++++++++++++++
notes.h | 10 +
pretty.c | 10 +
t/t3301-notes.sh | 150 ++++++++++++++++
t/t3302-notes-index-expensive.sh | 118 +++++++++++++
t/t3303-notes-subtrees.sh | 206 ++++++++++++++++++++++
t/t9300-fast-import.sh | 166 ++++++++++++++++++
20 files changed, 1332 insertions(+), 10 deletions(-)
create mode 100644 Documentation/git-notes.txt
create mode 100755 git-notes.sh
create mode 100644 notes.c
create mode 100644 notes.h
create mode 100755 t/t3301-notes.sh
create mode 100755 t/t3302-notes-index-expensive.sh
create mode 100755 t/t3303-notes-subtrees.sh
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCHv4 01/12] Introduce commit notes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 02/12] Add a script to edit/inspect notes Johan Herland
` (10 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/config.txt | 13 +++++++++
Makefile | 2 +
cache.h | 4 +++
commit.c | 1 +
config.c | 5 +++
environment.c | 1 +
notes.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++
notes.h | 7 +++++
pretty.c | 5 +++
9 files changed, 106 insertions(+), 0 deletions(-)
create mode 100644 notes.c
create mode 100644 notes.h
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5256c7f..645775f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -439,6 +439,19 @@ On some file system/operating system combinations, this is unreliable.
Set this config setting to 'rename' there; However, This will remove the
check that makes sure that existing object files will not get overwritten.
+core.notesRef::
+ When showing commit messages, also show notes which are stored in
+ the given ref. This ref is expected to contain files named
+ after the full SHA-1 of the commit they annotate.
++
+If such a file exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line. If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be printed.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
add.ignore-errors::
Tells 'git-add' to continue adding files when some files cannot be
added due to indexing errors. Equivalent to the '--ignore-errors'
diff --git a/Makefile b/Makefile
index 4190a5d..95571db 100644
--- a/Makefile
+++ b/Makefile
@@ -426,6 +426,7 @@ LIB_H += ll-merge.h
LIB_H += log-tree.h
LIB_H += mailmap.h
LIB_H += merge-recursive.h
+LIB_H += notes.h
LIB_H += object.h
LIB_H += pack.h
LIB_H += pack-refs.h
@@ -509,6 +510,7 @@ LIB_OBJS += match-trees.o
LIB_OBJS += merge-file.o
LIB_OBJS += merge-recursive.o
LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
LIB_OBJS += object.o
LIB_OBJS += pack-check.o
LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index 808daba..739b42c 100644
--- a/cache.h
+++ b/cache.h
@@ -371,6 +371,8 @@ static inline enum object_type object_type(unsigned int mode)
#define GITATTRIBUTES_FILE ".gitattributes"
#define INFOATTRIBUTES_FILE "info/attributes"
#define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
extern int is_bare_repository_cfg;
extern int is_bare_repository(void);
@@ -565,6 +567,8 @@ enum object_creation_mode {
extern enum object_creation_mode object_creation_mode;
+extern char *notes_ref_name;
+
extern int grafts_replace_parents;
#define GIT_REPO_VERSION 0
diff --git a/commit.c b/commit.c
index e2bcbe8..ae6bff4 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
#include "utf8.h"
#include "diff.h"
#include "revision.h"
+#include "notes.h"
int save_commit_buffer = 1;
diff --git a/config.c b/config.c
index e87edea..70a7d34 100644
--- a/config.c
+++ b/config.c
@@ -467,6 +467,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.notesref")) {
+ notes_ref_name = xstrdup(value);
+ return 0;
+ }
+
if (!strcmp(var, "core.pager"))
return git_config_string(&pager_program, var, value);
diff --git a/environment.c b/environment.c
index 5de6837..571ab56 100644
--- a/environment.c
+++ b/environment.c
@@ -49,6 +49,7 @@ enum push_default_type push_default = PUSH_DEFAULT_MATCHING;
#define OBJECT_CREATION_MODE OBJECT_CREATION_USES_HARDLINKS
#endif
enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
+char *notes_ref_name;
int grafts_replace_parents = 1;
/* Parallel index stat data preload? */
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..401966d
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+ const char *output_encoding)
+{
+ static const char utf8[] = "utf-8";
+ struct strbuf name = STRBUF_INIT;
+ unsigned char sha1[20];
+ char *msg, *msg_p;
+ unsigned long linelen, msglen;
+ enum object_type type;
+
+ if (!initialized) {
+ const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ if (env)
+ notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ else if (!notes_ref_name)
+ notes_ref_name = GIT_NOTES_DEFAULT_REF;
+ if (notes_ref_name && read_ref(notes_ref_name, sha1))
+ notes_ref_name = NULL;
+ initialized = 1;
+ }
+
+ if (!notes_ref_name)
+ return;
+
+ strbuf_addf(&name, "%s:%s", notes_ref_name,
+ sha1_to_hex(commit->object.sha1));
+ if (get_sha1(name.buf, sha1))
+ return;
+
+ if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+ type != OBJ_BLOB)
+ return;
+
+ if (output_encoding && *output_encoding &&
+ strcmp(utf8, output_encoding)) {
+ char *reencoded = reencode_string(msg, output_encoding, utf8);
+ if (reencoded) {
+ free(msg);
+ msg = reencoded;
+ msglen = strlen(msg);
+ }
+ }
+
+ /* we will end the annotation by a newline anyway */
+ if (msglen && msg[msglen - 1] == '\n')
+ msglen--;
+
+ strbuf_addstr(sb, "\nNotes:\n");
+
+ for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
+ linelen = strchrnul(msg_p, '\n') - msg_p;
+
+ strbuf_addstr(sb, " ");
+ strbuf_add(sb, msg_p, linelen);
+ strbuf_addch(sb, '\n');
+ }
+
+ free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+ const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index f5983f8..e25db81 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
#include "string-list.h"
#include "mailmap.h"
#include "log-tree.h"
+#include "notes.h"
#include "color.h"
static char *user_format;
@@ -975,5 +976,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
*/
if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
strbuf_addch(sb, '\n');
+
+ if (fmt != CMIT_FMT_ONELINE)
+ get_commit_notes(commit, sb, encoding);
+
free(reencoded);
}
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 02/12] Add a script to edit/inspect notes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 01/12] Introduce commit notes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 03/12] Speed up git notes lookup Johan Herland
` (9 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
The script 'git notes' allows you to edit and show commit notes, by
calling either
git notes show <commit>
or
git notes edit <commit>
This patch has been improved by the following contributions:
- Tor Arne Vestbø: fix printing of multi-line notes
- Michael J Gruber: test and handle empty notes gracefully
- Thomas Rast:
- only clean up message file when editing
- use GIT_EDITOR and core.editor over VISUAL/EDITOR
- t3301: fix confusing quoting in test for valid notes ref
- t3301: use test_must_fail instead of !
- refuse to edit notes outside refs/notes/
- Junio C Hamano: tests: fix "export var=val"
- Christian Couder: documentation: fix 'linkgit' macro in "git-notes.txt"
- Johan Herland: minor cleanup and bugfixing in git-notes.sh (v2)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
.gitignore | 1 +
Documentation/git-notes.txt | 46 +++++++++++++++++
Makefile | 1 +
command-list.txt | 1 +
git-notes.sh | 73 +++++++++++++++++++++++++++
t/t3301-notes.sh | 114 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 236 insertions(+), 0 deletions(-)
create mode 100644 Documentation/git-notes.txt
create mode 100755 git-notes.sh
create mode 100755 t/t3301-notes.sh
diff --git a/.gitignore b/.gitignore
index c446290..703241b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -86,6 +86,7 @@ git-mktag
git-mktree
git-name-rev
git-mv
+git-notes
git-pack-redundant
git-pack-objects
git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..7136016
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit. To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string. Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla". This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+ Edit the notes for a given commit (defaults to HEAD).
+
+show::
+ Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the linkgit:git[7] suite
diff --git a/Makefile b/Makefile
index 95571db..a2ec7b0 100644
--- a/Makefile
+++ b/Makefile
@@ -315,6 +315,7 @@ SCRIPT_SH += git-merge-one-file.sh
SCRIPT_SH += git-merge-resolve.sh
SCRIPT_SH += git-mergetool.sh
SCRIPT_SH += git-mergetool--lib.sh
+SCRIPT_SH += git-notes.sh
SCRIPT_SH += git-parse-remote.sh
SCRIPT_SH += git-pull.sh
SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..4296941 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -74,6 +74,7 @@ git-mktag plumbingmanipulators
git-mktree plumbingmanipulators
git-mv mainporcelain common
git-name-rev plumbinginterrogators
+git-notes mainporcelain
git-pack-objects plumbingmanipulators
git-pack-redundant plumbinginterrogators
git-pack-refs ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..f06c254
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,73 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+case "$ACTION" in
+edit)
+ if [ "${GIT_NOTES_REF#refs/notes/}" = "$GIT_NOTES_REF" ]; then
+ die "Refusing to edit notes in $GIT_NOTES_REF (outside of refs/notes/)"
+ fi
+
+ MSG_FILE="$GIT_DIR/new-notes-$COMMIT"
+ GIT_INDEX_FILE="$MSG_FILE.idx"
+ export GIT_INDEX_FILE
+
+ trap '
+ test -f "$MSG_FILE" && rm "$MSG_FILE"
+ test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
+ ' 0
+
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+
+ CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+ if [ -z "$CURRENT_HEAD" ]; then
+ PARENT=
+ else
+ PARENT="-p $CURRENT_HEAD"
+ git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+ git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+ fi
+
+ core_editor="$(git config core.editor)"
+ ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+
+ grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
+ mv "$MSG_FILE".processed "$MSG_FILE"
+ if [ -s "$MSG_FILE" ]; then
+ BLOB=$(git hash-object -w "$MSG_FILE") ||
+ die "Could not write into object database"
+ git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+ die "Could not write index"
+ else
+ test -z "$CURRENT_HEAD" &&
+ die "Will not initialise with empty tree"
+ git update-index --force-remove $COMMIT ||
+ die "Could not update index"
+ fi
+
+ TREE=$(git write-tree) || die "Could not write tree"
+ NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+ die "Could not annotate"
+ git update-ref -m "Annotate $COMMIT" \
+ "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+ git rev-parse -q --verify "$GIT_NOTES_REF":$COMMIT > /dev/null ||
+ die "No note for commit $COMMIT."
+ git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+ usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..73e53be
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+ (MSG=3 && export MSG && test_must_fail git notes edit)
+'
+
+test_expect_success setup '
+ : > a1 &&
+ git add a1 &&
+ test_tick &&
+ git commit -m 1st &&
+ : > a2 &&
+ git add a2 &&
+ test_tick &&
+ git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+ (MSG=1 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit) &&
+ (MSG=2 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+ test_must_fail git notes show)
+'
+
+test_expect_success 'refusing to edit in refs/heads/' '
+ (MSG=1 GIT_NOTES_REF=refs/heads/bogus &&
+ export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit)
+'
+
+test_expect_success 'refusing to edit in refs/remotes/' '
+ (MSG=1 GIT_NOTES_REF=refs/remotes/bogus &&
+ export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit)
+'
+
+# 1 indicates caught gracefully by die, 128 means git-show barked
+test_expect_success 'handle empty notes gracefully' '
+ git notes show ; test 1 = $?
+'
+
+test_expect_success 'create notes' '
+ git config core.notesRef refs/notes/commits &&
+ MSG=b1 git notes edit &&
+ test ! -f .git/new-notes &&
+ test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+ test b1 = $(git notes show) &&
+ git show HEAD^ &&
+ test_must_fail git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:14:13 2005 -0700
+
+ 2nd
+
+Notes:
+ b1
+EOF
+
+test_expect_success 'show notes' '
+ ! (git cat-file commit HEAD | grep b1) &&
+ git log -1 > output &&
+ test_cmp expect output
+'
+test_expect_success 'create multi-line notes (setup)' '
+ : > a3 &&
+ git add a3 &&
+ test_tick &&
+ git commit -m 3rd &&
+ MSG="b3
+c3c3c3c3
+d3d3d3" git notes edit
+'
+
+cat > expect-multiline << EOF
+commit 1584215f1d29c65e99c6c6848626553fdd07fd75
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:15:13 2005 -0700
+
+ 3rd
+
+Notes:
+ b3
+ c3c3c3c3
+ d3d3d3
+EOF
+
+printf "\n" >> expect-multiline
+cat expect >> expect-multiline
+
+test_expect_success 'show multi-line notes' '
+ git log -2 > output &&
+ test_cmp expect-multiline output
+'
+
+test_done
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 03/12] Speed up git notes lookup
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 01/12] Introduce commit notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 02/12] Add a script to edit/inspect notes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 04/12] Add an expensive test for git-notes Johan Herland
` (8 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.
The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.
Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object. This
allows you to annotate arbitrary branches, or objects.
This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
notes.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 102 insertions(+), 10 deletions(-)
diff --git a/notes.c b/notes.c
index 401966d..9172154 100644
--- a/notes.c
+++ b/notes.c
@@ -4,15 +4,112 @@
#include "refs.h"
#include "utf8.h"
#include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+ unsigned char commit_sha1[20];
+ unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+ struct entry *entries;
+ off_t count, size;
+};
static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+ int i = ((*(unsigned int *)sha1) % map->size);
+
+ for (;;) {
+ unsigned char *current = map->entries[i].commit_sha1;
+
+ if (!hashcmp(sha1, current))
+ return i;
+
+ if (is_null_sha1(current))
+ return -1 - i;
+
+ if (++i == map->size)
+ i = 0;
+ }
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+ const unsigned char *notes_sha1)
+{
+ int index;
+
+ if (hash_map.count + 1 > hash_map.size >> 1) {
+ int i, old_size = hash_map.size;
+ struct entry *old = hash_map.entries;
+
+ hash_map.size = old_size ? old_size << 1 : 64;
+ hash_map.entries = (struct entry *)
+ xcalloc(sizeof(struct entry), hash_map.size);
+
+ for (i = 0; i < old_size; i++)
+ if (!is_null_sha1(old[i].commit_sha1)) {
+ index = -1 - hash_index(&hash_map,
+ old[i].commit_sha1);
+ memcpy(hash_map.entries + index, old + i,
+ sizeof(struct entry));
+ }
+ free(old);
+ }
+
+ index = hash_index(&hash_map, commit_sha1);
+ if (index < 0) {
+ index = -1 - index;
+ hash_map.count++;
+ }
+
+ hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+ hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+ unsigned char sha1[20], commit_sha1[20];
+ unsigned mode;
+ struct tree_desc desc;
+ struct name_entry entry;
+ void *buf;
+
+ if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+ get_tree_entry(commit_sha1, "", sha1, &mode))
+ return;
+
+ buf = fill_tree_descriptor(&desc, sha1);
+ if (!buf)
+ die("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+ while (tree_entry(&desc, &entry))
+ if (!get_sha1(entry.path, commit_sha1))
+ add_entry(commit_sha1, entry.sha1);
+ free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+ int index;
+
+ if (!hash_map.size)
+ return NULL;
+
+ index = hash_index(&hash_map, commit_sha1);
+ if (index < 0)
+ return NULL;
+ return hash_map.entries[index].notes_sha1;
+}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
const char *output_encoding)
{
static const char utf8[] = "utf-8";
- struct strbuf name = STRBUF_INIT;
- unsigned char sha1[20];
+ unsigned char *sha1;
char *msg, *msg_p;
unsigned long linelen, msglen;
enum object_type type;
@@ -23,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
else if (!notes_ref_name)
notes_ref_name = GIT_NOTES_DEFAULT_REF;
- if (notes_ref_name && read_ref(notes_ref_name, sha1))
- notes_ref_name = NULL;
+ initialize_hash_map(notes_ref_name);
initialized = 1;
}
- if (!notes_ref_name)
- return;
-
- strbuf_addf(&name, "%s:%s", notes_ref_name,
- sha1_to_hex(commit->object.sha1));
- if (get_sha1(name.buf, sha1))
+ sha1 = lookup_notes(commit->object.sha1);
+ if (!sha1)
return;
if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 04/12] Add an expensive test for git-notes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (2 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 03/12] Speed up git notes lookup Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 05/12] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
` (7 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
git-notes have the potential of being pretty expensive, so test with
a lot of commits. A lot. So to make things cheaper, you have to
opt-in explicitely, by setting the environment variable
GIT_NOTES_TIMING_TESTS.
This patch has been improved by the following contributions:
- Junio C Hamano: tests: fix "export var=val"
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
t/t3302-notes-index-expensive.sh | 98 ++++++++++++++++++++++++++++++++++++++
1 files changed, 98 insertions(+), 0 deletions(-)
create mode 100755 t/t3302-notes-index-expensive.sh
diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..0ef3e95
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+ say Skipping timing tests
+ test_done
+ exit
+}
+
+create_repo () {
+ number_of_commits=$1
+ nr=0
+ parent=
+ test -d .git || {
+ git init &&
+ tree=$(git write-tree) &&
+ while [ $nr -lt $number_of_commits ]; do
+ test_tick &&
+ commit=$(echo $nr | git commit-tree $tree $parent) ||
+ return
+ parent="-p $commit"
+ nr=$(($nr+1))
+ done &&
+ git update-ref refs/heads/master $commit &&
+ {
+ GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
+ git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" |
+ while read nr sha1; do
+ blob=$(echo note $nr | git hash-object -w --stdin) &&
+ echo $sha1 | sed "s/^/0644 $blob 0 /"
+ done | git update-index --index-info &&
+ tree=$(git write-tree) &&
+ test_tick &&
+ commit=$(echo notes | git commit-tree $tree) &&
+ git update-ref refs/notes/commits $commit
+ } &&
+ git config core.notesRef refs/notes/commits
+ }
+}
+
+test_notes () {
+ count=$1 &&
+ git config core.notesRef refs/notes/commits &&
+ git log | grep "^ " > output &&
+ i=1 &&
+ while [ $i -le $count ]; do
+ echo " $(($count-$i))" &&
+ echo " note $i" &&
+ i=$(($i+1));
+ done > expect &&
+ git diff expect output
+}
+
+cat > time_notes << \EOF
+ mode=$1
+ i=1
+ while [ $i -lt $2 ]; do
+ case $1 in
+ no-notes)
+ GIT_NOTES_REF=non-existing; export GIT_NOTES_REF
+ ;;
+ notes)
+ unset GIT_NOTES_REF
+ ;;
+ esac
+ git log >/dev/null
+ i=$(($i+1))
+ done
+EOF
+
+time_notes () {
+ for mode in no-notes notes
+ do
+ echo $mode
+ /usr/bin/time sh ../time_notes $mode $1
+ done
+}
+
+for count in 10 100 1000 10000; do
+
+ mkdir $count
+ (cd $count;
+
+ test_expect_success "setup $count" "create_repo $count"
+
+ test_expect_success 'notes work' "test_notes $count"
+
+ test_expect_success 'notes timing' "time_notes 100"
+ )
+done
+
+test_done
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 05/12] Teach "-m <msg>" and "-F <file>" to "git notes edit"
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (3 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 04/12] Add an expensive test for git-notes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 06/12] fast-import: Add support for importing commit notes Johan Herland
` (6 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
The "-m" and "-F" options are already the established method
(in both git-commit and git-tag) to specify a commit/tag message
without invoking the editor. This patch teaches "git notes edit"
to respect the same options for specifying a notes message without
invoking the editor.
Multiple "-m" and/or "-F" options are concatenated as separate
paragraphs.
The patch also updates the "git notes" documentation and adds
selftests for the new functionality. Unfortunately, the added
selftests include a couple of lines with trailing whitespace
(without these the test will fail). This may cause git to warn
about "whitespace errors".
This patch has been improved by the following contributions:
- Thomas Rast: fix trailing whitespace in t3301
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/git-notes.txt | 16 ++++++++++-
git-notes.sh | 64 +++++++++++++++++++++++++++++++++++++-----
t/t3301-notes.sh | 36 ++++++++++++++++++++++++
3 files changed, 107 insertions(+), 9 deletions(-)
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
index 7136016..94cceb1 100644
--- a/Documentation/git-notes.txt
+++ b/Documentation/git-notes.txt
@@ -8,7 +8,7 @@ git-notes - Add/inspect commit notes
SYNOPSIS
--------
[verse]
-'git-notes' (edit | show) [commit]
+'git-notes' (edit [-F <file> | -m <msg>] | show) [commit]
DESCRIPTION
-----------
@@ -33,6 +33,20 @@ show::
Show the notes for a given commit (defaults to HEAD).
+OPTIONS
+-------
+-m <msg>::
+ Use the given note message (instead of prompting).
+ If multiple `-m` (or `-F`) options are given, their
+ values are concatenated as separate paragraphs.
+
+-F <file>::
+ Take the note message from the given file. Use '-' to
+ read the note message from the standard input.
+ If multiple `-F` (or `-m`) options are given, their
+ values are concatenated as separate paragraphs.
+
+
Author
------
Written by Johannes Schindelin <johannes.schindelin@gmx.de>
diff --git a/git-notes.sh b/git-notes.sh
index f06c254..e642e47 100755
--- a/git-notes.sh
+++ b/git-notes.sh
@@ -1,16 +1,59 @@
#!/bin/sh
-USAGE="(edit | show) [commit]"
+USAGE="(edit [-F <file> | -m <msg>] | show) [commit]"
. git-sh-setup
-test -n "$3" && usage
-
test -z "$1" && usage
ACTION="$1"; shift
test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+MESSAGE=
+while test $# != 0
+do
+ case "$1" in
+ -m)
+ test "$ACTION" = "edit" || usage
+ shift
+ if test "$#" = "0"; then
+ die "error: option -m needs an argument"
+ else
+ if [ -z "$MESSAGE" ]; then
+ MESSAGE="$1"
+ else
+ MESSAGE="$MESSAGE
+
+$1"
+ fi
+ shift
+ fi
+ ;;
+ -F)
+ test "$ACTION" = "edit" || usage
+ shift
+ if test "$#" = "0"; then
+ die "error: option -F needs an argument"
+ else
+ if [ -z "$MESSAGE" ]; then
+ MESSAGE="$(cat "$1")"
+ else
+ MESSAGE="$MESSAGE
+
+$(cat "$1")"
+ fi
+ shift
+ fi
+ ;;
+ -*)
+ usage
+ ;;
+ *)
+ break
+ ;;
+ esac
+done
+
COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
die "Invalid commit: $@"
@@ -29,19 +72,24 @@ edit)
test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
' 0
- GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
-
CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
if [ -z "$CURRENT_HEAD" ]; then
PARENT=
else
PARENT="-p $CURRENT_HEAD"
git read-tree "$GIT_NOTES_REF" || die "Could not read index"
- git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
fi
- core_editor="$(git config core.editor)"
- ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+ if [ -z "$MESSAGE" ]; then
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+ if [ ! -z "$CURRENT_HEAD" ]; then
+ git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+ fi
+ core_editor="$(git config core.editor)"
+ ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+ else
+ echo "$MESSAGE" > "$MSG_FILE"
+ fi
grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
mv "$MSG_FILE".processed "$MSG_FILE"
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 73e53be..1e34f48 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -110,5 +110,41 @@ test_expect_success 'show multi-line notes' '
git log -2 > output &&
test_cmp expect-multiline output
'
+test_expect_success 'create -m and -F notes (setup)' '
+ : > a4 &&
+ git add a4 &&
+ test_tick &&
+ git commit -m 4th &&
+ echo "xyzzy" > note5 &&
+ git notes edit -m spam -F note5 -m "foo
+bar
+baz"
+'
+
+whitespace=" "
+cat > expect-m-and-F << EOF
+commit 15023535574ded8b1a89052b32673f84cf9582b8
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:16:13 2005 -0700
+
+ 4th
+
+Notes:
+ spam
+$whitespace
+ xyzzy
+$whitespace
+ foo
+ bar
+ baz
+EOF
+
+printf "\n" >> expect-m-and-F
+cat expect-multiline >> expect-m-and-F
+
+test_expect_success 'show -m and -F notes' '
+ git log -3 > output &&
+ test_cmp expect-m-and-F output
+'
test_done
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 06/12] fast-import: Add support for importing commit notes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (4 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 05/12] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 07/12] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
` (5 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "comittish".
This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.
The patch also includes a test case of the added functionality.
Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
---
Documentation/git-fast-import.txt | 45 +++++++++--
fast-import.c | 88 +++++++++++++++++++-
t/t9300-fast-import.sh | 166 +++++++++++++++++++++++++++++++++++++
3 files changed, 289 insertions(+), 10 deletions(-)
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index c2f483a..288032c 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -316,7 +316,7 @@ change to the project.
data
('from' SP <committish> LF)?
('merge' SP <committish> LF)?
- (filemodify | filedelete | filecopy | filerename | filedeleteall)*
+ (filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
LF?
....
@@ -339,14 +339,13 @@ commit message use a 0 length data. Commit messages are free-form
and are not interpreted by Git. Currently they must be encoded in
UTF-8, as fast-import does not permit other encodings to be specified.
-Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`
-and `filedeleteall` commands
+Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
+`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit. These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
-all `filemodify`, `filecopy` and `filerename` commands in the same
-commit, as `filedeleteall`
-wipes the branch clean (see below).
+all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
+the same commit, as `filedeleteall` wipes the branch clean (see below).
The `LF` after the command is optional (it used to be required).
@@ -595,6 +594,40 @@ more memory per active branch (less than 1 MiB for even most large
projects); so frontends that can easily obtain only the affected
paths for a commit are encouraged to do so.
+`notemodify`
+^^^^^^^^^^^^
+Included in a `commit` command to add a new note (annotating a given
+commit) or change the content of an existing note. This command has
+two different means of specifying the content of the note.
+
+External data format::
+ The data content for the note was already supplied by a prior
+ `blob` command. The frontend just needs to connect it to the
+ commit that is to be annotated.
++
+....
+ 'N' SP <dataref> SP <committish> LF
+....
++
+Here `<dataref>` can be either a mark reference (`:<idnum>`)
+set by a prior `blob` command, or a full 40-byte SHA-1 of an
+existing Git blob object.
+
+Inline data format::
+ The data content for the note has not been supplied yet.
+ The frontend wants to supply it as part of this modify
+ command.
++
+....
+ 'N' SP 'inline' SP <committish> LF
+ data
+....
++
+See below for a detailed description of the `data` command.
+
+In both formats `<committish>` is any of the commit specification
+expressions also accepted by `from` (see above).
+
`mark`
~~~~~~
Arranges for fast-import to save a reference to the current object, allowing
diff --git a/fast-import.c b/fast-import.c
index 7ef9865..97e6e9e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -22,8 +22,8 @@ Format of STDIN stream:
('author' sp name sp '<' email '>' sp when lf)?
'committer' sp name sp '<' email '>' sp when lf
commit_msg
- ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
- ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
+ ('from' sp committish lf)?
+ ('merge' sp committish lf)*
file_change*
lf?;
commit_msg ::= data;
@@ -41,15 +41,18 @@ Format of STDIN stream:
file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
data;
+ note_obm ::= 'N' sp (hexsha1 | idnum) sp committish lf;
+ note_inm ::= 'N' sp 'inline' sp committish lf
+ data;
new_tag ::= 'tag' sp tag_str lf
- 'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
+ 'from' sp committish lf
('tagger' sp name sp '<' email '>' sp when lf)?
tag_msg;
tag_msg ::= data;
reset_branch ::= 'reset' sp ref_str lf
- ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
+ ('from' sp committish lf)?
lf?;
checkpoint ::= 'checkpoint' lf
@@ -88,6 +91,7 @@ Format of STDIN stream:
# stream formatting is: \, " and LF. Otherwise these values
# are UTF8.
#
+ committish ::= (ref_str | hexsha1 | sha1exp_str | idnum);
ref_str ::= ref;
sha1exp_str ::= sha1exp;
tag_str ::= tag;
@@ -2003,6 +2007,80 @@ static void file_change_cr(struct branch *b, int rename)
leaf.tree);
}
+static void note_change_n(struct branch *b)
+{
+ const char *p = command_buf.buf + 2;
+ static struct strbuf uq = STRBUF_INIT;
+ struct object_entry *oe = oe;
+ struct branch *s;
+ unsigned char sha1[20], commit_sha1[20];
+ uint16_t inline_data = 0;
+
+ /* <dataref> or 'inline' */
+ if (*p == ':') {
+ char *x;
+ oe = find_mark(strtoumax(p + 1, &x, 10));
+ hashcpy(sha1, oe->sha1);
+ p = x;
+ } else if (!prefixcmp(p, "inline")) {
+ inline_data = 1;
+ p += 6;
+ } else {
+ if (get_sha1_hex(p, sha1))
+ die("Invalid SHA1: %s", command_buf.buf);
+ oe = find_object(sha1);
+ p += 40;
+ }
+ if (*p++ != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
+
+ /* <committish> */
+ s = lookup_branch(p);
+ if (s) {
+ hashcpy(commit_sha1, s->sha1);
+ } else if (*p == ':') {
+ uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+ struct object_entry *commit_oe = find_mark(commit_mark);
+ if (commit_oe->type != OBJ_COMMIT)
+ die("Mark :%" PRIuMAX " not a commit", commit_mark);
+ hashcpy(commit_sha1, commit_oe->sha1);
+ } else if (!get_sha1(p, commit_sha1)) {
+ unsigned long size;
+ char *buf = read_object_with_reference(commit_sha1,
+ commit_type, &size, commit_sha1);
+ if (!buf || size < 46)
+ die("Not a valid commit: %s", p);
+ free(buf);
+ } else
+ die("Invalid ref name or SHA1 expression: %s", p);
+
+ if (inline_data) {
+ static struct strbuf buf = STRBUF_INIT;
+
+ if (p != uq.buf) {
+ strbuf_addstr(&uq, p);
+ p = uq.buf;
+ }
+ read_next_command();
+ parse_data(&buf);
+ store_object(OBJ_BLOB, &buf, &last_blob, sha1, 0);
+ } else if (oe) {
+ if (oe->type != OBJ_BLOB)
+ die("Not a blob (actually a %s): %s",
+ typename(oe->type), command_buf.buf);
+ } else {
+ enum object_type type = sha1_object_info(sha1, NULL);
+ if (type < 0)
+ die("Blob not found: %s", command_buf.buf);
+ if (type != OBJ_BLOB)
+ die("Not a blob (actually a %s): %s",
+ typename(type), command_buf.buf);
+ }
+
+ tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
+ S_IFREG | 0644, NULL);
+}
+
static void file_change_deleteall(struct branch *b)
{
release_tree_content_recursive(b->branch_tree.tree);
@@ -2172,6 +2250,8 @@ static void parse_new_commit(void)
file_change_cr(b, 1);
else if (!prefixcmp(command_buf.buf, "C "))
file_change_cr(b, 0);
+ else if (!prefixcmp(command_buf.buf, "N "))
+ note_change_n(b);
else if (!strcmp("deleteall", command_buf.buf))
file_change_deleteall(b);
else {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 821be7c..b49815d 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1088,4 +1088,170 @@ INPUT_END
test_expect_success 'P: fail on blob mark in gitlink' '
test_must_fail git fast-import <input'
+###
+### series Q (notes)
+###
+
+note1_data="Note for the first commit"
+note2_data="Note for the second commit"
+note3_data="Note for the third commit"
+
+test_tick
+cat >input <<INPUT_END
+blob
+mark :2
+data <<EOF
+$file2_data
+EOF
+
+commit refs/heads/notes-test
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+first (:3)
+COMMIT
+
+M 644 :2 file2
+
+blob
+mark :4
+data $file4_len
+$file4_data
+commit refs/heads/notes-test
+mark :5
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+second (:5)
+COMMIT
+
+M 644 :4 file4
+
+commit refs/heads/notes-test
+mark :6
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+third (:6)
+COMMIT
+
+M 644 inline file5
+data <<EOF
+$file5_data
+EOF
+
+M 755 inline file6
+data <<EOF
+$file6_data
+EOF
+
+blob
+mark :7
+data <<EOF
+$note1_data
+EOF
+
+blob
+mark :8
+data <<EOF
+$note2_data
+EOF
+
+commit refs/notes/foobar
+mark :9
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes (:9)
+COMMIT
+
+N :7 :3
+N :8 :5
+N inline :6
+data <<EOF
+$note3_data
+EOF
+
+INPUT_END
+test_expect_success \
+ 'Q: commit notes' \
+ 'git fast-import <input &&
+ git whatchanged notes-test'
+test_expect_success \
+ 'Q: verify pack' \
+ 'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
+
+commit1=$(git rev-parse notes-test~2)
+commit2=$(git rev-parse notes-test^)
+commit3=$(git rev-parse notes-test)
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+first (:3)
+EOF
+test_expect_success \
+ 'Q: verify first commit' \
+ 'git cat-file commit notes-test~2 | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit1
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+second (:5)
+EOF
+test_expect_success \
+ 'Q: verify second commit' \
+ 'git cat-file commit notes-test^ | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit2
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+third (:6)
+EOF
+test_expect_success \
+ 'Q: verify third commit' \
+ 'git cat-file commit notes-test | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+notes (:9)
+EOF
+test_expect_success \
+ 'Q: verify notes commit' \
+ 'git cat-file commit refs/notes/foobar | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect.unsorted <<EOF
+100644 blob $commit1
+100644 blob $commit2
+100644 blob $commit3
+EOF
+cat expect.unsorted | sort >expect
+test_expect_success \
+ 'Q: verify notes tree' \
+ 'git cat-file -p refs/notes/foobar^{tree} | sed "s/ [0-9a-f]* / /" >actual &&
+ test_cmp expect actual'
+
+echo "$note1_data" >expect
+test_expect_success \
+ 'Q: verify note for first commit' \
+ 'git cat-file blob refs/notes/foobar:$commit1 >actual && test_cmp expect actual'
+
+echo "$note2_data" >expect
+test_expect_success \
+ 'Q: verify note for second commit' \
+ 'git cat-file blob refs/notes/foobar:$commit2 >actual && test_cmp expect actual'
+
+echo "$note3_data" >expect
+test_expect_success \
+ 'Q: verify note for third commit' \
+ 'git cat-file blob refs/notes/foobar:$commit3 >actual && test_cmp expect actual'
+
test_done
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 07/12] t3302-notes-index-expensive: Speed up create_repo()
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (5 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 06/12] fast-import: Add support for importing commit notes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
` (4 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Creating repos with 10/100/1000/10000 commits and notes takes a lot of time.
However, using git-fast-import to do the job is a lot more efficient than
using plumbing commands to do the same.
This patch decreases the overall run-time of this test on my machine from
~3 to ~1 minutes.
Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
t/t3302-notes-index-expensive.sh | 74 ++++++++++++++++++++++++--------------
1 files changed, 47 insertions(+), 27 deletions(-)
diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
index 0ef3e95..ee84fc4 100755
--- a/t/t3302-notes-index-expensive.sh
+++ b/t/t3302-notes-index-expensive.sh
@@ -16,30 +16,50 @@ test -z "$GIT_NOTES_TIMING_TESTS" && {
create_repo () {
number_of_commits=$1
nr=0
- parent=
test -d .git || {
git init &&
- tree=$(git write-tree) &&
- while [ $nr -lt $number_of_commits ]; do
- test_tick &&
- commit=$(echo $nr | git commit-tree $tree $parent) ||
- return
- parent="-p $commit"
- nr=$(($nr+1))
- done &&
- git update-ref refs/heads/master $commit &&
- {
- GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
- git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" |
- while read nr sha1; do
- blob=$(echo note $nr | git hash-object -w --stdin) &&
- echo $sha1 | sed "s/^/0644 $blob 0 /"
- done | git update-index --index-info &&
- tree=$(git write-tree) &&
+ (
+ while [ $nr -lt $number_of_commits ]; do
+ nr=$(($nr+1))
+ mark=$(($nr+$nr))
+ notemark=$(($mark+1))
+ test_tick &&
+ cat <<INPUT_END &&
+commit refs/heads/master
+mark :$mark
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+blob
+mark :$notemark
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ echo "N :$notemark :$mark" >> note_commit
+ done &&
test_tick &&
- commit=$(echo notes | git commit-tree $tree) &&
- git update-ref refs/notes/commits $commit
- } &&
+ cat <<INPUT_END &&
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+INPUT_END
+
+ cat note_commit
+ ) |
+ git fast-import --quiet &&
git config core.notesRef refs/notes/commits
}
}
@@ -48,13 +68,13 @@ test_notes () {
count=$1 &&
git config core.notesRef refs/notes/commits &&
git log | grep "^ " > output &&
- i=1 &&
- while [ $i -le $count ]; do
- echo " $(($count-$i))" &&
- echo " note $i" &&
- i=$(($i+1));
+ i=$count &&
+ while [ $i -gt 0 ]; do
+ echo " commit #$i" &&
+ echo " note for commit #$i" &&
+ i=$(($i-1));
done > expect &&
- git diff expect output
+ test_cmp expect output
}
cat > time_notes << \EOF
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (6 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 07/12] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 5:00 ` Junio C Hamano
2009-08-27 1:43 ` [PATCHv4 09/12] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
` (3 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
notes.c | 294 +++++++++++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 228 insertions(+), 66 deletions(-)
diff --git a/notes.c b/notes.c
index 9172154..a637056 100644
--- a/notes.c
+++ b/notes.c
@@ -6,103 +6,265 @@
#include "strbuf.h"
#include "tree-walk.h"
-struct entry {
- unsigned char commit_sha1[20];
- unsigned char notes_sha1[20];
+/*
+ * Use a non-balancing simple 16-tree structure with struct int_node as
+ * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
+ * 16-array of pointers to its children.
+ * The bottom 2 bits of each pointer is used to identify the pointer type
+ * - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
+ * - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
+ * - ptr & 3 == 2 - pointer to note entry - cast to struct leaf_node *
+ * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
+ *
+ * The root node is a statically allocated struct int_node.
+ */
+struct int_node {
+ void *a[16];
};
-struct hash_map {
- struct entry *entries;
- off_t count, size;
+/*
+ * Leaf nodes come in two variants, note entries and subtree entries,
+ * distinguished by the LSb of the leaf node pointer (see above).
+ * As a note entry, the key is the SHA1 of the referenced commit, and the
+ * value is the SHA1 of the note object.
+ * As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
+ * referenced commit, using the last byte of the key to store the length of
+ * the prefix. The value is the SHA1 of the tree object containing the notes
+ * subtree.
+ */
+struct leaf_node {
+ unsigned char key_sha1[20];
+ unsigned char val_sha1[20];
};
+#define PTR_TYPE_NULL 0
+#define PTR_TYPE_INTERNAL 1
+#define PTR_TYPE_NOTE 2
+#define PTR_TYPE_SUBTREE 3
+
+#define GET_PTR_TYPE(ptr) ((uintptr_t) (ptr) & 3)
+#define CLR_PTR_TYPE(ptr) ((void *) ((uintptr_t) (ptr) & ~3))
+#define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
+
+#define GET_NIBBLE(n, sha1) (((sha1[n >> 1]) >> ((n & 0x01) << 2)) & 0x0f)
+
+#define MATCHING_SUBTREE(key_sha1, subtree_sha1) \
+ (!memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
+
static int initialized;
-static struct hash_map hash_map;
-static int hash_index(struct hash_map *map, const unsigned char *sha1)
-{
- int i = ((*(unsigned int *)sha1) % map->size);
+static struct int_node root_node;
- for (;;) {
- unsigned char *current = map->entries[i].commit_sha1;
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+ unsigned int n);
- if (!hashcmp(sha1, current))
- return i;
+/*
+ * To find a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is an int_node, recurse into that node and increment n
+ * - If a leaf_node with matching key, return leaf_node (assert note entry)
+ * - If a matching subtree entry, unpack that subtree entry (and remove it);
+ * restart search at the current level.
+ * - Otherwise, we end up at a NULL pointer, or a non-matching leaf_node.
+ * Backtrack out of the recursion, one level at a time and check a[0]:
+ * - If a[0] at the current level is a matching subtree entry, unpack that
+ * subtree entry (and remove it); restart search at the current level.
+ */
+static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
+ const unsigned char *key_sha1)
+{
+ struct leaf_node *l;
+ unsigned char i = GET_NIBBLE(n, key_sha1);
+ void *p = tree->a[i];
- if (is_null_sha1(current))
- return -1 - i;
+ switch(GET_PTR_TYPE(p)) {
+ case PTR_TYPE_INTERNAL:
+ l = note_tree_find(CLR_PTR_TYPE(p), n + 1, key_sha1);
+ if (l)
+ return l;
+ break;
+ case PTR_TYPE_NOTE:
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (!hashcmp(key_sha1, l->key_sha1))
+ return l; /* return note object matching given key */
+ break;
+ case PTR_TYPE_SUBTREE:
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (MATCHING_SUBTREE(key_sha1, l->key_sha1)) {
+ /* unpack tree and resume search */
+ tree->a[i] = NULL;
+ load_subtree(l, tree, n);
+ free(l);
+ return note_tree_find(tree, n, key_sha1);
+ }
+ break;
+ case PTR_TYPE_NULL:
+ default:
+ assert(!p);
+ break;
+ }
- if (++i == map->size)
- i = 0;
+ /*
+ * Did not find key at this (or any lower) level.
+ * Check if there's a matching subtree entry in tree->a[0].
+ * If so, unpack tree and resume search.
+ */
+ p = tree->a[0];
+ if (GET_PTR_TYPE(p) != PTR_TYPE_SUBTREE)
+ return NULL;
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (MATCHING_SUBTREE(key_sha1, l->key_sha1)) {
+ /* unpack tree and resume search */
+ tree->a[0] = NULL;
+ load_subtree(l, tree, n);
+ free(l);
+ return note_tree_find(tree, n, key_sha1);
}
+ return NULL;
}
-static void add_entry(const unsigned char *commit_sha1,
- const unsigned char *notes_sha1)
+/*
+ * To insert a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is NULL, store the tweaked pointer directly into a[n]
+ * - If a[n] is an int_node, recurse into that node and increment n
+ * - If a[n] is a leaf_node:
+ * 1. Check if they're equal, and handle that (abort? overwrite?)
+ * 2. Create a new int_node, and store both leaf_nodes there
+ * 3. Store the new int_node into a[n].
+ */
+static int note_tree_insert(struct int_node *tree, unsigned char n,
+ const struct leaf_node *entry, unsigned char type)
{
- int index;
-
- if (hash_map.count + 1 > hash_map.size >> 1) {
- int i, old_size = hash_map.size;
- struct entry *old = hash_map.entries;
-
- hash_map.size = old_size ? old_size << 1 : 64;
- hash_map.entries = (struct entry *)
- xcalloc(sizeof(struct entry), hash_map.size);
-
- for (i = 0; i < old_size; i++)
- if (!is_null_sha1(old[i].commit_sha1)) {
- index = -1 - hash_index(&hash_map,
- old[i].commit_sha1);
- memcpy(hash_map.entries + index, old + i,
- sizeof(struct entry));
- }
- free(old);
+ struct int_node *new_node;
+ const struct leaf_node *l;
+ int ret;
+ unsigned char i = GET_NIBBLE(n, entry->key_sha1);
+ void *p = tree->a[i];
+ assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
+ switch(GET_PTR_TYPE(p)) {
+ case PTR_TYPE_NULL:
+ assert(!p);
+ tree->a[i] = SET_PTR_TYPE(entry, type);
+ return 0;
+ case PTR_TYPE_INTERNAL:
+ return note_tree_insert(CLR_PTR_TYPE(p), n + 1, entry, type);
+ default:
+ assert(GET_PTR_TYPE(p) == PTR_TYPE_NOTE ||
+ GET_PTR_TYPE(p) == PTR_TYPE_SUBTREE);
+ l = (const struct leaf_node *) CLR_PTR_TYPE(p);
+ if (!hashcmp(entry->key_sha1, l->key_sha1))
+ return -1; /* abort insert on matching key */
+ new_node = (struct int_node *)
+ xcalloc(sizeof(struct int_node), 1);
+ ret = note_tree_insert(new_node, n + 1,
+ CLR_PTR_TYPE(p), GET_PTR_TYPE(p));
+ if (ret) {
+ free(new_node);
+ return -1;
+ }
+ tree->a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
+ return note_tree_insert(new_node, n + 1, entry, type);
}
+}
- index = hash_index(&hash_map, commit_sha1);
- if (index < 0) {
- index = -1 - index;
- hash_map.count++;
+/*
+ * Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
+ * - hex - Partial SHA1 segment in ASCII hex format
+ * - hex_len - Length of above segment. Must be multiple of 2 between 0 and 40
+ * - sha1 - Partial SHA1 value is written here
+ * - sha1_len - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
+ * Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format).
+ * Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
+ * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ */
+static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
+ unsigned char *sha1, unsigned int sha1_len)
+{
+ unsigned int i, len = hex_len >> 1;
+ if (hex_len % 2 != 0 || len > sha1_len)
+ return -1;
+ for (i = 0; i < len; i++) {
+ unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+ if (val & ~0xff)
+ return -1;
+ *sha1++ = val;
+ hex += 2;
}
+ for (; i < sha1_len; i++)
+ *sha1++ = 0;
+ return len;
+}
+
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+ unsigned int n)
+{
+ unsigned char commit_sha1[20];
+ unsigned int prefix_len;
+ int status;
+ void *buf;
+ struct tree_desc desc;
+ struct name_entry entry;
+
+ buf = fill_tree_descriptor(&desc, subtree->val_sha1);
+ if (!buf)
+ die("Could not read %s for notes-index",
+ sha1_to_hex(subtree->val_sha1));
+
+ prefix_len = subtree->key_sha1[19];
+ assert(prefix_len * 2 >= n);
+ memcpy(commit_sha1, subtree->key_sha1, prefix_len);
+ while (tree_entry(&desc, &entry)) {
+ int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
+ commit_sha1 + prefix_len, 20 - prefix_len);
+ if (len < 0)
+ continue; /* entry.path is not a SHA1 sum. Skip */
+ len += prefix_len;
- hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
- hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+ /*
+ * If commit SHA1 is complete (len == 20), assume note object
+ * If commit SHA1 is incomplete (len < 20), assume note subtree
+ */
+ if (len <= 20) {
+ unsigned char type = PTR_TYPE_NOTE;
+ struct leaf_node *l = (struct leaf_node *)
+ xcalloc(sizeof(struct leaf_node), 1);
+ hashcpy(l->key_sha1, commit_sha1);
+ hashcpy(l->val_sha1, entry.sha1);
+ if (len < 20) {
+ l->key_sha1[19] = (unsigned char) len;
+ type = PTR_TYPE_SUBTREE;
+ }
+ status = note_tree_insert(node, n, l, type);
+ assert(!status);
+ }
+ }
+ free(buf);
}
-static void initialize_hash_map(const char *notes_ref_name)
+static void initialize_notes(const char *notes_ref_name)
{
unsigned char sha1[20], commit_sha1[20];
unsigned mode;
- struct tree_desc desc;
- struct name_entry entry;
- void *buf;
+ struct leaf_node root_tree;
if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
get_tree_entry(commit_sha1, "", sha1, &mode))
return;
- buf = fill_tree_descriptor(&desc, sha1);
- if (!buf)
- die("Could not read %s for notes-index", sha1_to_hex(sha1));
-
- while (tree_entry(&desc, &entry))
- if (!get_sha1(entry.path, commit_sha1))
- add_entry(commit_sha1, entry.sha1);
- free(buf);
+ hashclr(root_tree.key_sha1);
+ hashcpy(root_tree.val_sha1, sha1);
+ load_subtree(&root_tree, &root_node, 0);
}
static unsigned char *lookup_notes(const unsigned char *commit_sha1)
{
- int index;
-
- if (!hash_map.size)
- return NULL;
-
- index = hash_index(&hash_map, commit_sha1);
- if (index < 0)
- return NULL;
- return hash_map.entries[index].notes_sha1;
+ struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
+ if (found)
+ return found->val_sha1;
+ return NULL;
}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -120,7 +282,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
else if (!notes_ref_name)
notes_ref_name = GIT_NOTES_DEFAULT_REF;
- initialize_hash_map(notes_ref_name);
+ initialize_notes(notes_ref_name);
initialized = 1;
}
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 09/12] Selftests verifying semantics when loading notes trees with various fanouts
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (7 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes Johan Herland
` (2 subsequent siblings)
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Add selftests verifying:
- that we are able to parse notes trees with various fanout schemes
- that notes trees with conflicting fanout schemes are parsed as expected
Signed-off-by: Johan Herland <johan@herland.net>
---
t/t3303-notes-subtrees.sh | 206 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 206 insertions(+), 0 deletions(-)
create mode 100755 t/t3303-notes-subtrees.sh
diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
new file mode 100755
index 0000000..40bb3f4
--- /dev/null
+++ b/t/t3303-notes-subtrees.sh
@@ -0,0 +1,206 @@
+#!/bin/sh
+
+test_description='Test commit notes organized in subtrees'
+
+. ./test-lib.sh
+
+number_of_commits=100
+
+start_note_commit () {
+ test_tick &&
+ cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+from refs/notes/commits^0
+deleteall
+INPUT_END
+
+}
+
+verify_notes () {
+ git log | grep "^ " > output &&
+ i=$number_of_commits &&
+ while [ $i -gt 0 ]; do
+ echo " commit #$i" &&
+ echo " note for commit #$i" &&
+ i=$(($i-1));
+ done > expect &&
+ test_cmp expect output
+}
+
+test_expect_success 'setup: create $number_of_commits commits' '
+
+ (
+ nr=0 &&
+ while [ $nr -lt $number_of_commits ]; do
+ nr=$(($nr+1)) &&
+ test_tick &&
+ cat <<INPUT_END
+commit refs/heads/master
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+ done &&
+ test_tick &&
+ cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+no notes
+COMMIT
+
+deleteall
+
+INPUT_END
+
+ ) |
+ git fast-import --quiet &&
+ git config core.notesRef refs/notes/commits
+'
+
+test_expect_success 'test notes in 2/38-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^....|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 4/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout overriding 2/38-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ ignored_note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ preferred_note_path=$(echo "$sha1" | sed "s|^....|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/2/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^\(..\)\(..\)|\1/\2/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/2/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ ignored_note_path=$(echo "$sha1" | sed "s|^\(..\)\(..\)|\1/\2/|")
+ preferred_note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
+
+test_done
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (8 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 09/12] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 7:39 ` Alex Riesen
2009-08-27 1:43 ` [PATCHv4 11/12] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-08-27 1:43 ` [PATCHv4 12/12] Add '%N'-format for pretty-printing commit notes Johan Herland
11 siblings, 1 reply; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Allocate page-sized chunks for holding struct leaf_node objects.
This slightly (but consistently) improves runtime performance of notes
lookup, at a very slight increase (~2K on average) in memory usage.
When allocating a new memory pool, the older pool is leaked, but this is
no worse than the current situation, where (pretty much) all leaf_nodes
are leaked anyway.
Signed-off-by: Johan Herland <johan@herland.net>
---
notes.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/notes.c b/notes.c
index a637056..5d1ee17 100644
--- a/notes.c
+++ b/notes.c
@@ -55,6 +55,23 @@ static int initialized;
static struct int_node root_node;
+/* leaf nodes are allocated in simple memory pools */
+#define LEAF_NODE_POOL_SIZE 100
+static struct leaf_node *leaf_node_pool;
+static unsigned int leaf_node_pool_used;
+
+
+static struct leaf_node *new_leaf_node()
+{
+ if (!leaf_node_pool || leaf_node_pool_used >= LEAF_NODE_POOL_SIZE) {
+ /* MEMORY LEAK: */
+ leaf_node_pool = (struct leaf_node *)
+ xcalloc(sizeof(struct leaf_node), LEAF_NODE_POOL_SIZE);
+ leaf_node_pool_used = 0;
+ }
+ return leaf_node_pool + leaf_node_pool_used++;
+}
+
static void load_subtree(struct leaf_node *subtree, struct int_node *node,
unsigned int n);
@@ -95,7 +112,6 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
/* unpack tree and resume search */
tree->a[i] = NULL;
load_subtree(l, tree, n);
- free(l);
return note_tree_find(tree, n, key_sha1);
}
break;
@@ -118,7 +134,6 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
/* unpack tree and resume search */
tree->a[0] = NULL;
load_subtree(l, tree, n);
- free(l);
return note_tree_find(tree, n, key_sha1);
}
return NULL;
@@ -229,8 +244,7 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
*/
if (len <= 20) {
unsigned char type = PTR_TYPE_NOTE;
- struct leaf_node *l = (struct leaf_node *)
- xcalloc(sizeof(struct leaf_node), 1);
+ struct leaf_node *l = new_leaf_node();
hashcpy(l->key_sha1, commit_sha1);
hashcpy(l->val_sha1, entry.sha1);
if (len < 20) {
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 11/12] Add flags to get_commit_notes() to control the format of the note string
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (9 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 12/12] Add '%N'-format for pretty-printing commit notes Johan Herland
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
notes.c | 8 +++++---
notes.h | 5 ++++-
pretty.c | 3 ++-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/notes.c b/notes.c
index 5d1ee17..bf520ae 100644
--- a/notes.c
+++ b/notes.c
@@ -282,7 +282,7 @@ static unsigned char *lookup_notes(const unsigned char *commit_sha1)
}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
- const char *output_encoding)
+ const char *output_encoding, int flags)
{
static const char utf8[] = "utf-8";
unsigned char *sha1;
@@ -322,12 +322,14 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
if (msglen && msg[msglen - 1] == '\n')
msglen--;
- strbuf_addstr(sb, "\nNotes:\n");
+ if (flags & NOTES_SHOW_HEADER)
+ strbuf_addstr(sb, "\nNotes:\n");
for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
linelen = strchrnul(msg_p, '\n') - msg_p;
- strbuf_addstr(sb, " ");
+ if (flags & NOTES_INDENT)
+ strbuf_addstr(sb, " ");
strbuf_add(sb, msg_p, linelen);
strbuf_addch(sb, '\n');
}
diff --git a/notes.h b/notes.h
index 79d21b6..7f3eed4 100644
--- a/notes.h
+++ b/notes.h
@@ -1,7 +1,10 @@
#ifndef NOTES_H
#define NOTES_H
+#define NOTES_SHOW_HEADER 1
+#define NOTES_INDENT 2
+
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
- const char *output_encoding);
+ const char *output_encoding, int flags);
#endif
diff --git a/pretty.c b/pretty.c
index e25db81..01eadd0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -978,7 +978,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
strbuf_addch(sb, '\n');
if (fmt != CMIT_FMT_ONELINE)
- get_commit_notes(commit, sb, encoding);
+ get_commit_notes(commit, sb, encoding,
+ NOTES_SHOW_HEADER | NOTES_INDENT);
free(reencoded);
}
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCHv4 12/12] Add '%N'-format for pretty-printing commit notes
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
` (10 preceding siblings ...)
2009-08-27 1:43 ` [PATCHv4 11/12] Add flags to get_commit_notes() to control the format of the note string Johan Herland
@ 2009-08-27 1:43 ` Johan Herland
11 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 1:43 UTC (permalink / raw)
To: gitster
Cc: git, johan, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/pretty-formats.txt | 1 +
pretty.c | 4 ++++
2 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 2a845b1..5fb10b3 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -123,6 +123,7 @@ The placeholders are:
- '%s': subject
- '%f': sanitized subject line, suitable for a filename
- '%b': body
+- '%N': commit notes
- '%Cred': switch color to red
- '%Cgreen': switch color to green
- '%Cblue': switch color to blue
diff --git a/pretty.c b/pretty.c
index 01eadd0..7f350bb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -702,6 +702,10 @@ static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
case 'd':
format_decoration(sb, commit);
return 1;
+ case 'N':
+ get_commit_notes(commit, sb, git_log_output_encoding ?
+ git_log_output_encoding : git_commit_encoding, 0);
+ return 1;
}
/* For the rest we have to parse the commit header. */
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 1:43 ` [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
@ 2009-08-27 5:00 ` Junio C Hamano
2009-08-27 9:35 ` Johan Herland
2009-08-27 10:42 ` Johannes Schindelin
0 siblings, 2 replies; 38+ messages in thread
From: Junio C Hamano @ 2009-08-27 5:00 UTC (permalink / raw)
To: Johan Herland
Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Johan Herland <johan@herland.net> writes:
> The semantics used when parsing notes trees (with regards to fanout subtrees)
> follow Dscho's proposal fairly closely:
> - No concatenation/merging of notes is performed. If there are several notes
> objects referencing a given commit, only one of those objects are used.
> - If a notes object for a given commit is present in the "root" notes tree,
> no subtrees are consulted; the object in the root tree is used directly.
> - If there are more than one subtree that prefix-matches the given commit,
> only the subtree with the longest matching prefix is consulted. This
> means that if the given commit is e.g. "deadbeef", and the notes tree have
> subtrees "de" and "dead", then the following paths in the notes tree are
> searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
> - Fanout directories (subtrees) must references a whole number of bytes
> from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
> acceptable; "d" and "dea" are not.
> - Multiple levels of fanout are allowed. All the above rules apply
> recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
If I am reading this correctly, the earlier parts of the series were
aiming to let multiple people to add notes to the same commit more or less
uncordinated while still allowing to merge them sensibly, but now such a
workflow becomes impossible with this change.
The above claims notes trees with different levels of fan-out are allowed,
but what it really means is that merging notes trees with different levels
of fan-out will produce a useless result that records notes for the same
commit in different blobs all over the notes tree, and asking the notes
mechanism to give the notes for one commit will give only one piece that
originates in the tree whose creator happened to have used the longest
prefix while ignoring all others. It may _allow_ such a layout, but how
would such semantics be useful in the first place?
I suspect that I am missing something but my gut feeling is that this
change turns an interesting hack (even though it might be expensive) into
a hack that is not useful at all in the real world, without some order,
discipline, or guideline is applied.
What's the recommended way to work with this system from the end user's
point of view in a distirbuted environment? Somebody up in the project is
supposed to decide what fan-out is to be used for the whole project and
everybody should follow that structure? If a participant in the project
forgets that rule (or makes a mistake), a notes tree that mistakenly
merges his notes tree becomes practically useless? If so, perhaps we
would need a mechanism to avoid such a mistake from happening?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes
2009-08-27 1:43 ` [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes Johan Herland
@ 2009-08-27 7:39 ` Alex Riesen
2009-08-27 9:49 ` Johan Herland
0 siblings, 1 reply; 38+ messages in thread
From: Alex Riesen @ 2009-08-27 7:39 UTC (permalink / raw)
To: Johan Herland
Cc: gitster, git, Johannes.Schindelin, trast, tavestbo, git,
chriscool, spearce
On Thu, Aug 27, 2009 at 03:43, Johan Herland<johan@herland.net> wrote:
> When allocating a new memory pool, the older pool is leaked, but this is
> no worse than the current situation, where (pretty much) all leaf_nodes
> are leaked anyway.
Could you return the unused nodes back into tghe mempool?
By making the pool a preallocated list, perhaps?
And then it is trivial to provide a deallocation function for the mempool,
which something really concerned about the memleak can call (like when
or if libgit get more usable in an application context).
> @@ -95,7 +112,6 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
> /* unpack tree and resume search */
> tree->a[i] = NULL;
> load_subtree(l, tree, n);
> - free(l);
free_leaf_node(l), which returns the node into mempool
> return note_tree_find(tree, n, key_sha1);
> }
> break;
> @@ -118,7 +134,6 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
> /* unpack tree and resume search */
> tree->a[0] = NULL;
> load_subtree(l, tree, n);
> - free(l);
free_leaf_node(l);
> return note_tree_find(tree, n, key_sha1);
> }
> return NULL;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 5:00 ` Junio C Hamano
@ 2009-08-27 9:35 ` Johan Herland
2009-08-27 10:47 ` Johannes Schindelin
2009-08-27 20:55 ` Junio C Hamano
2009-08-27 10:42 ` Johannes Schindelin
1 sibling, 2 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 9:35 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
On Thursday 27 August 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > The semantics used when parsing notes trees (with regards to fanout
> > subtrees) follow Dscho's proposal fairly closely:
> > - No concatenation/merging of notes is performed. If there are several
> > notes objects referencing a given commit, only one of those objects are
> > used. - If a notes object for a given commit is present in the "root"
> > notes tree, no subtrees are consulted; the object in the root tree is
> > used directly. - If there are more than one subtree that prefix-matches
> > the given commit, only the subtree with the longest matching prefix is
> > consulted. This means that if the given commit is e.g. "deadbeef", and
> > the notes tree have subtrees "de" and "dead", then the following paths
> > in the notes tree are searched: "deadbeef", "dead/beef". Note that
> > "de/adbeef" is NOT searched. - Fanout directories (subtrees) must
> > references a whole number of bytes from the SHA1 sum they subdivide.
> > E.g. subtrees "dead" and "de" are acceptable; "d" and "dea" are not.
> > - Multiple levels of fanout are allowed. All the above rules apply
> > recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
>
> If I am reading this correctly, the earlier parts of the series were
> aiming to let multiple people to add notes to the same commit more or
> less uncordinated while still allowing to merge them sensibly, but now
> such a workflow becomes impossible with this change.
>
> The above claims notes trees with different levels of fan-out are
> allowed, but what it really means is that merging notes trees with
> different levels of fan-out will produce a useless result that records
> notes for the same commit in different blobs all over the notes tree, and
> asking the notes mechanism to give the notes for one commit will give
> only one piece that originates in the tree whose creator happened to have
> used the longest prefix while ignoring all others. It may _allow_ such a
> layout, but how would such semantics be useful in the first place?
>
> I suspect that I am missing something but my gut feeling is that this
> change turns an interesting hack (even though it might be expensive) into
> a hack that is not useful at all in the real world, without some order,
> discipline, or guideline is applied.
As you may remember, the major, remaining problem with the jh/notes patch
series was that as the number of notes in a repo grew, the runtime cost of
displaying (even a single one of) them became prohibitive. This was the
major reason why Dscho and me did NOT want you to merge this series earlier.
The solution to this performance problem (which has been discussed since
almost a year ago), is to use some fanout scheme in the notes tree, so that
we can load individual notes without necessarily parsing the entire notes
tree. This patch implements the _reading_ of a notes trees with fanout.
However, as you correctly identify, allowing fanout makes it possible to add
multiple notes for the same commit in the same notes tree.
Therefore, we must now create the order/discipline/guideline you request by
taking away this extra freedom.
This will take the form of enforcing a chosen fanout scheme when _writing_
notes. This code (yet to be written) will include:
- refactoring the notes tree when the number of notes call for a different
fanout scheme (e.g. create a 2/38 fanout scheme when the number of notes in
a (sub)tree exceed some threshold).
- adding and editing notes while following the current fanout scheme.
- adding a custom merge strategy for note trees, which reads notes trees
Currently I have a two different ideas on a suitable fanout scheme:
1. Define a threshold where the number of notes in a (sub)tree becomes large
enough warrant a fanout. For now, let's assume that threshold is 1024. Start
with en empty notes tree with no fanout. When we reach 1024 notes, split the
notes tree into 256 subdirs (2/38 fanout). As each of the subdirs reach 1024
notes, split those subdirs further (2/2/36 fanout), and so on. In practice
(since SHA1 gives us a uniform distribution), notes trees with up to:
- < 1024 notes will have no fanout
- < ~256K notes will have 2/38 fanout
- < ~64M notes will have 2/2/36 fanout
- < ~16G notes will have 2/2/2/34 fanout
2. Simply decide on a constant 2/2/36 fanout. For the case with < 256K
notes, this is somewhat wasteful, but not prohibitively expensive. For the
case with > 64M notes, performance will start to degrade. The big advantage
with this approach is that when this is hardcoded into the notes code, we
have regained the property that notes for a given commit have exactly _one_
unique position in the notes tree across all installations (enabling us to
fall back on the regular merge strategy).
> What's the recommended way to work with this system from the end user's
> point of view in a distirbuted environment?
The end user should not know or care about what fanout scheme is used.
Everything should be handled seamlessly by the code.
> Somebody up in the project is supposed to decide what fan-out is to be
> used for the whole project and everybody should follow that structure?
Nope, the code should decide which fanout scheme to use.
> If a participant in the project forgets that rule (or makes a mistake), a
> notes tree that mistakenly merges his notes tree becomes practically
> useless?
No. Again this should be invisible to the user.
> If so, perhaps we would need a mechanism to avoid such a mistake from
> happening?
The notes code should prevent notes that violate the fanout scheme from
being created.
The fanout scheme is not something the user should have to worry about at
all.
I totally understand that you don't want to merge the notes feature before
the "writing" side is taken care of. As such, this iteration is simply yet
another iteration towards that final goal.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes
2009-08-27 7:39 ` Alex Riesen
@ 2009-08-27 9:49 ` Johan Herland
2009-08-27 22:43 ` Johan Herland
0 siblings, 1 reply; 38+ messages in thread
From: Johan Herland @ 2009-08-27 9:49 UTC (permalink / raw)
To: git
Cc: Alex Riesen, gitster, Johannes.Schindelin, trast, tavestbo, git,
chriscool, spearce
On Thursday 27 August 2009, Alex Riesen wrote:
> On Thu, Aug 27, 2009 at 03:43, Johan Herland<johan@herland.net> wrote:
> > When allocating a new memory pool, the older pool is leaked, but this
> > is no worse than the current situation, where (pretty much) all
> > leaf_nodes are leaked anyway.
>
> Could you return the unused nodes back into the mempool?
> By making the pool a preallocated list, perhaps?
Yes, maintaining a free-list is certainly possible. However, the number of
free()d leaf_nodes is relatively small (only subtree entries are free()d
after unpacking them into the tree structure), so I'm not sure it pays off,
runtime-wise.
> And then it is trivial to provide a deallocation function for the
> mempool, which something really concerned about the memleak can call
> (like when or if libgit get more usable in an application context).
Yes, I plan to provide a free_notes() function that free()s all the memory
associated with the notes data structure. This would of course keep
references to all the mempools, and deallocate them (along with all the
int_nodes).
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 5:00 ` Junio C Hamano
2009-08-27 9:35 ` Johan Herland
@ 2009-08-27 10:42 ` Johannes Schindelin
1 sibling, 0 replies; 38+ messages in thread
From: Johannes Schindelin @ 2009-08-27 10:42 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johan Herland, git, trast, tavestbo, git, chriscool, spearce
Hi,
On Wed, 26 Aug 2009, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
>
> > The semantics used when parsing notes trees (with regards to fanout subtrees)
> > follow Dscho's proposal fairly closely:
> > - No concatenation/merging of notes is performed. If there are several notes
> > objects referencing a given commit, only one of those objects are used.
> > - If a notes object for a given commit is present in the "root" notes tree,
> > no subtrees are consulted; the object in the root tree is used directly.
> > - If there are more than one subtree that prefix-matches the given commit,
> > only the subtree with the longest matching prefix is consulted. This
> > means that if the given commit is e.g. "deadbeef", and the notes tree have
> > subtrees "de" and "dead", then the following paths in the notes tree are
> > searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
> > - Fanout directories (subtrees) must references a whole number of bytes
> > from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
> > acceptable; "d" and "dea" are not.
> > - Multiple levels of fanout are allowed. All the above rules apply
> > recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
>
> If I am reading this correctly, the earlier parts of the series were
> aiming to let multiple people to add notes to the same commit more or less
> uncordinated while still allowing to merge them sensibly, but now such a
> workflow becomes impossible with this change.
>
> The above claims notes trees with different levels of fan-out are allowed,
> but what it really means is that merging notes trees with different levels
> of fan-out will produce a useless result that records notes for the same
> commit in different blobs all over the notes tree, and asking the notes
> mechanism to give the notes for one commit will give only one piece that
> originates in the tree whose creator happened to have used the longest
> prefix while ignoring all others. It may _allow_ such a layout, but how
> would such semantics be useful in the first place?
>
> I suspect that I am missing something but my gut feeling is that this
> change turns an interesting hack (even though it might be expensive) into
> a hack that is not useful at all in the real world, without some order,
> discipline, or guideline is applied.
>
> What's the recommended way to work with this system from the end user's
> point of view in a distirbuted environment? Somebody up in the project is
> supposed to decide what fan-out is to be used for the whole project and
> everybody should follow that structure? If a participant in the project
> forgets that rule (or makes a mistake), a notes tree that mistakenly
> merges his notes tree becomes practically useless? If so, perhaps we
> would need a mechanism to avoid such a mistake from happening?
Hmm...
Maybe you're right (and mugwump was right all along) that _all_ matching
notes should be shown...
Ciao,
Dscho
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 9:35 ` Johan Herland
@ 2009-08-27 10:47 ` Johannes Schindelin
2009-08-27 20:58 ` Junio C Hamano
2009-08-27 20:55 ` Junio C Hamano
1 sibling, 1 reply; 38+ messages in thread
From: Johannes Schindelin @ 2009-08-27 10:47 UTC (permalink / raw)
To: Johan Herland
Cc: Junio C Hamano, git, trast, tavestbo, git, chriscool, spearce
Hi,
On Thu, 27 Aug 2009, Johan Herland wrote:
> On Thursday 27 August 2009, Junio C Hamano wrote:
> > Somebody up in the project is supposed to decide what fan-out is to be
> > used for the whole project and everybody should follow that structure?
>
> Nope, the code should decide which fanout scheme to use.
I half-agree, the code should decide which fanout scheme to use, but
_only_ when producing new notes.
I imagine that it could merge the existing notes, and try to make sure
that there are no more blobs in a given subtree than a certain threshold;
if that threshold is reached, it could fan-out using 2-digit subtrees,
merging what needs merging (by concatenation) along the way.
The natural precedence of shallower paths/longer basenames should cope
well with that (i.e. prefer to show abcd/... over ab/cd/...).
Ciao,
Dscho
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 9:35 ` Johan Herland
2009-08-27 10:47 ` Johannes Schindelin
@ 2009-08-27 20:55 ` Junio C Hamano
2009-08-27 21:27 ` Shawn O. Pearce
1 sibling, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-27 20:55 UTC (permalink / raw)
To: Johan Herland
Cc: git, Johannes.Schindelin, trast, tavestbo, git, chriscool,
spearce
Johan Herland <johan@herland.net> writes:
> 2. Simply decide on a constant 2/2/36 fanout. For the case with < 256K
> notes, this is somewhat wasteful, but not prohibitively expensive. For the
> case with > 64M notes, performance will start to degrade. The big advantage
> with this approach is that when this is hardcoded into the notes code, we
> have regained the property that notes for a given commit have exactly _one_
> unique position in the notes tree across all installations (enabling us to
> fall back on the regular merge strategy).
I thought it was Gitney who suggested to use a top-level fan-out based on
the committer-date. If you typically have already parsed the commit when
you want to look up notes objects for it, it won't have extra overhead,
and when looking at only recent history it will only need to access a
small subset of trees. I thought it was a neat idea (except that the
question becomes what the granularity of the top level fan-out should
be---one a day? one a month?---the optimum would depend on commit
density). Was that idea shot down for some reason?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 10:47 ` Johannes Schindelin
@ 2009-08-27 20:58 ` Junio C Hamano
2009-08-28 8:48 ` Johannes Schindelin
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-27 20:58 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Johan Herland, git, trast, tavestbo, git, chriscool, spearce
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> I half-agree, the code should decide which fanout scheme to use, but
> _only_ when producing new notes.
>
> I imagine that it could merge the existing notes, and try to make sure
> that there are no more blobs in a given subtree than a certain threshold;
> if that threshold is reached, it could fan-out using 2-digit subtrees,
> merging what needs merging (by concatenation) along the way.
>
> The natural precedence of shallower paths/longer basenames should cope
> well with that (i.e. prefer to show abcd/... over ab/cd/...).
Oh, if the plan for merging the trees is such that it takes care of
"multiple notes pointing at the same commit" issues like you outline, then
I can see it would work nicely.
At that point, fan-out would become merely an implementation detail,
something the end user never needs to worry about, just like what base
object is chosen to represent another object in a packfile.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 20:55 ` Junio C Hamano
@ 2009-08-27 21:27 ` Shawn O. Pearce
2009-08-27 21:50 ` Junio C Hamano
0 siblings, 1 reply; 38+ messages in thread
From: Shawn O. Pearce @ 2009-08-27 21:27 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johan Herland, git, Johannes.Schindelin, trast, tavestbo, git,
chriscool
Junio C Hamano <gitster@pobox.com> wrote:
> Johan Herland <johan@herland.net> writes:
> > 2. Simply decide on a constant 2/2/36 fanout.
>
> I thought it was Gitney who suggested to use a top-level fan-out based on
> the committer-date. If you typically have already parsed the commit when
> you want to look up notes objects for it, it won't have extra overhead,
> and when looking at only recent history it will only need to access a
> small subset of trees. I thought it was a neat idea (except that the
> question becomes what the granularity of the top level fan-out should
> be---one a day? one a month?---the optimum would depend on commit
> density). Was that idea shot down for some reason?
Yea, it was me. I still think it might be a useful idea, since
it allows you better density of loading notes when parsing the
recent commits. In theory the last 256 commits can easly be in
each of the 2/ fanout buckets, making 2/38 pointless for reducing
the search space. Commit date on the other hand can probably force
all of them into the same bucket, making it easy to have the last
256 commits in cache, from a single bucket.
But I thought you shot it down, by saying that we also wanted to
support notes on blobs. I happen to see no value in a note on
a blob, a blob alone doesn't make much sense without at least an
annotated tag or commit to provide it some named context, and the
latter two have dates.
--
Shawn.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 21:27 ` Shawn O. Pearce
@ 2009-08-27 21:50 ` Junio C Hamano
2009-08-27 23:03 ` Johan Herland
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-27 21:50 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, Johan Herland, git, Johannes.Schindelin, trast,
tavestbo, git, chriscool
"Shawn O. Pearce" <spearce@spearce.org> writes:
> Yea, it was me. I still think it might be a useful idea, since
> it allows you better density of loading notes when parsing the
> recent commits. In theory the last 256 commits can easly be in
> each of the 2/ fanout buckets, making 2/38 pointless for reducing
> the search space. Commit date on the other hand can probably force
> all of them into the same bucket, making it easy to have the last
> 256 commits in cache, from a single bucket.
>
> But I thought you shot it down, by saying that we also wanted to
> support notes on blobs. I happen to see no value in a note on
> a blob, a blob alone doesn't make much sense without at least an
> annotated tag or commit to provide it some named context, and the
> latter two have dates.
Yeah, and in this thread everybody seems to be talking about commits so I
think it is fine to limit notes only to commits.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes
2009-08-27 9:49 ` Johan Herland
@ 2009-08-27 22:43 ` Johan Herland
0 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 22:43 UTC (permalink / raw)
To: Alex Riesen
Cc: git, gitster, Johannes.Schindelin, trast, tavestbo, git,
chriscool, spearce
On Thursday 27 August 2009, Johan Herland wrote:
> On Thursday 27 August 2009, Alex Riesen wrote:
> > On Thu, Aug 27, 2009 at 03:43, Johan Herland<johan@herland.net> wrote:
> > > When allocating a new memory pool, the older pool is leaked, but this
> > > is no worse than the current situation, where (pretty much) all
> > > leaf_nodes are leaked anyway.
> >
> > Could you return the unused nodes back into the mempool?
> > By making the pool a preallocated list, perhaps?
>
> Yes, maintaining a free-list is certainly possible. However, the number
> of free()d leaf_nodes is relatively small (only subtree entries are
> free()d after unpacking them into the tree structure), so I'm not sure it
> pays off, runtime-wise.
I played around with the free-list idea, but it cost more than the memory
pooling code saved in the first place. I'm leaning towards dropping the
whole memory pooling idea, as the small run-time improvement is probably not
worth the added complexity. We'll see. I'll re-evaluate once I've refactored
the code according to the other threads of this discussion.
> > And then it is trivial to provide a deallocation function for the
> > mempool, which something really concerned about the memleak can call
> > (like when or if libgit get more usable in an application context).
>
> Yes, I plan to provide a free_notes() function that free()s all the
> memory associated with the notes data structure. This would of course
> keep references to all the mempools, and deallocate them (along with all
> the int_nodes).
This still stands, of course. Should be part of the next iteration.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 21:50 ` Junio C Hamano
@ 2009-08-27 23:03 ` Johan Herland
2009-08-27 23:39 ` Jeff King
2009-08-28 8:51 ` Johannes Schindelin
0 siblings, 2 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-27 23:03 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Shawn O. Pearce, Johannes.Schindelin, trast,
tavestbo, git, chriscool
On Thursday 27 August 2009, Junio C Hamano wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > Yea, it was me. I still think it might be a useful idea, since
> > it allows you better density of loading notes when parsing the
> > recent commits. In theory the last 256 commits can easly be in
> > each of the 2/ fanout buckets, making 2/38 pointless for reducing
> > the search space. Commit date on the other hand can probably force
> > all of them into the same bucket, making it easy to have the last
> > 256 commits in cache, from a single bucket.
> >
> > But I thought you shot it down, by saying that we also wanted to
> > support notes on blobs. I happen to see no value in a note on
> > a blob, a blob alone doesn't make much sense without at least an
> > annotated tag or commit to provide it some named context, and the
> > latter two have dates.
>
> Yeah, and in this thread everybody seems to be talking about commits so I
> think it is fine to limit notes only to commits.
Agreed. I'm starting to come around to the idea of storing them in subtrees
based on commit dates. For one, you don't have multiple notes for one commit
in the same notes tree. Also, the common-case access pattern seems tempting.
Dscho: Were there other problems with the date-based approach other than not
supporting notes on trees and blobs?
If not, I'll start preparing another series with the date-based approach.
Thanks for the input, guys. :)
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 23:03 ` Johan Herland
@ 2009-08-27 23:39 ` Jeff King
2009-08-28 0:30 ` Junio C Hamano
2009-08-28 8:51 ` Johannes Schindelin
1 sibling, 1 reply; 38+ messages in thread
From: Jeff King @ 2009-08-27 23:39 UTC (permalink / raw)
To: Johan Herland
Cc: git, Junio C Hamano, Shawn O. Pearce, Johannes.Schindelin, trast,
tavestbo, git, chriscool
On Fri, Aug 28, 2009 at 01:03:05AM +0200, Johan Herland wrote:
> Agreed. I'm starting to come around to the idea of storing them in subtrees
> based on commit dates. For one, you don't have multiple notes for one commit
> in the same notes tree. Also, the common-case access pattern seems tempting.
>
> Dscho: Were there other problems with the date-based approach other than not
> supporting notes on trees and blobs?
>
> If not, I'll start preparing another series with the date-based approach.
Would you ever want to load a note for a commit when you did not have
that commit present (in which case you would not know its date)?
-Peff
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 23:39 ` Jeff King
@ 2009-08-28 0:30 ` Junio C Hamano
2009-08-28 0:40 ` Sverre Rabbelier
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-28 0:30 UTC (permalink / raw)
To: Jeff King
Cc: Johan Herland, git, Shawn O. Pearce, Johannes.Schindelin, trast,
tavestbo, git, chriscool
Jeff King <peff@peff.net> writes:
> On Fri, Aug 28, 2009 at 01:03:05AM +0200, Johan Herland wrote:
>
>> Agreed. I'm starting to come around to the idea of storing them in subtrees
>> based on commit dates. For one, you don't have multiple notes for one commit
>> in the same notes tree. Also, the common-case access pattern seems tempting.
>>
>> Dscho: Were there other problems with the date-based approach other than not
>> supporting notes on trees and blobs?
>>
>> If not, I'll start preparing another series with the date-based approach.
>
> Would you ever want to load a note for a commit when you did not have
> that commit present (in which case you would not know its date)?
You cannot "git show" nor "git checkout" it, nor have reached it via "git
bisect". How did you get interested in the commit in the first place?
What is the reason you are not fetching the commit after you somehow got
interested in the commit?
Here is one possible scenario I can think of. The notes tree is published
and distributed more widely than the project's main source. You fetch and
usually follow notes tree (which would probably be more lightweight) but
not the source. Perhaps your bug tracking system is based on notes, and
the project is throwing all the bugs in one such notes tree. You are a
bug tracking person, and you are following the notes tree closely, but are
only following the 'master' branch but not 'next' nor 'pu'.
And you somehow browse through the 'notes' tree, and learn that a note
refers to an interesting commit. But the commit the note applies to is
not yet in 'master' so you do not have it. Even though you would want to
fetch 'next' and 'pu', you are having connection problem and you cannot
fetch the source until your ISP gets back online tomorrow morning.
Nah, that's not a plausible scenario---at that point you already have read
the note.
So, how?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 0:30 ` Junio C Hamano
@ 2009-08-28 0:40 ` Sverre Rabbelier
2009-08-28 1:43 ` Junio C Hamano
0 siblings, 1 reply; 38+ messages in thread
From: Sverre Rabbelier @ 2009-08-28 0:40 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Heya,
On Thu, Aug 27, 2009 at 17:30, Junio C Hamano<gitster@pobox.com> wrote:
> So, how?
A note in branch foo (which you do not follow) was referenced by a
commit in branch bar (which you do follow)?
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 0:40 ` Sverre Rabbelier
@ 2009-08-28 1:43 ` Junio C Hamano
2009-08-28 2:51 ` Sverre Rabbelier
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-28 1:43 UTC (permalink / raw)
To: Sverre Rabbelier
Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Sverre Rabbelier <srabbelier@gmail.com> writes:
> Heya,
>
> On Thu, Aug 27, 2009 at 17:30, Junio C Hamano<gitster@pobox.com> wrote:
>> So, how?
>
> A note in branch foo (which you do not follow) was referenced by a
> commit in branch bar (which you do follow)?
Sorry, I do not follow. How does a note be _in_ branch foo, and how does
a commit _reference_ a note?
Did you mean "a commit in branch bar referred to a commit in branch foo",
similar to the way the "cherry-picked from X" comment can refer to a
missing commit?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 1:43 ` Junio C Hamano
@ 2009-08-28 2:51 ` Sverre Rabbelier
2009-08-28 3:02 ` Junio C Hamano
0 siblings, 1 reply; 38+ messages in thread
From: Sverre Rabbelier @ 2009-08-28 2:51 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Heya,
On Thu, Aug 27, 2009 at 18:43, Junio C Hamano<gitster@pobox.com> wrote:
> Did you mean "a commit in branch bar referred to a commit in branch foo",
> similar to the way the "cherry-picked from X" comment can refer to a
> missing commit?
Yes, sorry, I meant referred to in the commit message.
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 2:51 ` Sverre Rabbelier
@ 2009-08-28 3:02 ` Junio C Hamano
2009-08-28 3:05 ` Sverre Rabbelier
0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2009-08-28 3:02 UTC (permalink / raw)
To: Sverre Rabbelier
Cc: Junio C Hamano, Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Sverre Rabbelier <srabbelier@gmail.com> writes:
> On Thu, Aug 27, 2009 at 18:43, Junio C Hamano<gitster@pobox.com> wrote:
>> Did you mean "a commit in branch bar referred to a commit in branch foo",
>> similar to the way the "cherry-picked from X" comment can refer to a
>> missing commit?
>
> Yes, sorry, I meant referred to in the commit message.
In such a case would you rather want to see the commit itself first, or at
least, commit _and_ notes _together_?
I admit that this all depends on what application the notes are used for,
but I am suffering from lack of imagination to come up with a plausible
scenario in which you would want to look at the notes to that missing
commit first, without getting the commit itself.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 3:02 ` Junio C Hamano
@ 2009-08-28 3:05 ` Sverre Rabbelier
2009-08-28 3:35 ` Junio C Hamano
0 siblings, 1 reply; 38+ messages in thread
From: Sverre Rabbelier @ 2009-08-28 3:05 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Heya,
On Thu, Aug 27, 2009 at 20:02, Junio C Hamano<gitster@pobox.com> wrote:
> In such a case would you rather want to see the commit itself first, or at
> least, commit _and_ notes _together_?
Assuming you do download all notes, I think it would be nice to be
able to read the note; and since there's no way to download the commit
separately it would require one to guess which head the commit belongs
to and fetch the entire branch...?
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 3:05 ` Sverre Rabbelier
@ 2009-08-28 3:35 ` Junio C Hamano
0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2009-08-28 3:35 UTC (permalink / raw)
To: Sverre Rabbelier
Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
Johannes.Schindelin, trast, tavestbo, git, chriscool
Sverre Rabbelier <srabbelier@gmail.com> writes:
> On Thu, Aug 27, 2009 at 20:02, Junio C Hamano<gitster@pobox.com> wrote:
>> In such a case would you rather want to see the commit itself first, or at
>> least, commit _and_ notes _together_?
>
> Assuming you do download all notes, I think it would be nice to be
> able to read the note; and since there's no way to download the commit
> separately it would require one to guess which head the commit belongs
> to and fetch the entire branch...?
Some random thoughts...
* If there are very many branches (in the worst case, they are so many
that the upstream uses the expand extention to serve the project),
maybe the notes namespace will also have many branches. It is unclear
how a user is expected to know which notes branch a note to a
particular commit is to be found.
* Perhaps to solve that problem, such a project may use notes in the
corresponding "notes branch"? Then your assumption does not hold, as
you first guess which notes branch to fetch to find the note that may
not even exist for this issue to become a real problem.
* If you assume all the notes are downloaded in such a project, you can
still go through all the top-level trees (that are date based fan-out)
and find the note to the commit object you do not have. At that point,
it only becomes performance issue for an unusual case where you have
a note but the commit the note applies to.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 20:58 ` Junio C Hamano
@ 2009-08-28 8:48 ` Johannes Schindelin
0 siblings, 0 replies; 38+ messages in thread
From: Johannes Schindelin @ 2009-08-28 8:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johan Herland, git, trast, tavestbo, git, chriscool, spearce
Hi,
On Thu, 27 Aug 2009, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I half-agree, the code should decide which fanout scheme to use, but
> > _only_ when producing new notes.
> >
> > I imagine that it could merge the existing notes, and try to make sure
> > that there are no more blobs in a given subtree than a certain
> > threshold; if that threshold is reached, it could fan-out using
> > 2-digit subtrees, merging what needs merging (by concatenation) along
> > the way.
> >
> > The natural precedence of shallower paths/longer basenames should cope
> > well with that (i.e. prefer to show abcd/... over ab/cd/...).
>
> Oh, if the plan for merging the trees is such that it takes care of
> "multiple notes pointing at the same commit" issues like you outline,
> then I can see it would work nicely.
>
> At that point, fan-out would become merely an implementation detail,
> something the end user never needs to worry about, just like what base
> object is chosen to represent another object in a packfile.
Oh, that's where you're coming from! Now I understand your concerns; it
never occurred to me that the user should be encouraged to add notes to
the tree herself. I was always under the impression that the fan-out is
an implementation detail best hidden from the user.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-27 23:03 ` Johan Herland
2009-08-27 23:39 ` Jeff King
@ 2009-08-28 8:51 ` Johannes Schindelin
2009-08-28 10:40 ` Johan Herland
1 sibling, 1 reply; 38+ messages in thread
From: Johannes Schindelin @ 2009-08-28 8:51 UTC (permalink / raw)
To: Johan Herland
Cc: git, Junio C Hamano, Shawn O. Pearce, trast, tavestbo, git,
chriscool
Hi,
On Fri, 28 Aug 2009, Johan Herland wrote:
> On Thursday 27 August 2009, Junio C Hamano wrote:
> > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > Yea, it was me. I still think it might be a useful idea, since it
> > > allows you better density of loading notes when parsing the recent
> > > commits. In theory the last 256 commits can easly be in each of the
> > > 2/ fanout buckets, making 2/38 pointless for reducing the search
> > > space. Commit date on the other hand can probably force all of them
> > > into the same bucket, making it easy to have the last 256 commits in
> > > cache, from a single bucket.
> > >
> > > But I thought you shot it down, by saying that we also wanted to
> > > support notes on blobs. I happen to see no value in a note on a
> > > blob, a blob alone doesn't make much sense without at least an
> > > annotated tag or commit to provide it some named context, and the
> > > latter two have dates.
> >
> > Yeah, and in this thread everybody seems to be talking about commits
> > so I think it is fine to limit notes only to commits.
>
> Agreed. I'm starting to come around to the idea of storing them in
> subtrees based on commit dates. For one, you don't have multiple notes
> for one commit in the same notes tree. Also, the common-case access
> pattern seems tempting.
>
> Dscho: Were there other problems with the date-based approach other than
> not supporting notes on trees and blobs?
It emphasized an implementation detail too much for my liking.
And I would rather have some flexibility in the code as to _when_ it fans
out and when not.
So I can easily imagine a full repository which has only, say, 5 notes.
Why not have a single tree for all of those?
And I can easily imagine a repository that has a daily note generated by
an automatic build, and no other notes. The date-based fan-out just
wastes our time here, and even hurts performance.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 8:51 ` Johannes Schindelin
@ 2009-08-28 10:40 ` Johan Herland
2009-08-28 11:56 ` Johannes Schindelin
0 siblings, 1 reply; 38+ messages in thread
From: Johan Herland @ 2009-08-28 10:40 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git, Junio C Hamano, Shawn O. Pearce, trast, tavestbo, git,
chriscool
On Friday 28 August 2009, Johannes Schindelin wrote:
> Hi,
>
> On Fri, 28 Aug 2009, Johan Herland wrote:
> > On Thursday 27 August 2009, Junio C Hamano wrote:
> > > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > > Yea, it was me. I still think it might be a useful idea, since
> > > > it allows you better density of loading notes when parsing the
> > > > recent commits. In theory the last 256 commits can easly be in
> > > > each of the 2/ fanout buckets, making 2/38 pointless for
> > > > reducing the search space. Commit date on the other hand can
> > > > probably force all of them into the same bucket, making it easy
> > > > to have the last 256 commits in cache, from a single bucket.
> > > >
> > > > But I thought you shot it down, by saying that we also wanted
> > > > to support notes on blobs. I happen to see no value in a note
> > > > on a blob, a blob alone doesn't make much sense without at
> > > > least an annotated tag or commit to provide it some named
> > > > context, and the latter two have dates.
> > >
> > > Yeah, and in this thread everybody seems to be talking about
> > > commits so I think it is fine to limit notes only to commits.
> >
> > Agreed. I'm starting to come around to the idea of storing them in
> > subtrees based on commit dates. For one, you don't have multiple
> > notes for one commit in the same notes tree. Also, the common-case
> > access pattern seems tempting.
> >
> > Dscho: Were there other problems with the date-based approach other
> > than not supporting notes on trees and blobs?
>
> It emphasized an implementation detail too much for my liking.
>
> And I would rather have some flexibility in the code as to _when_ it
> fans out and when not.
>
> So I can easily imagine a full repository which has only, say, 5
> notes. Why not have a single tree for all of those?
Yes, if you only have a handful of notes, the date-based approach is
definitely overkill. On the other hand, if you only have a handful of
notes, performance is not going to be a problem in the first place, no
matter which notes structure you use...
> And I can easily imagine a repository that has a daily note generated
> by an automatic build, and no other notes. The date-based fan-out
> just wastes our time here, and even hurts performance.
What about a month-based fanout? Looking at the kernel repo with
git log --all --date=iso --format="%ad" |
cut -c1-7 | sort | uniq -c | sort -n
I find that commits are spread across 66 months, and the most active
month (2008-07) has 5661 commits. If we assume the one-note-per-commit
worst case, that gives up to 5661 notes per month-based subdir. Is that
too much?
Doing
for subdir in $(find . -type d); do
echo "$(ls -1 $subdir | wc -l) $subdir"
done | sort -n
shows me that the currently largest tree in the kernel has 985 entries
(include/linux), so a 5661-entry tree is probably larger than what git
is used to...
...just thinking that we shold make things as simple as possible (but no
simpler), and if a month-based fanout works adequately in all practical
cases, then we should go with that...
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 10:40 ` Johan Herland
@ 2009-08-28 11:56 ` Johannes Schindelin
2009-08-28 14:15 ` Johan Herland
0 siblings, 1 reply; 38+ messages in thread
From: Johannes Schindelin @ 2009-08-28 11:56 UTC (permalink / raw)
To: Johan Herland
Cc: git, Junio C Hamano, Shawn O. Pearce, trast, tavestbo, git,
chriscool
Hi,
On Fri, 28 Aug 2009, Johan Herland wrote:
> On Friday 28 August 2009, Johannes Schindelin wrote:
>
> > And I can easily imagine a repository that has a daily note generated
> > by an automatic build, and no other notes. The date-based fan-out
> > just wastes our time here, and even hurts performance.
>
> What about a month-based fanout?
Well, I hoped to convince you that the date-based approach is too rigid.
You basically cannot adapt the optimal data layout to the available data.
(I like to think of this issue as related to storing deltas: we let Git
choose relatively freely what to delta against, and do not force a delta
against the parent commit like others do; I think it is pretty obvious
that our approach is more powerful.)
So the simplest (yet powerful-enough) way I could imagine is to teach the
reading part to accept any fan-out (but that fan-out is really only based
on the object name, nothing else), and to adjust the writing/merging part
such that it has a maximum bin size (i.e. it starts a new fan-out whenever
a tree object contains more than a config-specifyable limit).
I was certainly not thinking of something as complicated as Huffman.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
2009-08-28 11:56 ` Johannes Schindelin
@ 2009-08-28 14:15 ` Johan Herland
0 siblings, 0 replies; 38+ messages in thread
From: Johan Herland @ 2009-08-28 14:15 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git, Junio C Hamano, Shawn O. Pearce, trast, tavestbo, git,
chriscool
On Friday 28 August 2009, Johannes Schindelin wrote:
> On Fri, 28 Aug 2009, Johan Herland wrote:
> > On Friday 28 August 2009, Johannes Schindelin wrote:
> > > And I can easily imagine a repository that has a daily note
> > > generated by an automatic build, and no other notes. The
> > > date-based fan-out just wastes our time here, and even hurts
> > > performance.
> >
> > What about a month-based fanout?
>
> Well, I hoped to convince you that the date-based approach is too
> rigid. You basically cannot adapt the optimal data layout to the
> available data.
>
> (I like to think of this issue as related to storing deltas: we let
> Git choose relatively freely what to delta against, and do not force
> a delta against the parent commit like others do; I think it is
> pretty obvious that our approach is more powerful.)
>
> So the simplest (yet powerful-enough) way I could imagine is to teach
> the reading part to accept any fan-out (but that fan-out is really
> only based on the object name, nothing else), and to adjust the
> writing/merging part such that it has a maximum bin size (i.e. it
> starts a new fan-out whenever a tree object contains more than a
> config-specifyable limit).
I agree with your points on flexibility and not nailing down a structure
that might prove too rigid in the future.
But it seems the date-based approach might offer wins that an
object-name-based approach (flexible or not) simply cannot hope to
match...
Also a rigid organization (with unique note locations) makes the
implementation simpler and faster: If you allow notes for a given
commit at several places in the notes tree (and require the result to
be the concatenation of those notes, which seems to be the saner
choice), the lookup procedure must keep looking even after it has found
the first match. This affects both runtime and memory consumption
negatively (more subtrees must be unpacked, etc.)
I guess I'll code up both alternatives so that we can get some actual
numbers...
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2009-08-28 14:17 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-27 1:43 [PATCHv4 00/12] git notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 01/12] Introduce commit notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 02/12] Add a script to edit/inspect notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 03/12] Speed up git notes lookup Johan Herland
2009-08-27 1:43 ` [PATCHv4 04/12] Add an expensive test for git-notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 05/12] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-08-27 1:43 ` [PATCHv4 06/12] fast-import: Add support for importing commit notes Johan Herland
2009-08-27 1:43 ` [PATCHv4 07/12] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-08-27 1:43 ` [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes Johan Herland
2009-08-27 5:00 ` Junio C Hamano
2009-08-27 9:35 ` Johan Herland
2009-08-27 10:47 ` Johannes Schindelin
2009-08-27 20:58 ` Junio C Hamano
2009-08-28 8:48 ` Johannes Schindelin
2009-08-27 20:55 ` Junio C Hamano
2009-08-27 21:27 ` Shawn O. Pearce
2009-08-27 21:50 ` Junio C Hamano
2009-08-27 23:03 ` Johan Herland
2009-08-27 23:39 ` Jeff King
2009-08-28 0:30 ` Junio C Hamano
2009-08-28 0:40 ` Sverre Rabbelier
2009-08-28 1:43 ` Junio C Hamano
2009-08-28 2:51 ` Sverre Rabbelier
2009-08-28 3:02 ` Junio C Hamano
2009-08-28 3:05 ` Sverre Rabbelier
2009-08-28 3:35 ` Junio C Hamano
2009-08-28 8:51 ` Johannes Schindelin
2009-08-28 10:40 ` Johan Herland
2009-08-28 11:56 ` Johannes Schindelin
2009-08-28 14:15 ` Johan Herland
2009-08-27 10:42 ` Johannes Schindelin
2009-08-27 1:43 ` [PATCHv4 09/12] Selftests verifying semantics when loading notes trees with various fanouts Johan Herland
2009-08-27 1:43 ` [PATCHv4 10/12] notes.c: Implement simple memory pooling of leaf nodes Johan Herland
2009-08-27 7:39 ` Alex Riesen
2009-08-27 9:49 ` Johan Herland
2009-08-27 22:43 ` Johan Herland
2009-08-27 1:43 ` [PATCHv4 11/12] Add flags to get_commit_notes() to control the format of the note string Johan Herland
2009-08-27 1:43 ` [PATCHv4 12/12] Add '%N'-format for pretty-printing commit notes Johan Herland
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).