* [PATCHv3 0/8] RESEND: git notes
@ 2009-07-29 2:25 Johan Herland
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
` (7 more replies)
0 siblings, 8 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johan Herland
Hi,
Here is a long overdue resend and improvement of the jh/notes topic in 'pu'.
The 5 first patches are pretty much unchanged (except for better attribution
of the various people who have helped improve this patch series).
The 6th patch introduces a first draft of notes tree parsing with support for
fanout subtrees. This first draft is just a straightforward implementation of
what I have picked up from the (many) discussions on this topic. As such,
this first draft focuses on correctness, rather than performance. BTW, did I
mention this was a first draft?
The 7th patch is stolen from the jh/vcs-cvs topic in 'pu', and teaches
git-fast-import to import note objects.
The final 8th patch is a relatively straightforward optimization of t3302.
Have fun! :)
...Johan
Johan Herland (4):
Teach "-m <msg>" and "-F <file>" to "git notes edit"
First draft of notes tree parser with support for fanout subtrees
fast-import: Add support for importing commit notes
t3302-notes-index-expensive: Speed up create_repo()
Johannes Schindelin (4):
Introduce commit notes
Add a script to edit/inspect notes
Speed up git notes lookup
Add an expensive test for git-notes
.gitignore | 1 +
Documentation/config.txt | 13 ++
Documentation/git-fast-import.txt | 45 +++++-
Documentation/git-notes.txt | 60 ++++++++
Makefile | 3 +
cache.h | 4 +
command-list.txt | 1 +
commit.c | 1 +
config.c | 5 +
environment.c | 1 +
fast-import.c | 88 +++++++++++-
git-notes.sh | 121 +++++++++++++++
notes.c | 295 +++++++++++++++++++++++++++++++++++++
notes.h | 7 +
pretty.c | 5 +
t/t3301-notes.sh | 149 +++++++++++++++++++
t/t3302-notes-index-expensive.sh | 118 +++++++++++++++
t/t3303-notes-subtrees.sh | 206 ++++++++++++++++++++++++++
t/t9300-fast-import.sh | 166 +++++++++++++++++++++
19 files changed, 1279 insertions(+), 10 deletions(-)
create mode 100644 Documentation/git-notes.txt
create mode 100755 git-notes.sh
create mode 100644 notes.c
create mode 100644 notes.h
create mode 100755 t/t3301-notes.sh
create mode 100755 t/t3302-notes-index-expensive.sh
create mode 100755 t/t3303-notes-subtrees.sh
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCHv3 1/8] Introduce commit notes
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 8:52 ` Alex Riesen
2009-07-29 2:25 ` [PATCHv3 2/8] Add a script to edit/inspect notes Johan Herland
` (6 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin, Johan Herland
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/config.txt | 13 ++++++++
Makefile | 2 +
cache.h | 4 ++
commit.c | 1 +
config.c | 5 +++
environment.c | 1 +
notes.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
notes.h | 7 ++++
pretty.c | 5 +++
9 files changed, 107 insertions(+), 0 deletions(-)
create mode 100644 notes.c
create mode 100644 notes.h
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 792dd42..6788958 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -439,6 +439,19 @@ On some file system/operating system combinations, this is unreliable.
Set this config setting to 'rename' there; However, This will remove the
check that makes sure that existing object files will not get overwritten.
+core.notesRef::
+ When showing commit messages, also show notes which are stored in
+ the given ref. This ref is expected to contain files named
+ after the full SHA-1 of the commit they annotate.
++
+If such a file exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line. If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be printed.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
add.ignore-errors::
Tells 'git-add' to continue adding files when some files cannot be
added due to indexing errors. Equivalent to the '--ignore-errors'
diff --git a/Makefile b/Makefile
index ce082cf..cc188ca 100644
--- a/Makefile
+++ b/Makefile
@@ -429,6 +429,7 @@ LIB_H += ll-merge.h
LIB_H += log-tree.h
LIB_H += mailmap.h
LIB_H += merge-recursive.h
+LIB_H += notes.h
LIB_H += object.h
LIB_H += pack.h
LIB_H += pack-refs.h
@@ -512,6 +513,7 @@ LIB_OBJS += match-trees.o
LIB_OBJS += merge-file.o
LIB_OBJS += merge-recursive.o
LIB_OBJS += name-hash.o
+LIB_OBJS += notes.o
LIB_OBJS += object.o
LIB_OBJS += pack-check.o
LIB_OBJS += pack-refs.o
diff --git a/cache.h b/cache.h
index 1a2a3c9..3d761b8 100644
--- a/cache.h
+++ b/cache.h
@@ -371,6 +371,8 @@ static inline enum object_type object_type(unsigned int mode)
#define GITATTRIBUTES_FILE ".gitattributes"
#define INFOATTRIBUTES_FILE "info/attributes"
#define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
extern int is_bare_repository_cfg;
extern int is_bare_repository(void);
@@ -566,6 +568,8 @@ enum object_creation_mode {
extern enum object_creation_mode object_creation_mode;
+extern char *notes_ref_name;
+
extern int grafts_replace_parents;
#define GIT_REPO_VERSION 0
diff --git a/commit.c b/commit.c
index e2bcbe8..ae6bff4 100644
--- a/commit.c
+++ b/commit.c
@@ -5,6 +5,7 @@
#include "utf8.h"
#include "diff.h"
#include "revision.h"
+#include "notes.h"
int save_commit_buffer = 1;
diff --git a/config.c b/config.c
index 1b3823d..9d99319 100644
--- a/config.c
+++ b/config.c
@@ -469,6 +469,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.notesref")) {
+ notes_ref_name = xstrdup(value);
+ return 0;
+ }
+
if (!strcmp(var, "core.pager"))
return git_config_string(&pager_program, var, value);
diff --git a/environment.c b/environment.c
index d478125..36c7776 100644
--- a/environment.c
+++ b/environment.c
@@ -51,6 +51,7 @@ enum push_default_type push_default = PUSH_DEFAULT_MATCHING;
#define OBJECT_CREATION_MODE OBJECT_CREATION_USES_HARDLINKS
#endif
enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
+char *notes_ref_name;
int grafts_replace_parents = 1;
/* Parallel index stat data preload? */
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..b0cf553
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,69 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+#include "strbuf.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+ const char *output_encoding)
+{
+ static const char *utf8 = "utf-8";
+ struct strbuf name = STRBUF_INIT;
+ const char *hex;
+ unsigned char sha1[20];
+ char *msg, *msg_p;
+ unsigned long linelen, msglen;
+ enum object_type type;
+
+ if (!initialized) {
+ const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ if (env)
+ notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ else if (!notes_ref_name)
+ notes_ref_name = GIT_NOTES_DEFAULT_REF;
+ if (notes_ref_name && read_ref(notes_ref_name, sha1))
+ notes_ref_name = NULL;
+ initialized = 1;
+ }
+
+ if (!notes_ref_name)
+ return;
+
+ strbuf_addf(&name, "%s:%s", notes_ref_name,
+ sha1_to_hex(commit->object.sha1));
+ if (get_sha1(name.buf, sha1))
+ return;
+
+ if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+ type != OBJ_BLOB)
+ return;
+
+ if (output_encoding && *output_encoding &&
+ strcmp(utf8, output_encoding)) {
+ char *reencoded = reencode_string(msg, output_encoding, utf8);
+ if (reencoded) {
+ free(msg);
+ msg = reencoded;
+ msglen = strlen(msg);
+ }
+ }
+
+ /* we will end the annotation by a newline anyway */
+ if (msglen && msg[msglen - 1] == '\n')
+ msglen--;
+
+ strbuf_addstr(sb, "\nNotes:\n");
+
+ for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
+ linelen = strchrnul(msg_p, '\n') - msg_p;
+
+ strbuf_addstr(sb, " ");
+ strbuf_add(sb, msg_p, linelen);
+ strbuf_addch(sb, '\n');
+ }
+
+ free(msg);
+}
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..79d21b6
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,7 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit, struct strbuf *sb,
+ const char *output_encoding);
+
+#endif
diff --git a/pretty.c b/pretty.c
index e5328da..fee6789 100644
--- a/pretty.c
+++ b/pretty.c
@@ -6,6 +6,7 @@
#include "string-list.h"
#include "mailmap.h"
#include "log-tree.h"
+#include "notes.h"
#include "color.h"
static char *user_format;
@@ -963,5 +964,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
*/
if (fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
strbuf_addch(sb, '\n');
+
+ if (fmt != CMIT_FMT_ONELINE)
+ get_commit_notes(commit, sb, encoding);
+
free(reencoded);
}
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 2/8] Add a script to edit/inspect notes
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 3/8] Speed up git notes lookup Johan Herland
` (5 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin, Johan Herland
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
The script 'git notes' allows you to edit and show commit notes, by
calling either
git notes show <commit>
or
git notes edit <commit>
This patch has been improved by the following contributions:
- Tor Arne Vestbø: fix printing of multi-line notes
- Michael J Gruber: test and handle empty notes gracefully
- Thomas Rast:
- only clean up message file when editing
- use GIT_EDITOR and core.editor over VISUAL/EDITOR
- t3301: fix confusing quoting in test for valid notes ref
- t3301: use test_must_fail instead of !
- refuse to edit notes outside refs/notes/
- Junio C Hamano: tests: fix "export var=val"
- Christian Couder: documentation: fix 'linkgit' macro in "git-notes.txt"
- Johan Herland: minor cleanup and bugfixing in git-notes.sh (v2)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
.gitignore | 1 +
Documentation/git-notes.txt | 46 +++++++++++++++++
Makefile | 1 +
command-list.txt | 1 +
git-notes.sh | 73 +++++++++++++++++++++++++++
t/t3301-notes.sh | 114 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 236 insertions(+), 0 deletions(-)
create mode 100644 Documentation/git-notes.txt
create mode 100755 git-notes.sh
create mode 100755 t/t3301-notes.sh
diff --git a/.gitignore b/.gitignore
index 10808e3..04926fc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -86,6 +86,7 @@ git-mktag
git-mktree
git-name-rev
git-mv
+git-notes
git-pack-redundant
git-pack-objects
git-pack-refs
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..7136016
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,46 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add/inspect commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit]
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, without
+changing the commit. To discern these notes from the message stored
+in the commit object, the notes are indented like the message, after
+an unindented line saying "Notes:".
+
+To disable commit notes, you have to set the config variable
+core.notesRef to the empty string. Alternatively, you can set it
+to a different ref, something like "refs/notes/bugzilla". This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+ Edit the notes for a given commit (defaults to HEAD).
+
+show::
+ Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the linkgit:git[7] suite
diff --git a/Makefile b/Makefile
index cc188ca..c9003d7 100644
--- a/Makefile
+++ b/Makefile
@@ -319,6 +319,7 @@ SCRIPT_SH += git-merge-one-file.sh
SCRIPT_SH += git-merge-resolve.sh
SCRIPT_SH += git-mergetool.sh
SCRIPT_SH += git-mergetool--lib.sh
+SCRIPT_SH += git-notes.sh
SCRIPT_SH += git-parse-remote.sh
SCRIPT_SH += git-pull.sh
SCRIPT_SH += git-quiltimport.sh
diff --git a/command-list.txt b/command-list.txt
index fb03a2e..4296941 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -74,6 +74,7 @@ git-mktag plumbingmanipulators
git-mktree plumbingmanipulators
git-mv mainporcelain common
git-name-rev plumbinginterrogators
+git-notes mainporcelain
git-pack-objects plumbingmanipulators
git-pack-redundant plumbinginterrogators
git-pack-refs ancillarymanipulators
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..f06c254
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,73 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$1" && usage
+ACTION="$1"; shift
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
+die "Invalid commit: $@"
+
+case "$ACTION" in
+edit)
+ if [ "${GIT_NOTES_REF#refs/notes/}" = "$GIT_NOTES_REF" ]; then
+ die "Refusing to edit notes in $GIT_NOTES_REF (outside of refs/notes/)"
+ fi
+
+ MSG_FILE="$GIT_DIR/new-notes-$COMMIT"
+ GIT_INDEX_FILE="$MSG_FILE.idx"
+ export GIT_INDEX_FILE
+
+ trap '
+ test -f "$MSG_FILE" && rm "$MSG_FILE"
+ test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
+ ' 0
+
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+
+ CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+ if [ -z "$CURRENT_HEAD" ]; then
+ PARENT=
+ else
+ PARENT="-p $CURRENT_HEAD"
+ git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+ git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+ fi
+
+ core_editor="$(git config core.editor)"
+ ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+
+ grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
+ mv "$MSG_FILE".processed "$MSG_FILE"
+ if [ -s "$MSG_FILE" ]; then
+ BLOB=$(git hash-object -w "$MSG_FILE") ||
+ die "Could not write into object database"
+ git update-index --add --cacheinfo 0644 $BLOB $COMMIT ||
+ die "Could not write index"
+ else
+ test -z "$CURRENT_HEAD" &&
+ die "Will not initialise with empty tree"
+ git update-index --force-remove $COMMIT ||
+ die "Could not update index"
+ fi
+
+ TREE=$(git write-tree) || die "Could not write tree"
+ NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+ die "Could not annotate"
+ git update-ref -m "Annotate $COMMIT" \
+ "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+ git rev-parse -q --verify "$GIT_NOTES_REF":$COMMIT > /dev/null ||
+ die "No note for commit $COMMIT."
+ git show "$GIT_NOTES_REF":$COMMIT
+;;
+*)
+ usage
+esac
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..73e53be
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+ (MSG=3 && export MSG && test_must_fail git notes edit)
+'
+
+test_expect_success setup '
+ : > a1 &&
+ git add a1 &&
+ test_tick &&
+ git commit -m 1st &&
+ : > a2 &&
+ git add a2 &&
+ test_tick &&
+ git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+ (MSG=1 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit) &&
+ (MSG=2 GIT_NOTES_REF=/ && export MSG GIT_NOTES_REF &&
+ test_must_fail git notes show)
+'
+
+test_expect_success 'refusing to edit in refs/heads/' '
+ (MSG=1 GIT_NOTES_REF=refs/heads/bogus &&
+ export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit)
+'
+
+test_expect_success 'refusing to edit in refs/remotes/' '
+ (MSG=1 GIT_NOTES_REF=refs/remotes/bogus &&
+ export MSG GIT_NOTES_REF &&
+ test_must_fail git notes edit)
+'
+
+# 1 indicates caught gracefully by die, 128 means git-show barked
+test_expect_success 'handle empty notes gracefully' '
+ git notes show ; test 1 = $?
+'
+
+test_expect_success 'create notes' '
+ git config core.notesRef refs/notes/commits &&
+ MSG=b1 git notes edit &&
+ test ! -f .git/new-notes &&
+ test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+ test b1 = $(git notes show) &&
+ git show HEAD^ &&
+ test_must_fail git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:14:13 2005 -0700
+
+ 2nd
+
+Notes:
+ b1
+EOF
+
+test_expect_success 'show notes' '
+ ! (git cat-file commit HEAD | grep b1) &&
+ git log -1 > output &&
+ test_cmp expect output
+'
+test_expect_success 'create multi-line notes (setup)' '
+ : > a3 &&
+ git add a3 &&
+ test_tick &&
+ git commit -m 3rd &&
+ MSG="b3
+c3c3c3c3
+d3d3d3" git notes edit
+'
+
+cat > expect-multiline << EOF
+commit 1584215f1d29c65e99c6c6848626553fdd07fd75
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:15:13 2005 -0700
+
+ 3rd
+
+Notes:
+ b3
+ c3c3c3c3
+ d3d3d3
+EOF
+
+printf "\n" >> expect-multiline
+cat expect >> expect-multiline
+
+test_expect_success 'show multi-line notes' '
+ git log -2 > output &&
+ test_cmp expect-multiline output
+'
+
+test_done
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 3/8] Speed up git notes lookup
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 2/8] Add a script to edit/inspect notes Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 4/8] Add an expensive test for git-notes Johan Herland
` (4 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin, Johan Herland
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.
The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.
Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object. This
allows you to annotate arbitrary branches, or objects.
This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
notes.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 102 insertions(+), 11 deletions(-)
diff --git a/notes.c b/notes.c
index b0cf553..bd73784 100644
--- a/notes.c
+++ b/notes.c
@@ -4,16 +4,112 @@
#include "refs.h"
#include "utf8.h"
#include "strbuf.h"
+#include "tree-walk.h"
+
+struct entry {
+ unsigned char commit_sha1[20];
+ unsigned char notes_sha1[20];
+};
+
+struct hash_map {
+ struct entry *entries;
+ off_t count, size;
+};
static int initialized;
+static struct hash_map hash_map;
+
+static int hash_index(struct hash_map *map, const unsigned char *sha1)
+{
+ int i = ((*(unsigned int *)sha1) % map->size);
+
+ for (;;) {
+ unsigned char *current = map->entries[i].commit_sha1;
+
+ if (!hashcmp(sha1, current))
+ return i;
+
+ if (is_null_sha1(current))
+ return -1 - i;
+
+ if (++i == map->size)
+ i = 0;
+ }
+}
+
+static void add_entry(const unsigned char *commit_sha1,
+ const unsigned char *notes_sha1)
+{
+ int index;
+
+ if (hash_map.count + 1 > hash_map.size >> 1) {
+ int i, old_size = hash_map.size;
+ struct entry *old = hash_map.entries;
+
+ hash_map.size = old_size ? old_size << 1 : 64;
+ hash_map.entries = (struct entry *)
+ xcalloc(sizeof(struct entry), hash_map.size);
+
+ for (i = 0; i < old_size; i++)
+ if (!is_null_sha1(old[i].commit_sha1)) {
+ index = -1 - hash_index(&hash_map,
+ old[i].commit_sha1);
+ memcpy(hash_map.entries + index, old + i,
+ sizeof(struct entry));
+ }
+ free(old);
+ }
+
+ index = hash_index(&hash_map, commit_sha1);
+ if (index < 0) {
+ index = -1 - index;
+ hash_map.count++;
+ }
+
+ hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
+ hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+}
+
+static void initialize_hash_map(const char *notes_ref_name)
+{
+ unsigned char sha1[20], commit_sha1[20];
+ unsigned mode;
+ struct tree_desc desc;
+ struct name_entry entry;
+ void *buf;
+
+ if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
+ get_tree_entry(commit_sha1, "", sha1, &mode))
+ return;
+
+ buf = fill_tree_descriptor(&desc, sha1);
+ if (!buf)
+ die("Could not read %s for notes-index", sha1_to_hex(sha1));
+
+ while (tree_entry(&desc, &entry))
+ if (!get_sha1(entry.path, commit_sha1))
+ add_entry(commit_sha1, entry.sha1);
+ free(buf);
+}
+
+static unsigned char *lookup_notes(const unsigned char *commit_sha1)
+{
+ int index;
+
+ if (!hash_map.size)
+ return NULL;
+
+ index = hash_index(&hash_map, commit_sha1);
+ if (index < 0)
+ return NULL;
+ return hash_map.entries[index].notes_sha1;
+}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
const char *output_encoding)
{
static const char *utf8 = "utf-8";
- struct strbuf name = STRBUF_INIT;
- const char *hex;
- unsigned char sha1[20];
+ unsigned char *sha1;
char *msg, *msg_p;
unsigned long linelen, msglen;
enum object_type type;
@@ -24,17 +120,12 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
else if (!notes_ref_name)
notes_ref_name = GIT_NOTES_DEFAULT_REF;
- if (notes_ref_name && read_ref(notes_ref_name, sha1))
- notes_ref_name = NULL;
+ initialize_hash_map(notes_ref_name);
initialized = 1;
}
- if (!notes_ref_name)
- return;
-
- strbuf_addf(&name, "%s:%s", notes_ref_name,
- sha1_to_hex(commit->object.sha1));
- if (get_sha1(name.buf, sha1))
+ sha1 = lookup_notes(commit->object.sha1);
+ if (!sha1)
return;
if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 4/8] Add an expensive test for git-notes
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
` (2 preceding siblings ...)
2009-07-29 2:25 ` [PATCHv3 3/8] Speed up git notes lookup Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
` (3 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johannes Schindelin, Johan Herland
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
git-notes have the potential of being pretty expensive, so test with
a lot of commits. A lot. So to make things cheaper, you have to
opt-in explicitely, by setting the environment variable
GIT_NOTES_TIMING_TESTS.
This patch has been improved by the following contributions:
- Junio C Hamano: tests: fix "export var=val"
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
t/t3302-notes-index-expensive.sh | 98 ++++++++++++++++++++++++++++++++++++++
1 files changed, 98 insertions(+), 0 deletions(-)
create mode 100755 t/t3302-notes-index-expensive.sh
diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..0ef3e95
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,98 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+ say Skipping timing tests
+ test_done
+ exit
+}
+
+create_repo () {
+ number_of_commits=$1
+ nr=0
+ parent=
+ test -d .git || {
+ git init &&
+ tree=$(git write-tree) &&
+ while [ $nr -lt $number_of_commits ]; do
+ test_tick &&
+ commit=$(echo $nr | git commit-tree $tree $parent) ||
+ return
+ parent="-p $commit"
+ nr=$(($nr+1))
+ done &&
+ git update-ref refs/heads/master $commit &&
+ {
+ GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
+ git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" |
+ while read nr sha1; do
+ blob=$(echo note $nr | git hash-object -w --stdin) &&
+ echo $sha1 | sed "s/^/0644 $blob 0 /"
+ done | git update-index --index-info &&
+ tree=$(git write-tree) &&
+ test_tick &&
+ commit=$(echo notes | git commit-tree $tree) &&
+ git update-ref refs/notes/commits $commit
+ } &&
+ git config core.notesRef refs/notes/commits
+ }
+}
+
+test_notes () {
+ count=$1 &&
+ git config core.notesRef refs/notes/commits &&
+ git log | grep "^ " > output &&
+ i=1 &&
+ while [ $i -le $count ]; do
+ echo " $(($count-$i))" &&
+ echo " note $i" &&
+ i=$(($i+1));
+ done > expect &&
+ git diff expect output
+}
+
+cat > time_notes << \EOF
+ mode=$1
+ i=1
+ while [ $i -lt $2 ]; do
+ case $1 in
+ no-notes)
+ GIT_NOTES_REF=non-existing; export GIT_NOTES_REF
+ ;;
+ notes)
+ unset GIT_NOTES_REF
+ ;;
+ esac
+ git log >/dev/null
+ i=$(($i+1))
+ done
+EOF
+
+time_notes () {
+ for mode in no-notes notes
+ do
+ echo $mode
+ /usr/bin/time sh ../time_notes $mode $1
+ done
+}
+
+for count in 10 100 1000 10000; do
+
+ mkdir $count
+ (cd $count;
+
+ test_expect_success "setup $count" "create_repo $count"
+
+ test_expect_success 'notes work' "test_notes $count"
+
+ test_expect_success 'notes timing' "time_notes 100"
+ )
+done
+
+test_done
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit"
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
` (3 preceding siblings ...)
2009-07-29 2:25 ` [PATCHv3 4/8] Add an expensive test for git-notes Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 7:57 ` Thomas Rast
2009-07-29 2:25 ` [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees Johan Herland
` (2 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johan Herland
The "-m" and "-F" options are already the established method
(in both git-commit and git-tag) to specify a commit/tag message
without invoking the editor. This patch teaches "git notes edit"
to respect the same options for specifying a notes message without
invoking the editor.
Multiple "-m" and/or "-F" options are concatenated as separate
paragraphs.
The patch also updates the "git notes" documentation and adds
selftests for the new functionality. Unfortunately, the added
selftests include a couple of lines with trailing whitespace
(without these the test will fail). This may cause git to warn
about "whitespace errors".
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/git-notes.txt | 16 ++++++++++-
git-notes.sh | 64 +++++++++++++++++++++++++++++++++++++-----
t/t3301-notes.sh | 35 +++++++++++++++++++++++
3 files changed, 106 insertions(+), 9 deletions(-)
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
index 7136016..94cceb1 100644
--- a/Documentation/git-notes.txt
+++ b/Documentation/git-notes.txt
@@ -8,7 +8,7 @@ git-notes - Add/inspect commit notes
SYNOPSIS
--------
[verse]
-'git-notes' (edit | show) [commit]
+'git-notes' (edit [-F <file> | -m <msg>] | show) [commit]
DESCRIPTION
-----------
@@ -33,6 +33,20 @@ show::
Show the notes for a given commit (defaults to HEAD).
+OPTIONS
+-------
+-m <msg>::
+ Use the given note message (instead of prompting).
+ If multiple `-m` (or `-F`) options are given, their
+ values are concatenated as separate paragraphs.
+
+-F <file>::
+ Take the note message from the given file. Use '-' to
+ read the note message from the standard input.
+ If multiple `-F` (or `-m`) options are given, their
+ values are concatenated as separate paragraphs.
+
+
Author
------
Written by Johannes Schindelin <johannes.schindelin@gmx.de>
diff --git a/git-notes.sh b/git-notes.sh
index f06c254..e642e47 100755
--- a/git-notes.sh
+++ b/git-notes.sh
@@ -1,16 +1,59 @@
#!/bin/sh
-USAGE="(edit | show) [commit]"
+USAGE="(edit [-F <file> | -m <msg>] | show) [commit]"
. git-sh-setup
-test -n "$3" && usage
-
test -z "$1" && usage
ACTION="$1"; shift
test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+MESSAGE=
+while test $# != 0
+do
+ case "$1" in
+ -m)
+ test "$ACTION" = "edit" || usage
+ shift
+ if test "$#" = "0"; then
+ die "error: option -m needs an argument"
+ else
+ if [ -z "$MESSAGE" ]; then
+ MESSAGE="$1"
+ else
+ MESSAGE="$MESSAGE
+
+$1"
+ fi
+ shift
+ fi
+ ;;
+ -F)
+ test "$ACTION" = "edit" || usage
+ shift
+ if test "$#" = "0"; then
+ die "error: option -F needs an argument"
+ else
+ if [ -z "$MESSAGE" ]; then
+ MESSAGE="$(cat "$1")"
+ else
+ MESSAGE="$MESSAGE
+
+$(cat "$1")"
+ fi
+ shift
+ fi
+ ;;
+ -*)
+ usage
+ ;;
+ *)
+ break
+ ;;
+ esac
+done
+
COMMIT=$(git rev-parse --verify --default HEAD "$@") ||
die "Invalid commit: $@"
@@ -29,19 +72,24 @@ edit)
test -f "$GIT_INDEX_FILE" && rm "$GIT_INDEX_FILE"
' 0
- GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
-
CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
if [ -z "$CURRENT_HEAD" ]; then
PARENT=
else
PARENT="-p $CURRENT_HEAD"
git read-tree "$GIT_NOTES_REF" || die "Could not read index"
- git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
fi
- core_editor="$(git config core.editor)"
- ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+ if [ -z "$MESSAGE" ]; then
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MSG_FILE"
+ if [ ! -z "$CURRENT_HEAD" ]; then
+ git cat-file blob :$COMMIT >> "$MSG_FILE" 2> /dev/null
+ fi
+ core_editor="$(git config core.editor)"
+ ${GIT_EDITOR:-${core_editor:-${VISUAL:-${EDITOR:-vi}}}} "$MSG_FILE"
+ else
+ echo "$MESSAGE" > "$MSG_FILE"
+ fi
grep -v ^# < "$MSG_FILE" | git stripspace > "$MSG_FILE".processed
mv "$MSG_FILE".processed "$MSG_FILE"
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 73e53be..9eaa338 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -110,5 +110,40 @@ test_expect_success 'show multi-line notes' '
git log -2 > output &&
test_cmp expect-multiline output
'
+test_expect_success 'create -m and -F notes (setup)' '
+ : > a4 &&
+ git add a4 &&
+ test_tick &&
+ git commit -m 4th &&
+ echo "xyzzy" > note5 &&
+ git notes edit -m spam -F note5 -m "foo
+bar
+baz"
+'
+
+cat > expect-m-and-F << EOF
+commit 15023535574ded8b1a89052b32673f84cf9582b8
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:16:13 2005 -0700
+
+ 4th
+
+Notes:
+ spam
+
+ xyzzy
+
+ foo
+ bar
+ baz
+EOF
+
+printf "\n" >> expect-m-and-F
+cat expect-multiline >> expect-m-and-F
+
+test_expect_success 'show -m and -F notes' '
+ git log -3 > output &&
+ test_cmp expect-m-and-F output
+'
test_done
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
` (4 preceding siblings ...)
2009-07-29 2:25 ` [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 16:45 ` Johannes Schindelin
2009-07-29 2:25 ` [PATCHv3 7/8] fast-import: Add support for importing commit notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
7 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johan Herland
This is a relatively straightforward implementation of parsing notes trees
that use fanout directories to limit the size of individual tree objects.
This first draft uses a simple linked list for holding unparsed subtree
references (to be parsed on demand), and as such, this first draft
concentrates more on correctness than performance (AFAICS from t3302, there
is no measurable performance impact when no fanout subtrees are present).
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the _longest_ matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
(This might change in the future)
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout is allowed. All the above rules apply recursively.
E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
The patch includes new selftests for verifying the expected behaviour when
loading notes trees with various fanout schemes.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
notes.c | 165 ++++++++++++++++++++++++++++++++----
t/t3303-notes-subtrees.sh | 206 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 356 insertions(+), 15 deletions(-)
create mode 100755 t/t3303-notes-subtrees.sh
diff --git a/notes.c b/notes.c
index bd73784..7b50a70 100644
--- a/notes.c
+++ b/notes.c
@@ -16,8 +16,19 @@ struct hash_map {
off_t count, size;
};
+struct subtree_entry {
+ /*
+ * SHA1 prefix is stored in the first 19 bytes (w/trailing NUL bytes);
+ * length of SHA1 prefix is stored in the last byte
+ */
+ unsigned char sha1_prefix_w_len[20];
+ unsigned char subtree_sha1[20];
+ struct subtree_entry *next;
+};
+
static int initialized;
static struct hash_map hash_map;
+static struct subtree_entry *subtree_list;
static int hash_index(struct hash_map *map, const unsigned char *sha1)
{
@@ -70,39 +81,163 @@ static void add_entry(const unsigned char *commit_sha1,
hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
}
+/*
+ * Convert a partial SHA1 sum (hex format) to a SHA1 value.
+ * - hex - ASCII hex SHA1 segment
+ * - hex_len - Length of above segment. Must be multiple of 2 between 0 and 40
+ * - sha1 - Value of SHA1 is written here
+ * - sha1_len - Max #bytes to store in sha1, Must be between 0 and 20,
+ * and >= hex_len / 2
+ * Returns -1 on error (invalid arguments or invalid ASCII hex SHA1 format).
+ * Otherwise, returns number of bytes written to sha1 (hex_len / 2).
+ * Pads sha1 with NULs up to sha1_len (not included in returned length).
+ */
+static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
+ unsigned char *sha1, unsigned int sha1_len)
+{
+ unsigned int i, len = hex_len >> 1;
+ if (hex_len % 2 != 0 || len > sha1_len)
+ return -1;
+ for (i = 0; i < len; i++) {
+ unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+ if (val & ~0xff)
+ return -1;
+ *sha1++ = val;
+ hex += 2;
+ }
+ for (; i < sha1_len; i++)
+ *sha1++ = 0;
+ return len;
+}
+
+static void load_subtree(struct subtree_entry *se)
+{
+ unsigned char commit_sha1[20];
+ unsigned int prefix_len;
+ void *buf;
+ struct tree_desc desc;
+ struct name_entry entry;
+ struct subtree_entry *tmp_list = NULL, *tmp_last = NULL;
+
+ buf = fill_tree_descriptor(&desc, se->subtree_sha1);
+ if (!buf)
+ die("Could not read %s for notes-index",
+ sha1_to_hex(se->subtree_sha1));
+
+ prefix_len = se->sha1_prefix_w_len[19];
+ memcpy(commit_sha1, se->sha1_prefix_w_len, prefix_len);
+ while (tree_entry(&desc, &entry)) {
+ int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
+ commit_sha1 + prefix_len, 20 - prefix_len);
+ if (len < 0)
+ continue; /* entry.path is not a SHA1 sum. Skip */
+ len += prefix_len;
+
+ /* If commit SHA1 is complete, assume note object */
+ if (len == 20)
+ add_entry(commit_sha1, entry.sha1);
+ /* If commit SHA1 is incomplete, assume note subtree */
+ else if (len < 20 && entry.mode == S_IFDIR) {
+ struct subtree_entry *n = (struct subtree_entry *)
+ xcalloc(sizeof(struct subtree_entry), 1);
+ hashcpy(n->sha1_prefix_w_len, commit_sha1);
+ n->sha1_prefix_w_len[19] = (unsigned char) len;
+ hashcpy(n->subtree_sha1, entry.sha1);
+
+ if (!tmp_list) {
+ tmp_list = n;
+ tmp_last = n;
+ }
+ else {
+ assert(!tmp_last->next);
+ assert(hashcmp(n->sha1_prefix_w_len,
+ tmp_last->sha1_prefix_w_len) > 0);
+ tmp_last->next = n;
+ tmp_last = n;
+ }
+ }
+ }
+ free(buf);
+ if (tmp_list) {
+ /* insert tmp_list immediately after se */
+ assert(hashcmp(tmp_list->sha1_prefix_w_len,
+ se->sha1_prefix_w_len) > 0);
+ if (se->next) {
+ assert(hashcmp(se->next->sha1_prefix_w_len,
+ tmp_last->sha1_prefix_w_len) > 0);
+ tmp_last->next = se->next;
+ }
+ se->next = tmp_list;
+ }
+}
+
static void initialize_hash_map(const char *notes_ref_name)
{
unsigned char sha1[20], commit_sha1[20];
unsigned mode;
- struct tree_desc desc;
- struct name_entry entry;
- void *buf;
+ struct subtree_entry root_tree;
if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
get_tree_entry(commit_sha1, "", sha1, &mode))
return;
- buf = fill_tree_descriptor(&desc, sha1);
- if (!buf)
- die("Could not read %s for notes-index", sha1_to_hex(sha1));
+ hashclr(root_tree.sha1_prefix_w_len);
+ hashcpy(root_tree.subtree_sha1, sha1);
+ root_tree.next = NULL;
+ load_subtree(&root_tree);
+ subtree_list = root_tree.next;
+}
- while (tree_entry(&desc, &entry))
- if (!get_sha1(entry.path, commit_sha1))
- add_entry(commit_sha1, entry.sha1);
- free(buf);
+/*
+ * Compare the given commit SHA1 against the given subtree entry.
+ * Return -1 if the commit SHA1 cannot exist within the given subtree, or any
+ * subtree following it.
+ * Return 0 if the commit SHA1 _may_ exist within the given subtree.
+ * Return 1 if the commit SHA1 cannot exist within the given subtree, but may
+ * exist within a subtree following it.
+ */
+static int commit_subtree_cmp(const unsigned char *commit_sha1,
+ const struct subtree_entry *entry)
+{
+ unsigned int prefix_len = entry->sha1_prefix_w_len[19];
+ return memcmp(commit_sha1, entry->sha1_prefix_w_len, prefix_len);
+}
+
+static struct subtree_entry *lookup_subtree(const unsigned char *commit_sha1)
+{
+ struct subtree_entry *found = NULL, *cur = subtree_list;
+ while (cur) {
+ int cmp = commit_subtree_cmp(commit_sha1, cur);
+ if (!cmp)
+ found = cur;
+ if (cmp < 0)
+ break;
+ cur = cur->next;
+ }
+ return found;
}
static unsigned char *lookup_notes(const unsigned char *commit_sha1)
{
int index;
+ struct subtree_entry *subtree;
- if (!hash_map.size)
- return NULL;
+ /* First, try to find the commit SHA1 directly in hash map */
+ index = hash_map.size ? hash_index(&hash_map, commit_sha1) : -1;
+ if (index >= 0)
+ return hash_map.entries[index].notes_sha1;
- index = hash_index(&hash_map, commit_sha1);
- if (index < 0)
+ /* Next, try finding a subtree that may contain the commit SHA1 */
+ subtree = lookup_subtree(commit_sha1);
+
+ /* Give up if no subtree found, or if subtree is already loaded */
+ if (!subtree || is_null_sha1(subtree->subtree_sha1))
return NULL;
- return hash_map.entries[index].notes_sha1;
+
+ /* Load subtree into hash_map, and retry lookup recursively */
+ load_subtree(subtree);
+ hashclr(subtree->subtree_sha1);
+ return lookup_notes(commit_sha1);
}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
new file mode 100755
index 0000000..40bb3f4
--- /dev/null
+++ b/t/t3303-notes-subtrees.sh
@@ -0,0 +1,206 @@
+#!/bin/sh
+
+test_description='Test commit notes organized in subtrees'
+
+. ./test-lib.sh
+
+number_of_commits=100
+
+start_note_commit () {
+ test_tick &&
+ cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+from refs/notes/commits^0
+deleteall
+INPUT_END
+
+}
+
+verify_notes () {
+ git log | grep "^ " > output &&
+ i=$number_of_commits &&
+ while [ $i -gt 0 ]; do
+ echo " commit #$i" &&
+ echo " note for commit #$i" &&
+ i=$(($i-1));
+ done > expect &&
+ test_cmp expect output
+}
+
+test_expect_success 'setup: create $number_of_commits commits' '
+
+ (
+ nr=0 &&
+ while [ $nr -lt $number_of_commits ]; do
+ nr=$(($nr+1)) &&
+ test_tick &&
+ cat <<INPUT_END
+commit refs/heads/master
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+ done &&
+ test_tick &&
+ cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+no notes
+COMMIT
+
+deleteall
+
+INPUT_END
+
+ ) |
+ git fast-import --quiet &&
+ git config core.notesRef refs/notes/commits
+'
+
+test_expect_success 'test notes in 2/38-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^....|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 4/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 4/36-fanout overriding 2/38-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ ignored_note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ preferred_note_path=$(echo "$sha1" | sed "s|^....|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 4/36-fanout overriding 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/2/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ note_path=$(echo "$sha1" | sed "s|^\(..\)\(..\)|\1/\2/|")
+ cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/2/36-fanout' 'verify_notes'
+
+test_expect_success 'test notes in 2/38-fanout overriding 2/2/36-fanout' '
+
+ (
+ start_note_commit &&
+ nr=$number_of_commits &&
+ git rev-list refs/heads/master |
+ while read sha1; do
+ ignored_note_path=$(echo "$sha1" | sed "s|^\(..\)\(..\)|\1/\2/|")
+ preferred_note_path=$(echo "$sha1" | sed "s|^..|&/|")
+ cat <<INPUT_END &&
+M 100644 inline $ignored_note_path
+data <<EOF
+IGNORED note for commit #$nr
+EOF
+
+M 100644 inline $preferred_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ nr=$(($nr-1))
+ done
+ ) |
+ git fast-import --quiet
+'
+
+test_expect_success 'verify notes in 2/38-fanout overriding 2/2/36-fanout' 'verify_notes'
+
+test_done
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 7/8] fast-import: Add support for importing commit notes
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
` (5 preceding siblings ...)
2009-07-29 2:25 ` [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 14:21 ` Shawn O. Pearce
2009-07-29 2:25 ` [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
7 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johan Herland
Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "comittish".
This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.
The patch also includes a test case of the added functionality.
Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Johan Herland <johan@herland.net>
---
Documentation/git-fast-import.txt | 45 +++++++++--
fast-import.c | 88 +++++++++++++++++++-
t/t9300-fast-import.sh | 166 +++++++++++++++++++++++++++++++++++++
3 files changed, 289 insertions(+), 10 deletions(-)
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index c2f483a..288032c 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -316,7 +316,7 @@ change to the project.
data
('from' SP <committish> LF)?
('merge' SP <committish> LF)?
- (filemodify | filedelete | filecopy | filerename | filedeleteall)*
+ (filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
LF?
....
@@ -339,14 +339,13 @@ commit message use a 0 length data. Commit messages are free-form
and are not interpreted by Git. Currently they must be encoded in
UTF-8, as fast-import does not permit other encodings to be specified.
-Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`
-and `filedeleteall` commands
+Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
+`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit. These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
-all `filemodify`, `filecopy` and `filerename` commands in the same
-commit, as `filedeleteall`
-wipes the branch clean (see below).
+all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
+the same commit, as `filedeleteall` wipes the branch clean (see below).
The `LF` after the command is optional (it used to be required).
@@ -595,6 +594,40 @@ more memory per active branch (less than 1 MiB for even most large
projects); so frontends that can easily obtain only the affected
paths for a commit are encouraged to do so.
+`notemodify`
+^^^^^^^^^^^^
+Included in a `commit` command to add a new note (annotating a given
+commit) or change the content of an existing note. This command has
+two different means of specifying the content of the note.
+
+External data format::
+ The data content for the note was already supplied by a prior
+ `blob` command. The frontend just needs to connect it to the
+ commit that is to be annotated.
++
+....
+ 'N' SP <dataref> SP <committish> LF
+....
++
+Here `<dataref>` can be either a mark reference (`:<idnum>`)
+set by a prior `blob` command, or a full 40-byte SHA-1 of an
+existing Git blob object.
+
+Inline data format::
+ The data content for the note has not been supplied yet.
+ The frontend wants to supply it as part of this modify
+ command.
++
+....
+ 'N' SP 'inline' SP <committish> LF
+ data
+....
++
+See below for a detailed description of the `data` command.
+
+In both formats `<committish>` is any of the commit specification
+expressions also accepted by `from` (see above).
+
`mark`
~~~~~~
Arranges for fast-import to save a reference to the current object, allowing
diff --git a/fast-import.c b/fast-import.c
index 7ef9865..97e6e9e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -22,8 +22,8 @@ Format of STDIN stream:
('author' sp name sp '<' email '>' sp when lf)?
'committer' sp name sp '<' email '>' sp when lf
commit_msg
- ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
- ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
+ ('from' sp committish lf)?
+ ('merge' sp committish lf)*
file_change*
lf?;
commit_msg ::= data;
@@ -41,15 +41,18 @@ Format of STDIN stream:
file_obm ::= 'M' sp mode sp (hexsha1 | idnum) sp path_str lf;
file_inm ::= 'M' sp mode sp 'inline' sp path_str lf
data;
+ note_obm ::= 'N' sp (hexsha1 | idnum) sp committish lf;
+ note_inm ::= 'N' sp 'inline' sp committish lf
+ data;
new_tag ::= 'tag' sp tag_str lf
- 'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
+ 'from' sp committish lf
('tagger' sp name sp '<' email '>' sp when lf)?
tag_msg;
tag_msg ::= data;
reset_branch ::= 'reset' sp ref_str lf
- ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
+ ('from' sp committish lf)?
lf?;
checkpoint ::= 'checkpoint' lf
@@ -88,6 +91,7 @@ Format of STDIN stream:
# stream formatting is: \, " and LF. Otherwise these values
# are UTF8.
#
+ committish ::= (ref_str | hexsha1 | sha1exp_str | idnum);
ref_str ::= ref;
sha1exp_str ::= sha1exp;
tag_str ::= tag;
@@ -2003,6 +2007,80 @@ static void file_change_cr(struct branch *b, int rename)
leaf.tree);
}
+static void note_change_n(struct branch *b)
+{
+ const char *p = command_buf.buf + 2;
+ static struct strbuf uq = STRBUF_INIT;
+ struct object_entry *oe = oe;
+ struct branch *s;
+ unsigned char sha1[20], commit_sha1[20];
+ uint16_t inline_data = 0;
+
+ /* <dataref> or 'inline' */
+ if (*p == ':') {
+ char *x;
+ oe = find_mark(strtoumax(p + 1, &x, 10));
+ hashcpy(sha1, oe->sha1);
+ p = x;
+ } else if (!prefixcmp(p, "inline")) {
+ inline_data = 1;
+ p += 6;
+ } else {
+ if (get_sha1_hex(p, sha1))
+ die("Invalid SHA1: %s", command_buf.buf);
+ oe = find_object(sha1);
+ p += 40;
+ }
+ if (*p++ != ' ')
+ die("Missing space after SHA1: %s", command_buf.buf);
+
+ /* <committish> */
+ s = lookup_branch(p);
+ if (s) {
+ hashcpy(commit_sha1, s->sha1);
+ } else if (*p == ':') {
+ uintmax_t commit_mark = strtoumax(p + 1, NULL, 10);
+ struct object_entry *commit_oe = find_mark(commit_mark);
+ if (commit_oe->type != OBJ_COMMIT)
+ die("Mark :%" PRIuMAX " not a commit", commit_mark);
+ hashcpy(commit_sha1, commit_oe->sha1);
+ } else if (!get_sha1(p, commit_sha1)) {
+ unsigned long size;
+ char *buf = read_object_with_reference(commit_sha1,
+ commit_type, &size, commit_sha1);
+ if (!buf || size < 46)
+ die("Not a valid commit: %s", p);
+ free(buf);
+ } else
+ die("Invalid ref name or SHA1 expression: %s", p);
+
+ if (inline_data) {
+ static struct strbuf buf = STRBUF_INIT;
+
+ if (p != uq.buf) {
+ strbuf_addstr(&uq, p);
+ p = uq.buf;
+ }
+ read_next_command();
+ parse_data(&buf);
+ store_object(OBJ_BLOB, &buf, &last_blob, sha1, 0);
+ } else if (oe) {
+ if (oe->type != OBJ_BLOB)
+ die("Not a blob (actually a %s): %s",
+ typename(oe->type), command_buf.buf);
+ } else {
+ enum object_type type = sha1_object_info(sha1, NULL);
+ if (type < 0)
+ die("Blob not found: %s", command_buf.buf);
+ if (type != OBJ_BLOB)
+ die("Not a blob (actually a %s): %s",
+ typename(type), command_buf.buf);
+ }
+
+ tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
+ S_IFREG | 0644, NULL);
+}
+
static void file_change_deleteall(struct branch *b)
{
release_tree_content_recursive(b->branch_tree.tree);
@@ -2172,6 +2250,8 @@ static void parse_new_commit(void)
file_change_cr(b, 1);
else if (!prefixcmp(command_buf.buf, "C "))
file_change_cr(b, 0);
+ else if (!prefixcmp(command_buf.buf, "N "))
+ note_change_n(b);
else if (!strcmp("deleteall", command_buf.buf))
file_change_deleteall(b);
else {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 821be7c..b49815d 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1088,4 +1088,170 @@ INPUT_END
test_expect_success 'P: fail on blob mark in gitlink' '
test_must_fail git fast-import <input'
+###
+### series Q (notes)
+###
+
+note1_data="Note for the first commit"
+note2_data="Note for the second commit"
+note3_data="Note for the third commit"
+
+test_tick
+cat >input <<INPUT_END
+blob
+mark :2
+data <<EOF
+$file2_data
+EOF
+
+commit refs/heads/notes-test
+mark :3
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+first (:3)
+COMMIT
+
+M 644 :2 file2
+
+blob
+mark :4
+data $file4_len
+$file4_data
+commit refs/heads/notes-test
+mark :5
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+second (:5)
+COMMIT
+
+M 644 :4 file4
+
+commit refs/heads/notes-test
+mark :6
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+third (:6)
+COMMIT
+
+M 644 inline file5
+data <<EOF
+$file5_data
+EOF
+
+M 755 inline file6
+data <<EOF
+$file6_data
+EOF
+
+blob
+mark :7
+data <<EOF
+$note1_data
+EOF
+
+blob
+mark :8
+data <<EOF
+$note2_data
+EOF
+
+commit refs/notes/foobar
+mark :9
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes (:9)
+COMMIT
+
+N :7 :3
+N :8 :5
+N inline :6
+data <<EOF
+$note3_data
+EOF
+
+INPUT_END
+test_expect_success \
+ 'Q: commit notes' \
+ 'git fast-import <input &&
+ git whatchanged notes-test'
+test_expect_success \
+ 'Q: verify pack' \
+ 'for p in .git/objects/pack/*.pack;do git verify-pack $p||exit;done'
+
+commit1=$(git rev-parse notes-test~2)
+commit2=$(git rev-parse notes-test^)
+commit3=$(git rev-parse notes-test)
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+first (:3)
+EOF
+test_expect_success \
+ 'Q: verify first commit' \
+ 'git cat-file commit notes-test~2 | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit1
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+second (:5)
+EOF
+test_expect_success \
+ 'Q: verify second commit' \
+ 'git cat-file commit notes-test^ | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+parent $commit2
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+third (:6)
+EOF
+test_expect_success \
+ 'Q: verify third commit' \
+ 'git cat-file commit notes-test | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect <<EOF
+author $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+notes (:9)
+EOF
+test_expect_success \
+ 'Q: verify notes commit' \
+ 'git cat-file commit refs/notes/foobar | sed 1d >actual &&
+ test_cmp expect actual'
+
+cat >expect.unsorted <<EOF
+100644 blob $commit1
+100644 blob $commit2
+100644 blob $commit3
+EOF
+cat expect.unsorted | sort >expect
+test_expect_success \
+ 'Q: verify notes tree' \
+ 'git cat-file -p refs/notes/foobar^{tree} | sed "s/ [0-9a-f]* / /" >actual &&
+ test_cmp expect actual'
+
+echo "$note1_data" >expect
+test_expect_success \
+ 'Q: verify note for first commit' \
+ 'git cat-file blob refs/notes/foobar:$commit1 >actual && test_cmp expect actual'
+
+echo "$note2_data" >expect
+test_expect_success \
+ 'Q: verify note for second commit' \
+ 'git cat-file blob refs/notes/foobar:$commit2 >actual && test_cmp expect actual'
+
+echo "$note3_data" >expect
+test_expect_success \
+ 'Q: verify note for third commit' \
+ 'git cat-file blob refs/notes/foobar:$commit3 >actual && test_cmp expect actual'
+
test_done
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo()
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
` (6 preceding siblings ...)
2009-07-29 2:25 ` [PATCHv3 7/8] fast-import: Add support for importing commit notes Johan Herland
@ 2009-07-29 2:25 ` Johan Herland
2009-07-29 16:46 ` Johannes Schindelin
7 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-29 2:25 UTC (permalink / raw)
To: gitster
Cc: git, johannes.schindelin, trast, tavestbo, git, chriscool,
spearce, Johan Herland
Creating repos with 10/100/1000/10000 commits and notes takes a lot of time.
However, using git-fast-import to do the job is a lot more efficient than
using plumbing commands to do the same.
This patch decreases the overall run-time of this test on my machine from
~3 to ~1 minutes.
Signed-off-by: Johan Herland <johan@herland.net>
---
t/t3302-notes-index-expensive.sh | 74 ++++++++++++++++++++++++--------------
1 files changed, 47 insertions(+), 27 deletions(-)
diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
index 0ef3e95..ee84fc4 100755
--- a/t/t3302-notes-index-expensive.sh
+++ b/t/t3302-notes-index-expensive.sh
@@ -16,30 +16,50 @@ test -z "$GIT_NOTES_TIMING_TESTS" && {
create_repo () {
number_of_commits=$1
nr=0
- parent=
test -d .git || {
git init &&
- tree=$(git write-tree) &&
- while [ $nr -lt $number_of_commits ]; do
- test_tick &&
- commit=$(echo $nr | git commit-tree $tree $parent) ||
- return
- parent="-p $commit"
- nr=$(($nr+1))
- done &&
- git update-ref refs/heads/master $commit &&
- {
- GIT_INDEX_FILE=.git/temp; export GIT_INDEX_FILE;
- git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" |
- while read nr sha1; do
- blob=$(echo note $nr | git hash-object -w --stdin) &&
- echo $sha1 | sed "s/^/0644 $blob 0 /"
- done | git update-index --index-info &&
- tree=$(git write-tree) &&
+ (
+ while [ $nr -lt $number_of_commits ]; do
+ nr=$(($nr+1))
+ mark=$(($nr+$nr))
+ notemark=$(($mark+1))
+ test_tick &&
+ cat <<INPUT_END &&
+commit refs/heads/master
+mark :$mark
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+blob
+mark :$notemark
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ echo "N :$notemark :$mark" >> note_commit
+ done &&
test_tick &&
- commit=$(echo notes | git commit-tree $tree) &&
- git update-ref refs/notes/commits $commit
- } &&
+ cat <<INPUT_END &&
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+INPUT_END
+
+ cat note_commit
+ ) |
+ git fast-import --quiet &&
git config core.notesRef refs/notes/commits
}
}
@@ -48,13 +68,13 @@ test_notes () {
count=$1 &&
git config core.notesRef refs/notes/commits &&
git log | grep "^ " > output &&
- i=1 &&
- while [ $i -le $count ]; do
- echo " $(($count-$i))" &&
- echo " note $i" &&
- i=$(($i+1));
+ i=$count &&
+ while [ $i -gt 0 ]; do
+ echo " commit #$i" &&
+ echo " note for commit #$i" &&
+ i=$(($i-1));
done > expect &&
- git diff expect output
+ test_cmp expect output
}
cat > time_notes << \EOF
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit"
2009-07-29 2:25 ` [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
@ 2009-07-29 7:57 ` Thomas Rast
2009-07-30 1:02 ` Johan Herland
0 siblings, 1 reply; 30+ messages in thread
From: Thomas Rast @ 2009-07-29 7:57 UTC (permalink / raw)
To: Johan Herland
Cc: gitster, git, johannes.schindelin, tavestbo, git, chriscool,
spearce
Johan Herland wrote:
> diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
[...]
> +cat > expect-m-and-F << EOF
> +commit 15023535574ded8b1a89052b32673f84cf9582b8
> +Author: A U Thor <author@example.com>
> +Date: Thu Apr 7 15:16:13 2005 -0700
> +
> + 4th
> +
> +Notes:
> + spam
> +
> + xyzzy
> +
> + foo
There's significant trailing whitespace here (in the lines between
spam, xyzzy and foo) that initially broke the test for me because I
use apply.whitespace=fix. Can you guard the whitespace if it is
really important, with something like
sed 's/#$//' > expect <<EOF
whitespace: #
EOF
Thanks!
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 1/8] Introduce commit notes
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
@ 2009-07-29 8:52 ` Alex Riesen
2009-07-29 16:40 ` Johannes Schindelin
2009-07-30 0:42 ` Johan Herland
0 siblings, 2 replies; 30+ messages in thread
From: Alex Riesen @ 2009-07-29 8:52 UTC (permalink / raw)
To: Johan Herland
Cc: gitster, git, johannes.schindelin, trast, tavestbo, git,
chriscool, spearce
On Wed, Jul 29, 2009 at 04:25, Johan Herland<johan@herland.net> wrote:
> +void get_commit_notes(const struct commit *commit, struct strbuf *sb,
> + const char *output_encoding)
> +{
> + static const char *utf8 = "utf-8";
Using an array
const char utf8[] = "utf-8";
costs you less BSS (no separate storage for the pointer).
> @@ -963,5 +964,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
> +
> + if (fmt != CMIT_FMT_ONELINE)
> + get_commit_notes(commit, sb, encoding);
> +
Someday we will need a way to switch off the display of notes
without resolving to oneline format.
Is there a notes specifier for the printf-like log message formatting
(--pretty=format: or --format) planned, BTW?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 7/8] fast-import: Add support for importing commit notes
2009-07-29 2:25 ` [PATCHv3 7/8] fast-import: Add support for importing commit notes Johan Herland
@ 2009-07-29 14:21 ` Shawn O. Pearce
2009-07-30 1:29 ` Johan Herland
0 siblings, 1 reply; 30+ messages in thread
From: Shawn O. Pearce @ 2009-07-29 14:21 UTC (permalink / raw)
To: Johan Herland
Cc: gitster, git, johannes.schindelin, trast, tavestbo, git,
chriscool
Johan Herland <johan@herland.net> wrote:
> Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
> is similar to 'filemodify', except that no mode is supplied (all notes have
> mode 0644), and the path is set to the hex SHA1 of the given "comittish".
>
> This enables fast import of note objects along with their associated commits,
> since the notes can now be named using the mark references of their
> corresponding commits.
>
> The patch also includes a test case of the added functionality.
Seems sane to me.
Acked-by: Shawn O. Pearce <spearce@spearce.org>
> +static void note_change_n(struct branch *b)
> +{
...
> + tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
> + S_IFREG | 0644, NULL);
I thought you wanted to use the note code to handle the name
formatting here?
--
Shawn.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 1/8] Introduce commit notes
2009-07-29 8:52 ` Alex Riesen
@ 2009-07-29 16:40 ` Johannes Schindelin
2009-07-30 0:50 ` Johan Herland
2009-07-30 0:42 ` Johan Herland
1 sibling, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2009-07-29 16:40 UTC (permalink / raw)
To: Alex Riesen
Cc: Johan Herland, gitster, git, trast, tavestbo, git, chriscool,
spearce
[-- Attachment #1: Type: TEXT/PLAIN, Size: 974 bytes --]
Hi,
On Wed, 29 Jul 2009, Alex Riesen wrote:
> On Wed, Jul 29, 2009 at 04:25, Johan Herland<johan@herland.net> wrote:
> > +void get_commit_notes(const struct commit *commit, struct strbuf *sb,
> > + const char *output_encoding)
> > +{
> > + static const char *utf8 = "utf-8";
>
> Using an array
>
> const char utf8[] = "utf-8";
>
> costs you less BSS (no separate storage for the pointer).
Good to know!
> > @@ -963,5 +964,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
> > +
> > + if (fmt != CMIT_FMT_ONELINE)
> > + get_commit_notes(commit, sb, encoding);
> > +
>
> Someday we will need a way to switch off the display of notes
> without resolving to oneline format.
> Is there a notes specifier for the printf-like log message formatting
> (--pretty=format: or --format) planned, BTW?
That would probably be something like "GIT_NOTES_REF=nyanyanya git log"?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees
2009-07-29 2:25 ` [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees Johan Herland
@ 2009-07-29 16:45 ` Johannes Schindelin
2009-07-30 0:18 ` Testing performance of the notes lookup code (Was: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees) Johan Herland
0 siblings, 1 reply; 30+ messages in thread
From: Johannes Schindelin @ 2009-07-29 16:45 UTC (permalink / raw)
To: Johan Herland; +Cc: gitster, git, trast, tavestbo, git, chriscool, spearce
Hi,
On Wed, 29 Jul 2009, Johan Herland wrote:
> This is a relatively straightforward implementation of parsing notes
> trees that use fanout directories to limit the size of individual tree
> objects. This first draft uses a simple linked list for holding unparsed
> subtree references (to be parsed on demand), and as such, this first
> draft concentrates more on correctness than performance (AFAICS from
> t3302, there is no measurable performance impact when no fanout subtrees
> are present).
I know you want to have something working first and optimize then, but I
imagined that the hashmap can actually contain the entries of the partial
hashes, too. You'll need to extend the data type, of course, to be able
to say just how many digits of the SHA-1 are valid, and I guess for
consistency you'll need to pad with 0s.
BTW have you done any performance benchmarks? If so, how do they look?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo()
2009-07-29 2:25 ` [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
@ 2009-07-29 16:46 ` Johannes Schindelin
0 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2009-07-29 16:46 UTC (permalink / raw)
To: Johan Herland; +Cc: gitster, git, trast, tavestbo, git, chriscool, spearce
Hi,
On Wed, 29 Jul 2009, Johan Herland wrote:
> Creating repos with 10/100/1000/10000 commits and notes takes a lot of time.
> However, using git-fast-import to do the job is a lot more efficient than
> using plumbing commands to do the same.
>
> This patch decreases the overall run-time of this test on my machine from
> ~3 to ~1 minutes.
ACK!
Ciao,
Dscho
^ permalink raw reply [flat|nested] 30+ messages in thread
* Testing performance of the notes lookup code (Was: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees)
2009-07-29 16:45 ` Johannes Schindelin
@ 2009-07-30 0:18 ` Johan Herland
2009-08-01 2:36 ` [RFC] First draft of 256-tree structure for storing notes Johan Herland
0 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-30 0:18 UTC (permalink / raw)
To: Johannes Schindelin
Cc: gitster, git, trast, tavestbo, git, chriscool, spearce
On Wednesday 29 July 2009, Johannes Schindelin wrote:
> On Wed, 29 Jul 2009, Johan Herland wrote:
> > This is a relatively straightforward implementation of parsing notes
> > trees that use fanout directories to limit the size of individual tree
> > objects. This first draft uses a simple linked list for holding
> > unparsed subtree references (to be parsed on demand), and as such, this
> > first draft concentrates more on correctness than performance (AFAICS
> > from t3302, there is no measurable performance impact when no fanout
> > subtrees are present).
>
> I know you want to have something working first and optimize then, but I
> imagined that the hashmap can actually contain the entries of the partial
> hashes, too. You'll need to extend the data type, of course, to be able
> to say just how many digits of the SHA-1 are valid, and I guess for
> consistency you'll need to pad with 0s.
Thanks for the ideas. I will look into this once I have a set of
performance tests that I feel give a better picture of the notes parsing
performance.
> BTW have you done any performance benchmarks? If so, how do they look?
I've just started. As I said above, t3302 (which has no fanout) is not
negatively affected by my current implementation. However this does not
say much (only that the current draft doesn't screw up completely...).
I've started making some scripts for more accurately testing the
performance of the notes parsing code at different fanout levels.
At the end of this email you will find a patch with my current state of
these scripts. (This patch is not meant to be included in the jh/notes
topic. It is only extra scripts for those interested in testing the
performance of this particular piece of code.)
METHODOLOGY
So far, there are 3 scripts:
- create_test_repo.sh: Creates a repo with 100,000 commits and one note
object per commit. Also creates 3 notes trees, each holding all
100,000 notes, but in different fanout structures:
- refs/notes/fanout0: all 100,000 notes directly inside the root tree.
- refs/notes/fanout1: notes are stored in a 2/38 structure (i.e. split
into 256 subtrees within the root tree).
- refs/notes/fanout2: notes are stored in a 2/2/36 structure (i.e. an
additional 256-fanout within each of the 256 subtrees).
- verify_correctness.sh: Verify that the "git log" output for each of the
3 notes refs is as expected. This is just to verify that the notes
parsing code is actually doing the right job.
- test_performance.sh: For each of the 3 notes refs, run "git log -n 10"
100 times, and report the time used. I feel this gives a more accurate
impression of the real-world performance of the notes parsing code,
since we:
- Test the code at several fanout levels
- Only lookup _some_ of the notes (looking up _all_ of the notes, i.e.
"git log" without -n, is clearly the worst-case scenario for any code
that loads subtree on-demand).
There is also a config script - config.sh - where you can tweak all of
the above test parameters.
PRELIMINARY RESULTS
Running the above test_performance.sh on the current state of jh/notes
give the following output on my machine:
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
30.56user 2.08system 0:32.71elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1038490minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
1.64user 0.20system 0:01.88elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+104883minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.48user 0.18system 0:00.65elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+94283minor)pagefaults 0swaps
...which indicates that even with my straightforward first draft based on
a simple linked list, there are huge performance wins in using one or more
fanout levels.
THE SCRIPTS
---
notes_performance/config.sh | 9 +++
notes_performance/create_test_repo.sh | 90 +++++++++++++++++++++++++++++++
notes_performance/test_performance.sh | 31 +++++++++++
notes_performance/verify_correctness.sh | 32 +++++++++++
4 files changed, 162 insertions(+), 0 deletions(-)
create mode 100644 notes_performance/config.sh
create mode 100755 notes_performance/create_test_repo.sh
create mode 100755 notes_performance/test_performance.sh
create mode 100755 notes_performance/verify_correctness.sh
diff --git a/notes_performance/config.sh b/notes_performance/config.sh
new file mode 100644
index 0000000..13abae8
--- /dev/null
+++ b/notes_performance/config.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+GIT=../../git
+
+num_commits=100000
+repo_dir=test_repo
+
+log_length=10
+log_reps=100
diff --git a/notes_performance/create_test_repo.sh b/notes_performance/create_test_repo.sh
new file mode 100755
index 0000000..fee3535
--- /dev/null
+++ b/notes_performance/create_test_repo.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Create a test repo with $num_commits commits, and one note object per commit.
+# The repo has the following refs:
+# - refs/heads/master pointing to the last commit
+# - refs/notes/fanout0 referencing all notes with no fanout
+# - refs/notes/fanout1 referencing all notes with 2/38 fanout
+# - refs/notes/fanout2 referencing all notes with 2/2/36 fanout
+
+if [ -d $repo_dir ]; then
+ echo "Skipping repo creation, $repo_dir already exists"
+ exit 1
+fi
+
+mkdir $repo_dir &&
+cd $repo_dir &&
+$GIT init &&
+nr=0 &&
+echo "Creating $num_commits commits..." 1>&2 &&
+while [ $nr -lt $num_commits ]; do
+ nr=$(($nr+1)) &&
+ cat <<INPUT_END
+commit refs/heads/master
+committer Foo Bar <foobar@example.com> 1234567890 +0000
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+done |
+$GIT fast-import --quiet &&
+echo "done" 1>&2 &&
+(
+ echo "Creating $num_commits notes..." 1>&2 &&
+ nr=0 &&
+ while [ $nr -lt $num_commits ]; do
+ nr=$(($nr+1)) &&
+ cat <<INPUT_END
+blob
+mark :$nr
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+ done &&
+ echo "done" 1>&2 &&
+ for fanout_levels in 0 1 2; do
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ echo "Creating notes tree with fanout level $fanout_levels..." 1>&2 &&
+ cat <<INPUT_END &&
+commit $notes_ref
+committer Foo Bar <foobar@example.com> 1234567890 +0000
+data <<COMMIT
+notes with fanout level $fanout_levels
+COMMIT
+
+INPUT_END
+
+ nr=$num_commits &&
+ $GIT rev-list refs/heads/master |
+ while read sha1; do
+ case $fanout_levels in
+ 0)
+ note_path=$sha1
+ ;;
+ 1)
+ note_path="${sha1:0:2}/${sha1:2}"
+ ;;
+ 2)
+ note_path="${sha1:0:2}/${sha1:2:2}/${sha1:4}"
+ ;;
+ esac &&
+ echo "M 100644 :$nr $note_path" &&
+ nr=$(($nr-1))
+ done &&
+ echo "done" 1>&2
+ done
+) |
+$GIT fast-import --quiet &&
+$GIT gc
diff --git a/notes_performance/test_performance.sh b/notes_performance/test_performance.sh
new file mode 100755
index 0000000..65a5dbf
--- /dev/null
+++ b/notes_performance/test_performance.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Test "git log -n $log_length" for each of the 3 notes refs
+# - refs/notes/fanout0
+# - refs/notes/fanout1
+# - refs/notes/fanout2
+
+if [ ! -d $repo_dir ]; then
+ echo "Cannot test performance, $repo_dir missing"
+ exit 1
+fi
+
+cd $repo_dir &&
+cat > time_notes <<EOF &&
+ i=0 &&
+ while [ \$i -lt $log_reps ]; do
+ $GIT log -n $log_length refs/heads/master >/dev/null &&
+ i=\$((\$i+1))
+ done
+
+EOF
+
+for fanout_levels in 0 1 2; do
+ echo "Timing $log_reps reps of 'git log -n $log_length refs/heads/master >/dev/null' at fanout level $fanout_levels..." &&
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ $GIT config core.notesRef $notes_ref &&
+ /usr/bin/time sh time_notes &&
+ echo
+done
diff --git a/notes_performance/verify_correctness.sh b/notes_performance/verify_correctness.sh
new file mode 100755
index 0000000..ca40223
--- /dev/null
+++ b/notes_performance/verify_correctness.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+source ./config.sh
+
+# Verify proper contents of git log for each of the 3 notes refs
+# - refs/notes/fanout0
+# - refs/notes/fanout1
+# - refs/notes/fanout2
+
+if [ ! -d $repo_dir ]; then
+ echo "Cannot verify correctness, $repo_dir missing"
+ exit 1
+fi
+
+cd $repo_dir &&
+for fanout_levels in 0 1 2; do
+ echo "Verifying correct git log at fanout level $fanout_levels" &&
+ notes_ref="refs/notes/fanout$fanout_levels" &&
+ $GIT config core.notesRef $notes_ref &&
+ $GIT log | grep "^ " > output &&
+ nr=$num_commits &&
+ while [ $nr -gt 0 ]; do
+ echo " commit #$nr" &&
+ echo " note for commit #$nr" &&
+ nr=$(($nr-1));
+ done > expect &&
+ echo "done" &&
+ diff -u expect output || {
+ echo "Failed verification of fanout level $fanout_levels"
+ exit 1
+ }
+done
--
1.6.4.rc3.138.ga6b98.dirty
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCHv3 1/8] Introduce commit notes
2009-07-29 8:52 ` Alex Riesen
2009-07-29 16:40 ` Johannes Schindelin
@ 2009-07-30 0:42 ` Johan Herland
1 sibling, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-30 0:42 UTC (permalink / raw)
To: git
Cc: Alex Riesen, gitster, johannes.schindelin, trast, tavestbo, git,
chriscool, spearce
On Wednesday 29 July 2009, Alex Riesen wrote:
> On Wed, Jul 29, 2009 at 04:25, Johan Herland<johan@herland.net> wrote:
> > +void get_commit_notes(const struct commit *commit, struct strbuf *sb,
> > + const char *output_encoding)
> > +{
> > + static const char *utf8 = "utf-8";
>
> Using an array
>
> const char utf8[] = "utf-8";
>
> costs you less BSS (no separate storage for the pointer).
Thanks, will be fixed in the next iteration of this topic.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 1/8] Introduce commit notes
2009-07-29 16:40 ` Johannes Schindelin
@ 2009-07-30 0:50 ` Johan Herland
2009-07-30 1:14 ` Johannes Schindelin
0 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-07-30 0:50 UTC (permalink / raw)
To: git
Cc: Johannes Schindelin, Alex Riesen, gitster, trast, tavestbo, git,
chriscool, spearce
On Wednesday 29 July 2009, Johannes Schindelin wrote:
> On Wed, 29 Jul 2009, Alex Riesen wrote:
> > On Wed, Jul 29, 2009 at 04:25, Johan Herland<johan@herland.net> wrote:
> > > @@ -963,5 +964,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const
> > > struct commit *commit, +
> > > + if (fmt != CMIT_FMT_ONELINE)
> > > + get_commit_notes(commit, sb, encoding);
> > > +
> >
> > Someday we will need a way to switch off the display of notes
> > without resolving to oneline format.
> > Is there a notes specifier for the printf-like log message formatting
> > (--pretty=format: or --format) planned, BTW?
>
> That would probably be something like "GIT_NOTES_REF=nyanyanya git log"?
Yes, that works, although I suspect some users will prefer a command-line
argument instead.
Nonetheless, I think it makes sense to add a notes specifier to be used in
--pretty/--format.
I'll try to remember to look into this later, but I'll be grateful if
someone gets to it before me.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit"
2009-07-29 7:57 ` Thomas Rast
@ 2009-07-30 1:02 ` Johan Herland
0 siblings, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-30 1:02 UTC (permalink / raw)
To: Thomas Rast
Cc: git, gitster, johannes.schindelin, tavestbo, git, chriscool,
spearce
On Wednesday 29 July 2009, Thomas Rast wrote:
> Johan Herland wrote:
> > diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
>
> [...]
>
> > +cat > expect-m-and-F << EOF
> > +commit 15023535574ded8b1a89052b32673f84cf9582b8
> > +Author: A U Thor <author@example.com>
> > +Date: Thu Apr 7 15:16:13 2005 -0700
> > +
> > + 4th
> > +
> > +Notes:
> > + spam
> > +
> > + xyzzy
> > +
> > + foo
>
> There's significant trailing whitespace here (in the lines between
> spam, xyzzy and foo) that initially broke the test for me because I
> use apply.whitespace=fix. Can you guard the whitespace if it is
> really important, with something like
>
> sed 's/#$//' > expect <<EOF
> whitespace: #
> EOF
Thanks! Will be fixed in the next iteration of this topic.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCHv3 1/8] Introduce commit notes
2009-07-30 0:50 ` Johan Herland
@ 2009-07-30 1:14 ` Johannes Schindelin
0 siblings, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2009-07-30 1:14 UTC (permalink / raw)
To: Johan Herland
Cc: git, Alex Riesen, gitster, trast, tavestbo, git, chriscool,
spearce
Hi,
On Thu, 30 Jul 2009, Johan Herland wrote:
> On Wednesday 29 July 2009, Johannes Schindelin wrote:
> > On Wed, 29 Jul 2009, Alex Riesen wrote:
> > > On Wed, Jul 29, 2009 at 04:25, Johan Herland<johan@herland.net> wrote:
> > > > @@ -963,5 +964,9 @@ void pretty_print_commit(enum cmit_fmt fmt, const
> > > > struct commit *commit, +
> > > > + if (fmt != CMIT_FMT_ONELINE)
> > > > + get_commit_notes(commit, sb, encoding);
> > > > +
> > >
> > > Someday we will need a way to switch off the display of notes
> > > without resolving to oneline format.
> > > Is there a notes specifier for the printf-like log message formatting
> > > (--pretty=format: or --format) planned, BTW?
> >
> > That would probably be something like "GIT_NOTES_REF=nyanyanya git log"?
>
> Yes, that works, although I suspect some users will prefer a command-line
> argument instead.
>
> Nonetheless, I think it makes sense to add a notes specifier to be used in
> --pretty/--format.
>
> I'll try to remember to look into this later, but I'll be grateful if
> someone gets to it before me.
Probably you will not want to show the "\nNotes:" prefix, and also not
indent the string, but that is something you could make conditional upon a
flag to get_commit_notes(). But this should get you started:
-- snipsnap --
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 2a845b1..5fb10b3 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -123,6 +123,7 @@ The placeholders are:
- '%s': subject
- '%f': sanitized subject line, suitable for a filename
- '%b': body
+- '%N': commit notes
- '%Cred': switch color to red
- '%Cgreen': switch color to green
- '%Cblue': switch color to blue
diff --git a/pretty.c b/pretty.c
index fee6789..7349697 100644
--- a/pretty.c
+++ b/pretty.c
@@ -690,6 +690,10 @@ static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
case 'd':
format_decoration(sb, commit);
return 1;
+ case 'N':
+ get_commit_notes(commit, sb, git_log_output_encoding ?
+ git_log_output_encoding : git_commit_encoding);
+ return 1;
}
/* For the rest we have to parse the commit header. */
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCHv3 7/8] fast-import: Add support for importing commit notes
2009-07-29 14:21 ` Shawn O. Pearce
@ 2009-07-30 1:29 ` Johan Herland
0 siblings, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-07-30 1:29 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: git, gitster, johannes.schindelin, trast, tavestbo, git,
chriscool
On Wednesday 29 July 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland.net> wrote:
> > +static void note_change_n(struct branch *b)
> > +{
>
> ...
>
> > + tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
> > + S_IFREG | 0644, NULL);
>
> I thought you wanted to use the note code to handle the name
> formatting here?
Yes, I do, as soon as the notes code knows how to format/write note names
(which I plan to tackle after nailing the _reading_ part of the code).
Have fun! :)
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC] First draft of 256-tree structure for storing notes
2009-07-30 0:18 ` Testing performance of the notes lookup code (Was: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees) Johan Herland
@ 2009-08-01 2:36 ` Johan Herland
2009-08-13 3:00 ` [RFC] Store subtree entries in the same hash map as the note entries Johan Herland
2009-08-26 10:31 ` [RFC] Use a 16-tree instead of a 256-tree for storing notes Johan Herland
0 siblings, 2 replies; 30+ messages in thread
From: Johan Herland @ 2009-08-01 2:36 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git, gitster, trast, tavestbo, git, chriscool, spearce
This patch stores note entries and unexpanded fanout subtree entries in a
customized 256-tree structure.
Initial performance numbers are encouraging:
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
14.92user 4.84system 0:20.39elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2164780minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
0.71user 0.32system 0:01.06elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+154090minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.44user 0.18system 0:00.63elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+94183minor)pagefaults 0swaps
This is pretty much twice as fast as the existing version (which uses a hash map
for notes, and a linked list for unexpanded fanout subtrees).
Signed-off-by: Johan Herland <johan@herland.net>
---
Hi,
Just a quick update before I leave for a week of vacation (with spotty
Internet access, at best).
I've been thinking about various data structures for the notes code for
the last couple of days, and here is a quick first draft of the idea that
I found most promising: Storing both notes and unexpanded subtrees as leaf
nodes in a customized 256-tree structure. The initial performance numbers
look very promising (~twice as fast as the previous implementation at
fanout levels 0 and 1), and there are still probably several optimization
that can be done (an obvious example is reducing malloc pressure by
memory pooling leaf_node objects).
AFAICS, this implementation is semantically equivalent the previous code
(longer prefixes are preferred, no merging of notes, multiple/nested
fanout levels are allowed, etc.).
Have fun! :)
...Johan
notes.c | 325 ++++++++++++++++++++++++++++++++++-----------------------------
1 files changed, 174 insertions(+), 151 deletions(-)
diff --git a/notes.c b/notes.c
index 7e9dc49..9282b16 100644
--- a/notes.c
+++ b/notes.c
@@ -6,79 +6,165 @@
#include "strbuf.h"
#include "tree-walk.h"
-struct entry {
- unsigned char commit_sha1[20];
- unsigned char notes_sha1[20];
+/*
+ * Use a non-balancing simple 256-tree structure with struct int_node as
+ * internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
+ * 256-array of pointers to its children
+ * The bottom 2 bits of each pointer is used to identify the pointer type
+ * - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
+ * - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
+ * - ptr & 3 == 2 - pointer to note entry - cast to struct leaf_node *
+ * - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
+ *
+ * The root node is always an inline struct int_node.
+ * An array_entry always starts out with all pointers set to NULL.
+ *
+ * To add a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth byte of the key as an index into a:
+ * - If NULL, store the tweaked pointer directly into a[n]
+ * - If an int_node, recurse into that node and increment n
+ * - If a leaf_node:
+ * 1. Check if they're equal, and handle that (abort? overwrite?)
+ * 2. Create a new int_node, and store both leaf_nodes there
+ * 3. Store the new int_node into a[n].
+ *
+ * To find a leaf_node:
+ * 1. Start at the root node, with n = 0
+ * 2. Use the nth byte of the key as an index into a:
+ * - If an int_node, recurse into that node and increment n
+ * - If a leaf_node with matching key, return leaf_node (assert note entry)
+ * - If a matching subtree entry, unpack that subtree entry (and remove it);
+ * restart search at the current level.
+ * - Otherwise, we end up at a NULL pointer, or a non-matching leaf_node.
+ * Backtrack out of the recursion, one level at a time and check a[0]:
+ * - If a[0] at the current level is a matching subtree entry, unpack that
+ * subtree entry (and remove it); restart search at the current level.
+ */
+struct int_node {
+ void *a[256];
};
-struct hash_map {
- struct entry *entries;
- off_t count, size;
+/*
+ * Leaf nodes come in two variants, note entries and subtree entries,
+ * distinguished by the LSb of the leaf node pointer (see above).
+ * As a note entry, the key is the SHA1 of the referenced commit, and the value
+ * is the SHA1 of the note object.
+ * As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
+ * referenced commit, including the prefix length in the last byte of the key.
+ * The value is the SHA1 of the tree object containing the notes subtree.
+ */
+struct leaf_node {
+ unsigned char key_sha1[20];
+ unsigned char val_sha1[20];
};
-struct subtree_entry {
- /*
- * SHA1 prefix is stored in the first 19 bytes (w/trailing NUL bytes);
- * length of SHA1 prefix is stored in the last byte
- */
- unsigned char sha1_prefix_w_len[20];
- unsigned char subtree_sha1[20];
- struct subtree_entry *next;
-};
+#define PTR_TYPE_NULL 0
+#define PTR_TYPE_INTERNAL 1
+#define PTR_TYPE_NOTE 2
+#define PTR_TYPE_SUBTREE 3
-static int initialized;
-static struct hash_map hash_map;
-static struct subtree_entry *subtree_list;
+#define GET_PTR_TYPE(ptr) ((uintptr_t) (ptr) & 3)
+#define CLR_PTR_TYPE(ptr) ((void *) ((uintptr_t) (ptr) & ~3))
+#define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
-static int hash_index(struct hash_map *map, const unsigned char *sha1)
-{
- int i = ((*(unsigned int *)sha1) % map->size);
+#define MATCHING_SUBTREE(key_sha1, subtree_sha1) \
+ (!memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
- for (;;) {
- unsigned char *current = map->entries[i].commit_sha1;
+static struct int_node root_node;
- if (!hashcmp(sha1, current))
- return i;
+static int initialized;
- if (is_null_sha1(current))
- return -1 - i;
- if (++i == map->size)
- i = 0;
- }
-}
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+ unsigned int n);
-static void add_entry(const unsigned char *commit_sha1,
- const unsigned char *notes_sha1)
+static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
+ const unsigned char *key_sha1)
{
- int index;
-
- if (hash_map.count + 1 > hash_map.size >> 1) {
- int i, old_size = hash_map.size;
- struct entry *old = hash_map.entries;
-
- hash_map.size = old_size ? old_size << 1 : 64;
- hash_map.entries = (struct entry *)
- xcalloc(sizeof(struct entry), hash_map.size);
-
- for (i = 0; i < old_size; i++)
- if (!is_null_sha1(old[i].commit_sha1)) {
- index = -1 - hash_index(&hash_map,
- old[i].commit_sha1);
- memcpy(hash_map.entries + index, old + i,
- sizeof(struct entry));
- }
- free(old);
+ struct leaf_node *l;
+ unsigned char i = key_sha1[n];
+ void *p = tree->a[i];
+
+ switch(GET_PTR_TYPE(p)) {
+ case PTR_TYPE_INTERNAL:
+ l = note_tree_find(CLR_PTR_TYPE(p), n + 1, key_sha1);
+ if (l)
+ return l;
+ break;
+ case PTR_TYPE_NOTE:
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (!hashcmp(key_sha1, l->key_sha1))
+ return l; /* return note object matching given key */
+ break;
+ case PTR_TYPE_SUBTREE:
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (MATCHING_SUBTREE(key_sha1, l->key_sha1)) {
+ /* unpack tree and resume search */
+ tree->a[i] = NULL;
+ load_subtree(l, tree, n);
+ free(l);
+ return note_tree_find(tree, n, key_sha1);
+ }
+ break;
+ case PTR_TYPE_NULL:
+ default:
+ assert(!p);
+ break;
}
- index = hash_index(&hash_map, commit_sha1);
- if (index < 0) {
- index = -1 - index;
- hash_map.count++;
+ /*
+ * Did not find key at this (or any lower) level.
+ * Check if there's a matching subtree entry in tree->a[0].
+ * If so, unpack tree and resume search.
+ */
+ p = tree->a[0];
+ if (GET_PTR_TYPE(p) != PTR_TYPE_SUBTREE)
+ return NULL;
+ l = (struct leaf_node *) CLR_PTR_TYPE(p);
+ if (MATCHING_SUBTREE(key_sha1, l->key_sha1)) {
+ /* unpack tree and resume search */
+ tree->a[0] = NULL;
+ load_subtree(l, tree, n);
+ free(l);
+ return note_tree_find(tree, n, key_sha1);
}
+ return NULL;
+}
- hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
- hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+static int note_tree_insert(struct int_node *tree, unsigned char n,
+ const struct leaf_node *entry, unsigned char type)
+{
+ struct int_node *new_node;
+ const struct leaf_node *l;
+ int ret;
+ unsigned char i = entry->key_sha1[n];
+ void *p = tree->a[i];
+ assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
+ switch(GET_PTR_TYPE(p)) {
+ case PTR_TYPE_NULL:
+ assert(!p);
+ tree->a[i] = SET_PTR_TYPE(entry, type);
+ return 0;
+ case PTR_TYPE_INTERNAL:
+ return note_tree_insert(CLR_PTR_TYPE(p), n + 1, entry, type);
+ default:
+ assert(GET_PTR_TYPE(p) == PTR_TYPE_NOTE ||
+ GET_PTR_TYPE(p) == PTR_TYPE_SUBTREE);
+ l = (const struct leaf_node *) CLR_PTR_TYPE(p);
+ if (!hashcmp(entry->key_sha1, l->key_sha1))
+ return -1; /* abort insert on matching key */
+ new_node = (struct int_node *)
+ xcalloc(sizeof(struct int_node), 1);
+ ret = note_tree_insert(new_node, n + 1,
+ CLR_PTR_TYPE(p), GET_PTR_TYPE(p));
+ if (ret) {
+ free(new_node);
+ return -1;
+ }
+ tree->a[i] = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
+ return note_tree_insert(new_node, n + 1, entry, type);
+ }
}
/*
@@ -110,22 +196,23 @@ static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
return len;
}
-static void load_subtree(struct subtree_entry *se)
+static void load_subtree(struct leaf_node *subtree, struct int_node *node,
+ unsigned int n)
{
unsigned char commit_sha1[20];
unsigned int prefix_len;
void *buf;
struct tree_desc desc;
struct name_entry entry;
- struct subtree_entry *tmp_list = NULL, *tmp_last = NULL;
- buf = fill_tree_descriptor(&desc, se->subtree_sha1);
+ buf = fill_tree_descriptor(&desc, subtree->val_sha1);
if (!buf)
die("Could not read %s for notes-index",
- sha1_to_hex(se->subtree_sha1));
+ sha1_to_hex(subtree->val_sha1));
- prefix_len = se->sha1_prefix_w_len[19];
- memcpy(commit_sha1, se->sha1_prefix_w_len, prefix_len);
+ prefix_len = subtree->key_sha1[19];
+ assert(prefix_len >= n);
+ memcpy(commit_sha1, subtree->key_sha1, prefix_len);
while (tree_entry(&desc, &entry)) {
int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
commit_sha1 + prefix_len, 20 - prefix_len);
@@ -133,111 +220,47 @@ static void load_subtree(struct subtree_entry *se)
continue; /* entry.path is not a SHA1 sum. Skip */
len += prefix_len;
- /* If commit SHA1 is complete, assume note object */
- if (len == 20)
- add_entry(commit_sha1, entry.sha1);
- /* If commit SHA1 is incomplete, assume note subtree */
- else if (len < 20 && entry.mode == S_IFDIR) {
- struct subtree_entry *n = (struct subtree_entry *)
- xcalloc(sizeof(struct subtree_entry), 1);
- hashcpy(n->sha1_prefix_w_len, commit_sha1);
- n->sha1_prefix_w_len[19] = (unsigned char) len;
- hashcpy(n->subtree_sha1, entry.sha1);
-
- if (!tmp_list) {
- tmp_list = n;
- tmp_last = n;
- }
- else {
- assert(!tmp_last->next);
- assert(hashcmp(n->sha1_prefix_w_len,
- tmp_last->sha1_prefix_w_len) > 0);
- tmp_last->next = n;
- tmp_last = n;
+ /*
+ * If commit SHA1 is complete (len == 20), assume note object
+ * If commit SHA1 is incomplete (len < 20), assume note subtree
+ */
+ if (len <= 20) {
+ unsigned char type = PTR_TYPE_NOTE;
+ struct leaf_node *l = (struct leaf_node *)
+ xcalloc(sizeof(struct leaf_node), 1);
+ hashcpy(l->key_sha1, commit_sha1);
+ hashcpy(l->val_sha1, entry.sha1);
+ if (len < 20) {
+ l->key_sha1[19] = (unsigned char) len;
+ type = PTR_TYPE_SUBTREE;
}
+ assert(!note_tree_insert(node, n, l, type));
}
}
free(buf);
- if (tmp_list) {
- /* insert tmp_list immediately after se */
- assert(hashcmp(tmp_list->sha1_prefix_w_len,
- se->sha1_prefix_w_len) > 0);
- if (se->next) {
- assert(hashcmp(se->next->sha1_prefix_w_len,
- tmp_last->sha1_prefix_w_len) > 0);
- tmp_last->next = se->next;
- }
- se->next = tmp_list;
- }
}
-static void initialize_hash_map(const char *notes_ref_name)
+static void initialize_notes(const char *notes_ref_name)
{
unsigned char sha1[20], commit_sha1[20];
unsigned mode;
- struct subtree_entry root_tree;
+ struct leaf_node root_tree;
if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
get_tree_entry(commit_sha1, "", sha1, &mode))
return;
- hashclr(root_tree.sha1_prefix_w_len);
- hashcpy(root_tree.subtree_sha1, sha1);
- root_tree.next = NULL;
- load_subtree(&root_tree);
- subtree_list = root_tree.next;
-}
-
-/*
- * Compare the given commit SHA1 against the given subtree entry.
- * Return -1 if the commit SHA1 cannot exist within the given subtree, or any
- * subtree following it.
- * Return 0 if the commit SHA1 _may_ exist within the given subtree.
- * Return 1 if the commit SHA1 cannot exist within the given subtree, but may
- * exist within a subtree following it.
- */
-static int commit_subtree_cmp(const unsigned char *commit_sha1,
- const struct subtree_entry *entry)
-{
- unsigned int prefix_len = entry->sha1_prefix_w_len[19];
- return memcmp(commit_sha1, entry->sha1_prefix_w_len, prefix_len);
-}
-
-static struct subtree_entry *lookup_subtree(const unsigned char *commit_sha1)
-{
- struct subtree_entry *found = NULL, *cur = subtree_list;
- while (cur) {
- int cmp = commit_subtree_cmp(commit_sha1, cur);
- if (!cmp)
- found = cur;
- if (cmp < 0)
- break;
- cur = cur->next;
- }
- return found;
+ hashclr(root_tree.key_sha1);
+ hashcpy(root_tree.val_sha1, sha1);
+ load_subtree(&root_tree, &root_node, 0);
}
static unsigned char *lookup_notes(const unsigned char *commit_sha1)
{
- int index;
- struct subtree_entry *subtree;
-
- /* First, try to find the commit SHA1 directly in hash map */
- index = hash_map.size ? hash_index(&hash_map, commit_sha1) : -1;
- if (index >= 0)
- return hash_map.entries[index].notes_sha1;
-
- /* Next, try finding a subtree that may contain the commit SHA1 */
- subtree = lookup_subtree(commit_sha1);
-
- /* Give up if no subtree found, or if subtree is already loaded */
- if (!subtree || is_null_sha1(subtree->subtree_sha1))
- return NULL;
-
- /* Load subtree into hash_map, and retry lookup recursively */
- load_subtree(subtree);
- hashclr(subtree->subtree_sha1);
- return lookup_notes(commit_sha1);
+ struct leaf_node *found = note_tree_find(&root_node, 0, commit_sha1);
+ if (found)
+ return found->val_sha1;
+ return NULL;
}
void get_commit_notes(const struct commit *commit, struct strbuf *sb,
@@ -255,7 +278,7 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
else if (!notes_ref_name)
notes_ref_name = GIT_NOTES_DEFAULT_REF;
- initialize_hash_map(notes_ref_name);
+ initialize_notes(notes_ref_name);
initialized = 1;
}
--
1.6.4.rc3.138.ga6b98.dirty
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC] Store subtree entries in the same hash map as the note entries
2009-08-01 2:36 ` [RFC] First draft of 256-tree structure for storing notes Johan Herland
@ 2009-08-13 3:00 ` Johan Herland
2009-08-26 10:31 ` [RFC] Use a 16-tree instead of a 256-tree for storing notes Johan Herland
1 sibling, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-08-13 3:00 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git, gitster, trast, tavestbo, git, chriscool, spearce
This is a first draft at implementing Dscho's idea of improving note parser
performance by storing the subtree entries in the same hash map as the note
entries.
In order to tell subtree entries and note entries apart, another member has
been added to struct entry: unsigned char commit_sha1_len. This member
stores the number of "valid" bytes in the commit_sha1 member, meaning that
a value of 1-19 indicates a subtree entry, and a value of 20 indicates a
note entry. There are two more special values for this new member as well:
- Since a null SHA1 is also a valid subtree entry (e.g. the "00/*" subtree
in a 2/38 fanout scheme), we can no longer use is_null_sha1() to identify
unused entries in the hash map. Instead, a value of 0 in the new
commit_sha1_len member indicates that this is entry is null/unused.
There is one exception to this rule: the root notes tree which is just a
special case of a subtree entry with 0 "valid" bytes in the commit_sha1
member. However, this exception is acceptable, as the root entry is never
stored in the hash map (the hash map is initialized by unpacking this
root tree entry).
- The second special commit_sha1_len value is 255, and is used to indicate
a subtree entry that has already been unpacked, and should therefore be
removed from the hash map. It is expensive and non-trivial to _delete_
an entry in the hash map, and we therefore use this special value to
_ignore_ the entry. If/when the hash map is grown/reallocated, we simply
avoid bringing the unpacked subtree entries into the new hash map.
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
21.80user 2.15system 0:24.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1051886minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
1.24user 0.24system 0:01.49elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+120478minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.73user 0.20system 0:00.95elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+105782minor)pagefaults 0swaps
Signed-off-by: Johan Herland <johan@herland.net>
---
On Wednesday 29 July 2009, Johannes Schindelin wrote:
> I know you want to have something working first and optimize then, but I
> imagined that the hashmap can actually contain the entries of the partial
> hashes, too. You'll need to extend the data type, of course, to be able
> to say just how many digits of the SHA-1 are valid, and I guess for
> consistency you'll need to pad with 0s.
Ok. Here's a draft implementation of your proposal to store the subtree
entries in the same hash map as the note entries.
This code is somewhat faster than my first subtree-in-linked-list draft,
but considerably slower than my 256-tree proposal:
fanout level 0 fanout level 1 fanout level 2
subtree-in-ll 32.71s 1.88s 0.65s
all-in-hash-map 24.12s 1.49s 0.95s
all-in-256-tree 20.39s 1.06s 0.63s
I'm not sure how closely my implementation follows your vision, so please
suggest improvements, fixes, etc.
Have fun! :)
...Johan
notes.c | 155 +++++++++++++++++++++++++--------------------------------------
1 files changed, 62 insertions(+), 93 deletions(-)
diff --git a/notes.c b/notes.c
index 7e9dc49..34b8892 100644
--- a/notes.c
+++ b/notes.c
@@ -9,6 +9,15 @@
struct entry {
unsigned char commit_sha1[20];
unsigned char notes_sha1[20];
+ /*
+ * The following member can have the following values:
+ * 0 - This is a NULL/blank entry in the hash_map
+ * (Exception: The root_tree entry in initialize_hash_map())
+ * 1-19 - This is a subtree with the given number of valid prefix bytes
+ * 20 - This is a note
+ * 255 - This is a subtree which has been unpacked, please ignore
+ */
+ unsigned char commit_sha1_len;
};
struct hash_map {
@@ -16,43 +25,35 @@ struct hash_map {
off_t count, size;
};
-struct subtree_entry {
- /*
- * SHA1 prefix is stored in the first 19 bytes (w/trailing NUL bytes);
- * length of SHA1 prefix is stored in the last byte
- */
- unsigned char sha1_prefix_w_len[20];
- unsigned char subtree_sha1[20];
- struct subtree_entry *next;
-};
-
static int initialized;
static struct hash_map hash_map;
-static struct subtree_entry *subtree_list;
static int hash_index(struct hash_map *map, const unsigned char *sha1)
{
int i = ((*(unsigned int *)sha1) % map->size);
for (;;) {
- unsigned char *current = map->entries[i].commit_sha1;
+ struct entry *current = &(map->entries[i]);
- if (!hashcmp(sha1, current))
- return i;
-
- if (is_null_sha1(current))
+ if (!current->commit_sha1_len)
return -1 - i;
+ if (!hashcmp(sha1, current->commit_sha1)
+ && current->commit_sha1_len <= 20)
+ return i;
+
if (++i == map->size)
i = 0;
}
}
static void add_entry(const unsigned char *commit_sha1,
- const unsigned char *notes_sha1)
+ const unsigned char *notes_sha1,
+ unsigned char commit_sha1_len)
{
int index;
+ assert(commit_sha1_len > 0 && commit_sha1_len <= 20);
if (hash_map.count + 1 > hash_map.size >> 1) {
int i, old_size = hash_map.size;
struct entry *old = hash_map.entries;
@@ -61,13 +62,20 @@ static void add_entry(const unsigned char *commit_sha1,
hash_map.entries = (struct entry *)
xcalloc(sizeof(struct entry), hash_map.size);
- for (i = 0; i < old_size; i++)
- if (!is_null_sha1(old[i].commit_sha1)) {
+ for (i = 0; i < old_size; i++) {
+ switch (old[i].commit_sha1_len) {
+ case 255:
+ hash_map.count--;
+ /* fall through */
+ case 0:
+ continue;
+ default:
index = -1 - hash_index(&hash_map,
old[i].commit_sha1);
memcpy(hash_map.entries + index, old + i,
sizeof(struct entry));
}
+ }
free(old);
}
@@ -79,6 +87,7 @@ static void add_entry(const unsigned char *commit_sha1,
hashcpy(hash_map.entries[index].commit_sha1, commit_sha1);
hashcpy(hash_map.entries[index].notes_sha1, notes_sha1);
+ hash_map.entries[index].commit_sha1_len = commit_sha1_len;
}
/*
@@ -110,22 +119,26 @@ static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
return len;
}
-static void load_subtree(struct subtree_entry *se)
+static void load_subtree(struct entry *subtree)
{
unsigned char commit_sha1[20];
unsigned int prefix_len;
void *buf;
struct tree_desc desc;
struct name_entry entry;
- struct subtree_entry *tmp_list = NULL, *tmp_last = NULL;
- buf = fill_tree_descriptor(&desc, se->subtree_sha1);
+ buf = fill_tree_descriptor(&desc, subtree->notes_sha1);
if (!buf)
die("Could not read %s for notes-index",
- sha1_to_hex(se->subtree_sha1));
+ sha1_to_hex(subtree->notes_sha1));
+
+ prefix_len = subtree->commit_sha1_len;
+ assert(prefix_len < 20);
+ memcpy(commit_sha1, subtree->commit_sha1, prefix_len);
+
+ /* Invalidate this subtree from further consideration */
+ subtree->commit_sha1_len = 255;
- prefix_len = se->sha1_prefix_w_len[19];
- memcpy(commit_sha1, se->sha1_prefix_w_len, prefix_len);
while (tree_entry(&desc, &entry)) {
int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
commit_sha1 + prefix_len, 20 - prefix_len);
@@ -133,110 +146,66 @@ static void load_subtree(struct subtree_entry *se)
continue; /* entry.path is not a SHA1 sum. Skip */
len += prefix_len;
- /* If commit SHA1 is complete, assume note object */
- if (len == 20)
- add_entry(commit_sha1, entry.sha1);
- /* If commit SHA1 is incomplete, assume note subtree */
- else if (len < 20 && entry.mode == S_IFDIR) {
- struct subtree_entry *n = (struct subtree_entry *)
- xcalloc(sizeof(struct subtree_entry), 1);
- hashcpy(n->sha1_prefix_w_len, commit_sha1);
- n->sha1_prefix_w_len[19] = (unsigned char) len;
- hashcpy(n->subtree_sha1, entry.sha1);
-
- if (!tmp_list) {
- tmp_list = n;
- tmp_last = n;
- }
- else {
- assert(!tmp_last->next);
- assert(hashcmp(n->sha1_prefix_w_len,
- tmp_last->sha1_prefix_w_len) > 0);
- tmp_last->next = n;
- tmp_last = n;
- }
- }
+ if (len == 20 || (len < 20 && entry.mode == S_IFDIR))
+ add_entry(commit_sha1, entry.sha1, len);
}
free(buf);
- if (tmp_list) {
- /* insert tmp_list immediately after se */
- assert(hashcmp(tmp_list->sha1_prefix_w_len,
- se->sha1_prefix_w_len) > 0);
- if (se->next) {
- assert(hashcmp(se->next->sha1_prefix_w_len,
- tmp_last->sha1_prefix_w_len) > 0);
- tmp_last->next = se->next;
- }
- se->next = tmp_list;
- }
}
static void initialize_hash_map(const char *notes_ref_name)
{
unsigned char sha1[20], commit_sha1[20];
unsigned mode;
- struct subtree_entry root_tree;
+ struct entry root_tree;
if (!notes_ref_name || read_ref(notes_ref_name, commit_sha1) ||
get_tree_entry(commit_sha1, "", sha1, &mode))
return;
- hashclr(root_tree.sha1_prefix_w_len);
- hashcpy(root_tree.subtree_sha1, sha1);
- root_tree.next = NULL;
+ hashclr(root_tree.commit_sha1);
+ hashcpy(root_tree.notes_sha1, sha1);
+ root_tree.commit_sha1_len = 0;
load_subtree(&root_tree);
- subtree_list = root_tree.next;
}
-/*
- * Compare the given commit SHA1 against the given subtree entry.
- * Return -1 if the commit SHA1 cannot exist within the given subtree, or any
- * subtree following it.
- * Return 0 if the commit SHA1 _may_ exist within the given subtree.
- * Return 1 if the commit SHA1 cannot exist within the given subtree, but may
- * exist within a subtree following it.
- */
-static int commit_subtree_cmp(const unsigned char *commit_sha1,
- const struct subtree_entry *entry)
+static struct entry *lookup_subtree(const unsigned char *commit_sha1)
{
- unsigned int prefix_len = entry->sha1_prefix_w_len[19];
- return memcmp(commit_sha1, entry->sha1_prefix_w_len, prefix_len);
-}
+ unsigned char prefix_sha1[20];
+ unsigned char i;
+ int index;
-static struct subtree_entry *lookup_subtree(const unsigned char *commit_sha1)
-{
- struct subtree_entry *found = NULL, *cur = subtree_list;
- while (cur) {
- int cmp = commit_subtree_cmp(commit_sha1, cur);
- if (!cmp)
- found = cur;
- if (cmp < 0)
- break;
- cur = cur->next;
+ hashcpy(prefix_sha1, commit_sha1);
+ for (i = 19; i; --i) {
+ prefix_sha1[i] = 0;
+ index = hash_index(&hash_map, prefix_sha1);
+ if (index >= 0 && hash_map.entries[index].commit_sha1_len == i)
+ return &(hash_map.entries[index]);
}
- return found;
+ return NULL;
}
static unsigned char *lookup_notes(const unsigned char *commit_sha1)
{
int index;
- struct subtree_entry *subtree;
+ struct entry *subtree;
+
+ if (!hash_map.size)
+ return NULL;
/* First, try to find the commit SHA1 directly in hash map */
- index = hash_map.size ? hash_index(&hash_map, commit_sha1) : -1;
+ index = hash_index(&hash_map, commit_sha1);
if (index >= 0)
return hash_map.entries[index].notes_sha1;
/* Next, try finding a subtree that may contain the commit SHA1 */
subtree = lookup_subtree(commit_sha1);
- /* Give up if no subtree found, or if subtree is already loaded */
- if (!subtree || is_null_sha1(subtree->subtree_sha1))
+ /* Give up if no subtree found */
+ if (!subtree)
return NULL;
/* Load subtree into hash_map, and retry lookup recursively */
load_subtree(subtree);
- hashclr(subtree->subtree_sha1);
return lookup_notes(commit_sha1);
}
--
1.6.4.rc3.138.ga6b98.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-01 2:36 ` [RFC] First draft of 256-tree structure for storing notes Johan Herland
2009-08-13 3:00 ` [RFC] Store subtree entries in the same hash map as the note entries Johan Herland
@ 2009-08-26 10:31 ` Johan Herland
2009-08-26 12:05 ` Alex Riesen
1 sibling, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-08-26 10:31 UTC (permalink / raw)
To: Johannes Schindelin
Cc: git, gitster, trast, tavestbo, git, chriscool, spearce
The 256-tree structure is considerably faster than storing all entries in a
hash_map. Also, the memory consumption of the 256-tree structure is lower
than the hash_map, provided that you're only loading a few notes from a
"properly fanned-out" notes tree (i.e. 100000 notes in a 2/2/36 structure).
However, in the worst case (loading all 100000 notes), the memory usage of
the 256-tree structure (62.64 MB) is significantly worse than the hash_map
approach (10.25 MB).
This patch modifies the 256-tree structure into a 16-tree structure. This
significantly improves the memory situation. The result uses less memory
than both the 256-tree structure, and the hash_map approach, with a worst
case usage of 8.54 MB. Additionally, it seems to slightly improve the
runtime performance as well (probably because of the improved memory usage).
In conclusion, this is faster and smaller than all the previous drafts.
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
15.05user 1.44system 0:16.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+782490minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
0.68user 0.17system 0:00.87elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+99585minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.41user 0.17system 0:00.61elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+94084minor)pagefaults 0swaps
Signed-off-by: Johan Herland <johan@herland.net>
---
Hi,
This patch goes on top of the 256-tree RFC I sent earlier.
If nobody suggests further improvements, this patch will be
included in the next iteration of the jh/notes topic.
Have fun! :)
...Johan
notes.c | 26 ++++++++++++++------------
1 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/notes.c b/notes.c
index 9282b16..32b1e01 100644
--- a/notes.c
+++ b/notes.c
@@ -7,9 +7,9 @@
#include "tree-walk.h"
/*
- * Use a non-balancing simple 256-tree structure with struct int_node as
+ * Use a non-balancing simple 16-tree structure with struct int_node as
* internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
- * 256-array of pointers to its children
+ * 16-array of pointers to its children
* The bottom 2 bits of each pointer is used to identify the pointer type
* - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
* - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
@@ -21,18 +21,18 @@
*
* To add a leaf_node:
* 1. Start at the root node, with n = 0
- * 2. Use the nth byte of the key as an index into a:
- * - If NULL, store the tweaked pointer directly into a[n]
- * - If an int_node, recurse into that node and increment n
- * - If a leaf_node:
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is NULL, store the tweaked pointer directly into a[n]
+ * - If a[n] is an int_node, recurse into that node and increment n
+ * - If a[n] is a leaf_node:
* 1. Check if they're equal, and handle that (abort? overwrite?)
* 2. Create a new int_node, and store both leaf_nodes there
* 3. Store the new int_node into a[n].
*
* To find a leaf_node:
* 1. Start at the root node, with n = 0
- * 2. Use the nth byte of the key as an index into a:
- * - If an int_node, recurse into that node and increment n
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is an int_node, recurse into that node and increment n
* - If a leaf_node with matching key, return leaf_node (assert note entry)
* - If a matching subtree entry, unpack that subtree entry (and remove it);
* restart search at the current level.
@@ -42,7 +42,7 @@
* subtree entry (and remove it); restart search at the current level.
*/
struct int_node {
- void *a[256];
+ void *a[16];
};
/*
@@ -79,11 +79,13 @@ static int initialized;
static void load_subtree(struct leaf_node *subtree, struct int_node *node,
unsigned int n);
+#define GET_NIBBLE(n, sha1) (((sha1[n >> 1]) >> ((n & 0x01) << 2)) & 0x0f)
+
static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
const unsigned char *key_sha1)
{
struct leaf_node *l;
- unsigned char i = key_sha1[n];
+ unsigned char i = GET_NIBBLE(n, key_sha1);
void *p = tree->a[i];
switch(GET_PTR_TYPE(p)) {
@@ -138,7 +140,7 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
struct int_node *new_node;
const struct leaf_node *l;
int ret;
- unsigned char i = entry->key_sha1[n];
+ unsigned char i = GET_NIBBLE(n, entry->key_sha1);
void *p = tree->a[i];
assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
switch(GET_PTR_TYPE(p)) {
@@ -211,7 +213,7 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
sha1_to_hex(subtree->val_sha1));
prefix_len = subtree->key_sha1[19];
- assert(prefix_len >= n);
+ assert(prefix_len * 2 >= n);
memcpy(commit_sha1, subtree->key_sha1, prefix_len);
while (tree_entry(&desc, &entry)) {
int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
--
1.6.4.304.g1365c.dirty
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 10:31 ` [RFC] Use a 16-tree instead of a 256-tree for storing notes Johan Herland
@ 2009-08-26 12:05 ` Alex Riesen
2009-08-26 12:56 ` Johan Herland
0 siblings, 1 reply; 30+ messages in thread
From: Alex Riesen @ 2009-08-26 12:05 UTC (permalink / raw)
To: Johan Herland
Cc: Johannes Schindelin, git, gitster, trast, tavestbo, git,
chriscool, spearce
On Wed, Aug 26, 2009 at 12:31, Johan Herland<johan@herland.net> wrote:
> The 256-tree structure is considerably faster than storing all entries in a
This part is confusing. Was 256-tree better (as in "faster") then?
> hash_map. Also, the memory consumption of the 256-tree structure is lower
> than the hash_map, provided that you're only loading a few notes from a
> "properly fanned-out" notes tree (i.e. 100000 notes in a 2/2/36 structure).
> However, in the worst case (loading all 100000 notes), the memory usage of
> the 256-tree structure (62.64 MB) is significantly worse than the hash_map
> approach (10.25 MB).
>
> This patch modifies the 256-tree structure into a 16-tree structure. This
> significantly improves the memory situation. The result uses less memory
> than both the 256-tree structure, and the hash_map approach, with a worst
> case usage of 8.54 MB. Additionally, it seems to slightly improve the
> runtime performance as well (probably because of the improved memory usage).
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 12:05 ` Alex Riesen
@ 2009-08-26 12:56 ` Johan Herland
2009-08-26 13:24 ` Alex Riesen
0 siblings, 1 reply; 30+ messages in thread
From: Johan Herland @ 2009-08-26 12:56 UTC (permalink / raw)
To: Alex Riesen
Cc: Johannes Schindelin, git, gitster, trast, tavestbo, git,
chriscool, spearce
On Wednesday 26 August 2009, Alex Riesen wrote:
> On Wed, Aug 26, 2009 at 12:31, Johan Herland<johan@herland.net> wrote:
> > The 256-tree structure is considerably faster than storing all
> > entries in a
>
> This part is confusing. Was 256-tree better (as in "faster") then?
256-tree is faster than the everything-in-hash_map draft.
16-tree is slightly faster than 256-tree
256-tree uses more memory (in the worst case) that the
everything-in-hash-map draft.
16-tree uses less memory than both.
Makes sense?
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 12:56 ` Johan Herland
@ 2009-08-26 13:24 ` Alex Riesen
2009-08-26 13:27 ` Andreas Ericsson
0 siblings, 1 reply; 30+ messages in thread
From: Alex Riesen @ 2009-08-26 13:24 UTC (permalink / raw)
To: Johan Herland
Cc: Johannes Schindelin, git, gitster, trast, tavestbo, git,
chriscool, spearce
On Wed, Aug 26, 2009 at 14:56, Johan Herland<johan@herland.net> wrote:
> On Wednesday 26 August 2009, Alex Riesen wrote:
>> On Wed, Aug 26, 2009 at 12:31, Johan Herland<johan@herland.net> wrote:
>> > The 256-tree structure is considerably faster than storing all
>> > entries in a
>>
>> This part is confusing. Was 256-tree better (as in "faster") then?
>
> 256-tree is faster than the everything-in-hash_map draft.
> 16-tree is slightly faster than 256-tree
>
> 256-tree uses more memory (in the worst case) that the
> everything-in-hash-map draft.
> 16-tree uses less memory than both.
>
> Makes sense?
Oh, it does, it is just confusingly presented. How about:
The 16-tree is both faster and has lower footprint then 256-tree
code, which in its turn is noticably faster and smaller then existing
hash_map implementation. ...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 13:24 ` Alex Riesen
@ 2009-08-26 13:27 ` Andreas Ericsson
2009-08-26 14:43 ` Johan Herland
2009-08-27 11:56 ` Johannes Schindelin
0 siblings, 2 replies; 30+ messages in thread
From: Andreas Ericsson @ 2009-08-26 13:27 UTC (permalink / raw)
To: Alex Riesen
Cc: Johan Herland, Johannes Schindelin, git, gitster, trast, tavestbo,
git, chriscool, spearce
Alex Riesen wrote:
> On Wed, Aug 26, 2009 at 14:56, Johan Herland<johan@herland.net> wrote:
>> On Wednesday 26 August 2009, Alex Riesen wrote:
>>> On Wed, Aug 26, 2009 at 12:31, Johan Herland<johan@herland.net> wrote:
>>>> The 256-tree structure is considerably faster than storing all
>>>> entries in a
>>> This part is confusing. Was 256-tree better (as in "faster") then?
>> 256-tree is faster than the everything-in-hash_map draft.
>> 16-tree is slightly faster than 256-tree
>>
>> 256-tree uses more memory (in the worst case) that the
>> everything-in-hash-map draft.
>> 16-tree uses less memory than both.
>>
>> Makes sense?
>
> Oh, it does, it is just confusingly presented. How about:
>
> The 16-tree is both faster and has lower footprint then 256-tree
> code, which in its turn is noticably faster and smaller then existing
> hash_map implementation. ...
If it's to be squashed in, why mention the 256-tree at all (except
for possibly as something to compare with at the end)?
If it goes on top, why mention the hash_map at all?
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 13:27 ` Andreas Ericsson
@ 2009-08-26 14:43 ` Johan Herland
2009-08-27 11:56 ` Johannes Schindelin
1 sibling, 0 replies; 30+ messages in thread
From: Johan Herland @ 2009-08-26 14:43 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Alex Riesen, Johannes Schindelin, git, gitster, trast, tavestbo,
git, chriscool, spearce
On Wednesday 26 August 2009, Andreas Ericsson wrote:
> Alex Riesen wrote:
> > On Wed, Aug 26, 2009 at 14:56, Johan Herland<johan@herland.net>
wrote:
> >> On Wednesday 26 August 2009, Alex Riesen wrote:
> >>> On Wed, Aug 26, 2009 at 12:31, Johan Herland<johan@herland.net>
wrote:
> >>>> The 256-tree structure is considerably faster than storing all
> >>>> entries in a
> >>>
> >>> This part is confusing. Was 256-tree better (as in "faster")
> >>> then?
> >>
> >> 256-tree is faster than the everything-in-hash_map draft.
> >> 16-tree is slightly faster than 256-tree
> >>
> >> 256-tree uses more memory (in the worst case) that the
> >> everything-in-hash-map draft.
> >> 16-tree uses less memory than both.
> >>
> >> Makes sense?
> >
> > Oh, it does, it is just confusingly presented. How about:
> >
> > The 16-tree is both faster and has lower footprint then 256-tree
> > code, which in its turn is noticably faster and smaller then
> > existing hash_map implementation. ...
>
> If it's to be squashed in, why mention the 256-tree at all (except
> for possibly as something to compare with at the end)?
> If it goes on top, why mention the hash_map at all?
Ah. Sorry for the confusion. These patches are not meant to standalone
patches in the jh/notes series. They just compare various solutions to
the problem of parsing a notes tree structure with fanout in an
efficient manner.
The next iteration of the jh/notes series will include the preferred
solution (16-tree unless something better shows up), _without_ talking
about the differences between alternative solutions. As such the
hash_map and 256-tree will not be mentioned at all.
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] Use a 16-tree instead of a 256-tree for storing notes
2009-08-26 13:27 ` Andreas Ericsson
2009-08-26 14:43 ` Johan Herland
@ 2009-08-27 11:56 ` Johannes Schindelin
1 sibling, 0 replies; 30+ messages in thread
From: Johannes Schindelin @ 2009-08-27 11:56 UTC (permalink / raw)
To: Andreas Ericsson
Cc: Alex Riesen, Johan Herland, git, gitster, trast, tavestbo, git,
chriscool, spearce
Hi,
On Wed, 26 Aug 2009, Andreas Ericsson wrote:
> If it's to be squashed in, why mention the 256-tree at all
It was labeled RFC, so I think it is perfectly fine to compare with other
contenders.
Thanks,
Dscho
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2009-08-27 11:56 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
2009-07-29 8:52 ` Alex Riesen
2009-07-29 16:40 ` Johannes Schindelin
2009-07-30 0:50 ` Johan Herland
2009-07-30 1:14 ` Johannes Schindelin
2009-07-30 0:42 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 2/8] Add a script to edit/inspect notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 3/8] Speed up git notes lookup Johan Herland
2009-07-29 2:25 ` [PATCHv3 4/8] Add an expensive test for git-notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-07-29 7:57 ` Thomas Rast
2009-07-30 1:02 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees Johan Herland
2009-07-29 16:45 ` Johannes Schindelin
2009-07-30 0:18 ` Testing performance of the notes lookup code (Was: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees) Johan Herland
2009-08-01 2:36 ` [RFC] First draft of 256-tree structure for storing notes Johan Herland
2009-08-13 3:00 ` [RFC] Store subtree entries in the same hash map as the note entries Johan Herland
2009-08-26 10:31 ` [RFC] Use a 16-tree instead of a 256-tree for storing notes Johan Herland
2009-08-26 12:05 ` Alex Riesen
2009-08-26 12:56 ` Johan Herland
2009-08-26 13:24 ` Alex Riesen
2009-08-26 13:27 ` Andreas Ericsson
2009-08-26 14:43 ` Johan Herland
2009-08-27 11:56 ` Johannes Schindelin
2009-07-29 2:25 ` [PATCHv3 7/8] fast-import: Add support for importing commit notes Johan Herland
2009-07-29 14:21 ` Shawn O. Pearce
2009-07-30 1:29 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-07-29 16:46 ` Johannes Schindelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).