* [PATCH 0/6] Introduce commit notes
@ 2007-07-15 23:19 Johannes Schindelin
2007-07-15 23:22 ` [PATCH 1/6] Rename git_one_line() to git_line_length() and export it Johannes Schindelin
` (6 more replies)
0 siblings, 7 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:19 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Hi,
This patch series replaces the series I sent out a while ago, which added
"commit annotations". Since "commit notes" was liked much better, here
they are.
It picks up the same idea, having a pseudo-branch whose revisions contain
a .git/objects/??/* like file structure, and whose blobs are the commit
notes.
By default, that pseudo-branch is "refs/notes/commits", but it is
overridable by the config variable core.notesRef, which in turn can be
overridden by the environment variable GIT_NOTES_REF. If the given ref
does not exist yet, it is interpreted as empty.
The biggest obstacle was a thinko about the scalability. Tree objects
take free form name entries, and therefore a binary search by name is not
possible.
Patch 6/6 is only a WIP patch, but it shows the road ahead. It adds code
to generate .git/notes-index from refs/notes/commits (or any other ref you
specify as notes ref), which is reused until refs/notes/commits^{tree}
changes. Patch 6/6 is only meant to assess which data structure yields
best performance, and how big the costs are.
However, as long as there are no public, fetchable commit notes, I think
the first 5 patches are safe for application and testing.
Ciao,
Dscho
Johannes Schindelin (6):
Rename git_one_line() to git_line_length() and export it
Introduce commit notes
Add git-notes
Add a test script for "git notes"
Document git-notes
notes: add notes-index for a substantial speedup.
.gitignore | 1 +
Documentation/cmd-list.perl | 1 +
Documentation/config.txt | 15 ++
Documentation/git-notes.txt | 45 ++++
Makefile | 5 +-
cache.h | 1 +
commit.c | 15 +-
commit.h | 1 +
config.c | 5 +
environment.c | 1 +
git-notes.sh | 61 ++++++
notes.c | 416 ++++++++++++++++++++++++++++++++++++++
notes.h | 9 +
t/t3301-notes.sh | 63 ++++++
t/t3302-notes-index-expensive.sh | 118 +++++++++++
15 files changed, 750 insertions(+), 7 deletions(-)
create mode 100644 Documentation/git-notes.txt
create mode 100755 git-notes.sh
create mode 100644 notes.c
create mode 100644 notes.h
create mode 100755 t/t3301-notes.sh
create mode 100755 t/t3302-notes-index-expensive.sh
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH 1/6] Rename git_one_line() to git_line_length() and export it
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
@ 2007-07-15 23:22 ` Johannes Schindelin
2007-07-15 23:23 ` [PATCH 2/6] Introduce commit notes Johannes Schindelin
` (5 subsequent siblings)
6 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:22 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
The function get_one_line() really returns the line length, not the
whole line, but it is really useful, so do not hide it in commit.c,
under the wrong name.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
commit.c | 10 +++++-----
commit.h | 1 +
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/commit.c b/commit.c
index 4c5dfa9..0c350bc 100644
--- a/commit.c
+++ b/commit.c
@@ -458,7 +458,7 @@ void clear_commit_marks(struct commit *commit, unsigned int mark)
/*
* Generic support for pretty-printing the header
*/
-static int get_one_line(const char *msg, unsigned long len)
+int get_line_length(const char *msg, unsigned long len)
{
int ret = 0;
@@ -950,7 +950,7 @@ static void pp_header(enum cmit_fmt fmt,
for (;;) {
const char *line = *msg_p;
char *dst;
- int linelen = get_one_line(*msg_p, *len_p);
+ int linelen = get_line_length(*msg_p, *len_p);
unsigned long len;
if (!linelen)
@@ -1041,7 +1041,7 @@ static void pp_title_line(enum cmit_fmt fmt,
title = xmalloc(title_alloc);
for (;;) {
const char *line = *msg_p;
- int linelen = get_one_line(line, *len_p);
+ int linelen = get_line_length(line, *len_p);
*msg_p += linelen;
*len_p -= linelen;
@@ -1118,7 +1118,7 @@ static void pp_remainder(enum cmit_fmt fmt,
int first = 1;
for (;;) {
const char *line = *msg_p;
- int linelen = get_one_line(line, *len_p);
+ int linelen = get_line_length(line, *len_p);
*msg_p += linelen;
*len_p -= linelen;
@@ -1214,7 +1214,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
/* Skip excess blank lines at the beginning of body, if any... */
for (;;) {
- int linelen = get_one_line(msg, len);
+ int linelen = get_line_length(msg, len);
int ll = linelen;
if (!linelen)
break;
diff --git a/commit.h b/commit.h
index 467872e..fc6df23 100644
--- a/commit.h
+++ b/commit.h
@@ -60,6 +60,7 @@ enum cmit_fmt {
CMIT_FMT_UNSPECIFIED,
};
+extern int get_line_length(const char *msg, unsigned long len);
extern enum cmit_fmt get_commit_format(const char *arg);
extern unsigned long pretty_print_commit(enum cmit_fmt fmt, const struct commit *, unsigned long len, char **buf_p, unsigned long *space_p, int abbrev, const char *subject, const char *after_subject, enum date_mode dmode);
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 2/6] Introduce commit notes
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
2007-07-15 23:22 ` [PATCH 1/6] Rename git_one_line() to git_line_length() and export it Johannes Schindelin
@ 2007-07-15 23:23 ` Johannes Schindelin
2007-07-15 23:36 ` Junio C Hamano
2007-07-16 5:11 ` Junio C Hamano
2007-07-15 23:23 ` [PATCH 3/6] Add git-notes Johannes Schindelin
` (4 subsequent siblings)
6 siblings, 2 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:23 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains trees much like the
loose object trees in .git/objects/. In other words, to get
at the commit notes for a given SHA-1, take the first two
hex characters as directory name, and the remaining 38 hex
characters as base name, and look that up in the notes ref.
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
There is one severe shortcoming, though. Since tree objects can
contain file names of a variable length, it is not possible to
do a binary search for the correct base name in the tree object's
contents. Therefore this approach does not scale well, because
the average lookup time will be proportional to the number of
commit objects, and therefore the slowdown will be quadratic in
that number.
However, a remedy is near: in a later commit, a .git/notes-index
will be introduced, a cached mapping from commits to commit notes,
to be written when the tree name of the notes ref changes. In
case that notes-index cannot be written, the current (possibly
slow) code will come into effect again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/config.txt | 15 +++++++++++
Makefile | 3 +-
cache.h | 1 +
commit.c | 5 +++
config.c | 5 +++
environment.c | 1 +
notes.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
notes.h | 9 ++++++
8 files changed, 102 insertions(+), 1 deletions(-)
create mode 100644 notes.c
create mode 100644 notes.h
diff --git a/Documentation/config.txt b/Documentation/config.txt
index d0e9a17..5fe833d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -285,6 +285,21 @@ core.pager::
The command that git will use to paginate output. Can be overridden
with the `GIT_PAGER` environment variable.
+core.notesRef::
+ When showing commit messages, also show notes which are stored in
+ the given ref. This ref is expected to contain paths of the form
+ ??/*, where the directory name consists of the first two
+ characters of the commit name, and the base name consists of
+ the remaining 38 characters.
++
+If such a path exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line. If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be print.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
alias.*::
Command aliases for the gitlink:git[1] command wrapper - e.g.
after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/Makefile b/Makefile
index d7541b4..119d949 100644
--- a/Makefile
+++ b/Makefile
@@ -322,7 +322,8 @@ LIB_OBJS = \
write_or_die.o trace.o list-objects.o grep.o match-trees.o \
alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o \
- convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o
+ convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o \
+ notes.o
BUILTIN_OBJS = \
builtin-add.o \
diff --git a/cache.h b/cache.h
index 328c1ad..c89cac5 100644
--- a/cache.h
+++ b/cache.h
@@ -309,6 +309,7 @@ extern size_t packed_git_window_size;
extern size_t packed_git_limit;
extern size_t delta_base_cache_limit;
extern int auto_crlf;
+extern char *notes_ref_name;
#define GIT_REPO_VERSION 0
extern int repository_format_version;
diff --git a/commit.c b/commit.c
index 0c350bc..3529b6a 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
#include "interpolate.h"
#include "diff.h"
#include "revision.h"
+#include "notes.h"
int save_commit_buffer = 1;
@@ -1254,6 +1255,10 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
*/
if (fmt == CMIT_FMT_EMAIL && offset <= beginning_of_body)
buf[offset++] = '\n';
+
+ if (fmt != CMIT_FMT_ONELINE)
+ get_commit_notes(commit, buf_p, &offset, space_p);
+
buf[offset] = '\0';
free(reencoded);
return offset;
diff --git a/config.c b/config.c
index f89a611..05d2ad6 100644
--- a/config.c
+++ b/config.c
@@ -395,6 +395,11 @@ int git_default_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.notesref")) {
+ notes_ref_name = xstrdup(value);
+ return 0;
+ }
+
if (!strcmp(var, "user.name")) {
strlcpy(git_default_name, value, sizeof(git_default_name));
return 0;
diff --git a/environment.c b/environment.c
index f83fb9e..2e677d3 100644
--- a/environment.c
+++ b/environment.c
@@ -34,6 +34,7 @@ char *pager_program;
int pager_in_use;
int pager_use_color = 1;
int auto_crlf = 0; /* 1: both ways, -1: only when adding git objects */
+char *notes_ref_name;
static const char *git_dir;
static char *git_object_dir, *git_index_file, *git_refs_dir, *git_graft_file;
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..5d1bb1a
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,64 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit,
+ char **buf_p, unsigned long *offset_p, unsigned long *space_p)
+{
+ char name[80];
+ const char *hex;
+ unsigned char sha1[20];
+ char *msg;
+ unsigned long msgoffset, msglen;
+ enum object_type type;
+
+ if (!initialized) {
+ const char *env = getenv(GIT_NOTES_REF);
+ if (env) {
+ if (notes_ref_name)
+ free(notes_ref_name);
+ notes_ref_name = xstrdup(getenv(GIT_NOTES_REF));
+ } else if (!notes_ref_name)
+ notes_ref_name = xstrdup("refs/notes/commits");
+ if (notes_ref_name && read_ref(notes_ref_name, sha1)) {
+ free(notes_ref_name);
+ notes_ref_name = NULL;
+ }
+ initialized = 1;
+ }
+ if (!notes_ref_name)
+ return;
+
+ hex = sha1_to_hex(commit->object.sha1);
+ snprintf(name, sizeof(name), "%s:%.*s/%.*s",
+ notes_ref_name, 2, hex, 38, hex + 2);
+ if (get_sha1(name, sha1))
+ return;
+
+ if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen)
+ return;
+ /* we will end the annotation by a newline anyway. */
+ if (msg[msglen - 1] == '\n')
+ msglen--;
+
+ ALLOC_GROW(*buf_p, *offset_p + 14 + msglen, *space_p);
+ *offset_p += sprintf(*buf_p + *offset_p, "\nNotes:\n");
+
+ for (msgoffset = 0; msgoffset < msglen;) {
+ int linelen = get_line_length(msg + msgoffset, msglen);
+
+ ALLOC_GROW(*buf_p, *offset_p + linelen + 6, *space_p);
+ *offset_p += sprintf(*buf_p + *offset_p,
+ " %.*s", linelen, msg + msgoffset);
+ msgoffset += linelen;
+ }
+ ALLOC_GROW(*buf_p, *offset_p + 1, *space_p);
+ (*buf_p)[*offset_p] = '\n';
+ (*offset_p)++;
+ free(msg);
+}
+
+
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..aed80e7
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,9 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit,
+ char **buf_p, unsigned long *offset_p, unsigned long *space_p);
+
+#define GIT_NOTES_REF "GIT_NOTES_REF"
+
+#endif
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 3/6] Add git-notes
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
2007-07-15 23:22 ` [PATCH 1/6] Rename git_one_line() to git_line_length() and export it Johannes Schindelin
2007-07-15 23:23 ` [PATCH 2/6] Introduce commit notes Johannes Schindelin
@ 2007-07-15 23:23 ` Johannes Schindelin
2007-07-16 5:11 ` Junio C Hamano
2007-07-15 23:24 ` [PATCH 4/6] Add a test script for "git notes" Johannes Schindelin
` (3 subsequent siblings)
6 siblings, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:23 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
This script allows you to edit and show commit notes easily.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
.gitignore | 1 +
Makefile | 2 +-
git-notes.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 1 deletions(-)
create mode 100755 git-notes.sh
diff --git a/.gitignore b/.gitignore
index 20ee642..125613f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -83,6 +83,7 @@ git-mktag
git-mktree
git-name-rev
git-mv
+git-notes
git-pack-redundant
git-pack-objects
git-pack-refs
diff --git a/Makefile b/Makefile
index 119d949..10a9342 100644
--- a/Makefile
+++ b/Makefile
@@ -213,7 +213,7 @@ SCRIPT_SH = \
git-merge-resolve.sh git-merge-ours.sh \
git-lost-found.sh git-quiltimport.sh git-submodule.sh \
git-filter-branch.sh \
- git-stash.sh
+ git-stash.sh git-notes.sh
SCRIPT_PERL = \
git-add--interactive.perl \
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..e0ad0b9
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" &&
+ die "No notes ref set."
+
+COMMIT=$(git rev-parse --verify --default HEAD "$2")
+NAME=$(echo $COMMIT | sed "s/^../&\//")
+
+case "$1" in
+edit)
+ MESSAGE="$GIT_DIR"/new-notes
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
+
+ GIT_INDEX_FILE="$MESSAGE".idx
+ export GIT_INDEX_FILE
+
+ CURRENT_HEAD=$(git show-ref $GIT_NOTES_REF | cut -f 1 -d ' ')
+ if [ -z "$CURRENT_HEAD" ]; then
+ PARENT=
+ else
+ PARENT="-p $OLDTIP"
+ git read-tree $GIT_NOTES_REF || die "Could not read index"
+ git cat-file blob :$NAME >> "$MESSAGE" 2> /dev/null
+ fi
+
+ ${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
+
+ grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
+ mv "$MESSAGE".processed "$MESSAGE"
+ if [ -z "$(cat "$MESSAGE")" ]; then
+ test -z "$CURRENT_HEAD" &&
+ die "Will not initialise with empty tree"
+ git update-index --force-remove $NAME ||
+ die "Could not update index"
+ else
+ BLOB=$(git hash-object -w "$MESSAGE") ||
+ die "Could not write into object database"
+ git update-index --add --cacheinfo 0644 $BLOB $NAME ||
+ die "Could not write index"
+ fi
+
+ TREE=$(git write-tree) || die "Could not write tree"
+ NEW_HEAD=$(: | git commit-tree $TREE $PARENT) ||
+ die "Could not annotate"
+ case "$CURRENT_HEAD" in
+ '') git update-ref $GIT_NOTES_REF $NEW_HEAD ;;
+ *) git update-ref $GIT_NOTES_REF $NEW_HEAD $CURRENT_HEAD;;
+ esac
+;;
+show)
+ git show "$GIT_NOTES_REF":$NAME
+;;
+*)
+ usage
+esac
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 4/6] Add a test script for "git notes"
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
` (2 preceding siblings ...)
2007-07-15 23:23 ` [PATCH 3/6] Add git-notes Johannes Schindelin
@ 2007-07-15 23:24 ` Johannes Schindelin
2007-07-16 5:11 ` Junio C Hamano
2007-07-15 23:24 ` [PATCH 5/6] Document git-notes Johannes Schindelin
` (2 subsequent siblings)
6 siblings, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:24 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Incidentally, a test for "git notes" implies a test for the
whole commit notes machinery.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
t/t3301-notes.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 63 insertions(+), 0 deletions(-)
create mode 100755 t/t3301-notes.sh
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..eb50191
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+ : > a1 &&
+ git add a1 &&
+ test_tick &&
+ git commit -m 1st &&
+ : > a2 &&
+ git add a2 &&
+ test_tick &&
+ git commit -m 2nd
+'
+
+cat > fake_editor.sh << EOF
+echo "\$MSG" > "\$1"
+echo "\$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL="$(pwd)"/fake_editor.sh
+export VISUAL
+
+
+test_expect_success 'need notes ref' '
+ ! MSG=1 git notes edit &&
+ ! MSG=2 git notes show
+'
+
+test_expect_success 'create notes' '
+ git config core.notesRef refs/notes/commits &&
+ MSG=b1 git notes edit &&
+cat .git/new-notes &&
+test b1 = "$(cat .git/new-notes)" &&
+ test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+ test b1 = $(git notes show) &&
+ git show HEAD^ &&
+ ! git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:14:13 2005 -0700
+
+ 2nd
+
+Notes:
+ b1
+EOF
+
+test_expect_success 'show notes' '
+ ! (git cat-file commit HEAD | grep b1) &&
+ git log -1 > output &&
+ git diff expect output
+'
+
+test_done
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH 5/6] Document git-notes
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
` (3 preceding siblings ...)
2007-07-15 23:24 ` [PATCH 4/6] Add a test script for "git notes" Johannes Schindelin
@ 2007-07-15 23:24 ` Johannes Schindelin
2007-07-15 23:26 ` [WIP PATCH 6/6] notes: add notes-index for a substantial speedup Johannes Schindelin
2007-07-16 7:57 ` [PATCH 0/6] Introduce commit notes Andy Parkins
6 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:24 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/cmd-list.perl | 1 +
Documentation/git-notes.txt | 45 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+), 0 deletions(-)
create mode 100644 Documentation/git-notes.txt
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 2143995..f05e291 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -140,6 +140,7 @@ git-mergetool ancillarymanipulators
git-mktag plumbingmanipulators
git-mktree plumbingmanipulators
git-mv mainporcelain
+git-notes mainporcelain
git-name-rev plumbinginterrogators
git-pack-objects plumbingmanipulators
git-pack-redundant plumbinginterrogators
diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
new file mode 100644
index 0000000..331ed89
--- /dev/null
+++ b/Documentation/git-notes.txt
@@ -0,0 +1,45 @@
+git-notes(1)
+============
+
+NAME
+----
+git-notes - Add commit notes
+
+SYNOPSIS
+--------
+[verse]
+'git-notes' (edit | show) [commit
+
+DESCRIPTION
+-----------
+This command allows you to add notes to commit messages, after the
+fact. To discern these notes from the message stored in the commit
+object, the notes are indented like the message, after an unindented
+line saying "Notes:".
+
+To enable commit notes, you have to set the config variable
+core.notesRef to something like "refs/notes/commits". This setting
+can be overridden by the environment variable "GIT_NOTES_REF".
+
+
+SUBCOMMANDS
+-----------
+
+edit::
+ Edit the notes for a given commit (defaults to HEAD).
+
+show::
+ Show the notes for a given commit (defaults to HEAD).
+
+
+Author
+------
+Written by Johannes Schindelin <johannes.schindelin@gmx.de>
+
+Documentation
+-------------
+Documentation by Johannes Schindelin
+
+GIT
+---
+Part of the gitlink:git[7] suite
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [WIP PATCH 6/6] notes: add notes-index for a substantial speedup.
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
` (4 preceding siblings ...)
2007-07-15 23:24 ` [PATCH 5/6] Document git-notes Johannes Schindelin
@ 2007-07-15 23:26 ` Johannes Schindelin
2007-07-15 23:33 ` Johannes Schindelin
2007-07-16 6:01 ` Shawn O. Pearce
2007-07-16 7:57 ` [PATCH 0/6] Introduce commit notes Andy Parkins
6 siblings, 2 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:26 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Actually, this commit adds two methods for a notes index:
- a sorted list with a fan out to help binary search, and
- a modified hash table.
It also adds a test which is used to determine the best algorithm.
---
Not signed off because this is not suitable to be applied as-is.
It is only meant to test the different approaches.
notes.c | 392 ++++++++++++++++++++++++++++++++++++--
t/t3302-notes-index-expensive.sh | 118 ++++++++++++
2 files changed, 490 insertions(+), 20 deletions(-)
create mode 100755 t/t3302-notes-index-expensive.sh
diff --git a/notes.c b/notes.c
index 5d1bb1a..5a90abf 100644
--- a/notes.c
+++ b/notes.c
@@ -1,10 +1,370 @@
#include "cache.h"
#include "commit.h"
+#include "tree-walk.h"
#include "notes.h"
#include "refs.h"
static int initialized;
+/*
+ * There are two choices of data structure for the notes index.
+ *
+ * A) Fan out enhanced sorted list.
+ *
+ * This is a regular sorted list with a 256 entry fan out. In other words,
+ * every time an entry is looked up, a binary search is performed over the
+ * sublist defined by the first byte of the SHA-1.
+ *
+ * The disadvantage is an average runtime logarithmic in the number of
+ * commit notes. The advantages are a compact representation on disk,
+ * and a _guaranteed_ logarithmic runtime.
+ *
+ * You could even squeeze out one more byte per entry, since the
+ * first byte is known from the fan out list. This would complicate our
+ * algorithm, though.
+ *
+ * B) Hash
+ *
+ * This is not your classical hash. It is _mostly_ like a hash, with
+ * a few notable exceptions:
+ *
+ * - it is possibly larger than size suggests: since it is file based,
+ * it is easier to write at the end than to wrap around.
+ *
+ * - as a consequence we can make the entries _strictly_ sorted. This
+ * is not only nice to look at, but makes incremental updates much,
+ * much easier.
+ *
+ * The disadvantages of a hash is its loose packing. In order to operate
+ * reasonably well, it needs a size roughly double the number of entries.
+ * It also has a worst runtime linear in the number of entries.
+ *
+ * The advantage is an expected constant lookup time.
+ *
+ * The performance of a hash map depends highly on a good hashing
+ * algorithm, to avoid collisions. Lucky us! SHA-1 is a pretty good
+ * hashing algorithm.
+ *
+ * There is another advantage to hash maps: with not much effort, the
+ * incremental update can be performed in place, relying on O_TRUNC to
+ * detect interruptions. This operation has an expected constant runtime.
+ */
+
+struct notes_entry {
+ unsigned char commit_sha1[20];
+ unsigned char notes_sha1[20];
+};
+
+struct notes_index {
+ char signature[4]; /* FANO for fan our, HASH for hash */
+
+ unsigned char tree_sha1[20];
+ unsigned char subtree_sha1[256][20]; /* for incremental caching */
+ off_t offsets[256]; /* for fan out */
+ off_t count, size; /* for hash */
+} notes_index;
+
+static int notes_index_fd;
+static int (*get_notes)(const unsigned char *commit_sha1,
+ unsigned char *notes_sha1);
+
+#define GIT_NOTES_MODE "GIT_NOTES_MODE"
+static int use_hash;
+
+static int index_uptodate_check(struct tree *tree) {
+ const char *signature = use_hash ? "HASH" : "FANO";
+ int fd = open(git_path("notes-index"), O_RDONLY);
+
+ if (fd < 0)
+ return fd;
+
+ notes_index_fd = fd;
+
+ return read_in_full(fd, ¬es_index, sizeof(notes_index)) < 0||
+ memcmp(notes_index.signature, signature, 4) ||
+ memcmp(notes_index.tree_sha1,
+ &tree->object.sha1, 20);
+}
+
+struct lock_file update_lock;
+
+/* this reads the remaining 38 hexchars */
+static int get_remaining_hexchars(unsigned char *sha1, const char *path)
+{
+ int i, j1, j2;
+ for (i = 0; i < 38; i += 2)
+ if ((j1 = hexval(path[i])) < 0 ||
+ (j2 = hexval(path[i + 1])) < 0)
+ return -1;
+ else
+ sha1[1 + i / 2] = (j1 << 4) | j2;
+ return path[38] != '\0';
+}
+
+static int get_notes_hash_count(struct tree *tree) {
+ struct tree_desc desc, desc2;
+ struct name_entry entry;
+ void *buf;
+ unsigned long count = 0;
+
+ buf = fill_tree_descriptor(&desc, notes_index.tree_sha1);
+ if (!buf)
+ return 0;
+ while (tree_entry(&desc, &entry)) {
+ void *buf2 = fill_tree_descriptor(&desc2, entry.sha1);
+ if (!buf2)
+ continue;
+ while (tree_entry(&desc2, &entry))
+ count++;
+ free(buf2);
+ }
+ free(buf);
+
+ return count;
+}
+
+static unsigned long get_hash_index(const unsigned char *sha1)
+{
+ return (ntohl(*(unsigned long *)sha1) % notes_index.size);
+}
+
+static int write_hash_gap(int fd, unsigned char *sha1)
+{
+ off_t min_offset = sizeof(notes_index) +
+ get_hash_index(sha1) * sizeof(struct notes_entry);
+ while (min_offset > lseek(fd, 0, SEEK_CUR))
+ if (write_in_full(fd, null_sha1, 20) < 0 ||
+ write_in_full(fd, null_sha1, 20) < 0)
+ return error("Could not write gaps in notes-index");
+ return 0;
+}
+
+static int update_index(struct tree *tree) {
+ /*
+ * Fan out sorted list:
+ *
+ * Write out the header, and seek back to it, in order to update it.
+ * Actually only seek at the end, and make sure that you write
+ * something big-endian.
+ *
+ * Plan for incremental: if subtree_sha1 is equal, copy out.
+ * Otherwise construct, and remember in the copy of the header.
+ *
+ * Hash:
+ *
+ * Always use a power of two as size. Not the next higher one, but
+ * the next next higher one.
+ *
+ * Read the tree recursively, and leave as many zeros as needed
+ * until the next entry comes. Or if the entry has a hash larger
+ * than the last free entry, write it at once.
+ */
+
+ /* Plan for incremental: (not in-place)
+ * Look at tree differences. Write null_sha1 until next, or next
+ * subtree. Continue writing until original entry is null_sha1 or
+ * greater than current subtree entry.
+ */
+
+ int new_fd = hold_lock_file_for_update(&update_lock,
+ git_path("notes-index"), 0);
+ struct tree_desc desc;
+ struct name_entry entry;
+ void *buf;
+ int i;
+
+ if (new_fd < 0)
+ return error("Could not construct notes-index");
+
+ memset(¬es_index, 0, sizeof(notes_index));
+ hashcpy(notes_index.tree_sha1, tree->object.sha1);
+ notes_index.offsets[0] = sizeof(notes_index);
+ if (use_hash) {
+ notes_index.count = get_notes_hash_count(tree);
+ for (notes_index.size = 1; notes_index.size / 2
+ >= notes_index.count; notes_index.size <<= 1)
+ ; /* do nothing */
+ memcpy(notes_index.signature, "HASH", 4);
+ } else
+ memcpy(notes_index.signature, "FANO", 4);
+
+ if (write_in_full(new_fd, ¬es_index, sizeof(notes_index)) < 0)
+ return error("Could not write notes-index");
+
+ buf = fill_tree_descriptor(&desc, notes_index.tree_sha1);
+ if (!buf)
+ return error("Could not read %s for notes-index",
+ sha1_to_hex(notes_index.tree_sha1));
+
+ i = 0;
+ while (tree_entry(&desc, &entry)) {
+ int j1, j2;
+ unsigned char sha1[20];
+ struct tree_desc desc2;
+ struct name_entry entry2;
+ void *buf2;
+
+ if (!S_ISDIR(entry.mode) ||
+ (j1 = hexval(entry.path[0])) < 0 ||
+ (j2 = hexval(entry.path[1])) < 0)
+ continue;
+ sha1[0] = j1 * 16 + j2;
+ while (++i < sha1[0])
+ notes_index.offsets[i] = notes_index.offsets[i - 1];
+
+ hashcpy(notes_index.subtree_sha1[i], entry.sha1);
+ buf2 = fill_tree_descriptor(&desc2, entry.sha1);
+ if (!buf2)
+ continue;
+ while(tree_entry(&desc2, &entry2)) {
+ if (get_remaining_hexchars(sha1, entry2.path))
+ continue;
+ if (use_hash && write_hash_gap(new_fd, sha1))
+ return -1;
+ if (write_in_full(new_fd, sha1, 20) < 0 ||
+ write_in_full(new_fd,
+ entry2.sha1, 20) < 0)
+ return error("Could not write notes-index");
+ }
+ free(buf2);
+ notes_index.offsets[i] = lseek(new_fd, 0, SEEK_CUR);
+ }
+ free(buf);
+
+ while (++i < 256)
+ notes_index.offsets[i] = notes_index.offsets[i - 1];
+
+ /* update fan_out */
+ lseek(new_fd, 0, SEEK_SET);
+ write(new_fd, ¬es_index, sizeof(notes_index));
+ lseek(new_fd, notes_index.offsets[255], SEEK_SET);
+
+ return close(new_fd) || commit_lock_file(&update_lock) ||
+ (notes_index_fd = open(git_path("notes-index"), O_RDONLY));
+}
+
+static void *notes_mmap;
+
+static void unmap_notes_mmap(void)
+{
+ munmap(notes_mmap, notes_index.offsets[255]);
+}
+
+static int get_notes_fan_out(const unsigned char *commit_sha1,
+ unsigned char *notes_sha1)
+{
+ /*
+ * Header is assumed to be read.
+ *
+ * mmap() the area, and bisect.
+ */
+ off_t off;
+ size_t size;
+ int i, i2, ret = -1;
+ struct notes_entry *list;
+
+ i = commit_sha1[0];
+ off = i ? notes_index.offsets[i - 1] : sizeof(notes_index);
+ size = notes_index.offsets[i] - off;
+ if (!size)
+ return -1;
+
+ if (!notes_mmap) {
+ notes_mmap = xmmap(NULL, notes_index.offsets[255],
+ PROT_READ, MAP_PRIVATE, notes_index_fd, 0);
+ atexit(unmap_notes_mmap);
+ }
+
+ list = (void *)((char *)notes_mmap + off);
+
+ i = 0;
+ i2 = size / sizeof(*list);
+ while (i + 1 < i2) {
+ int middle = (i + i2) / 2;
+ int cmp = hashcmp(commit_sha1, list[middle].commit_sha1);
+ if (cmp < 0)
+ i2 = middle;
+ else if (cmp > 0)
+ i = middle;
+ else {
+ hashcpy(notes_sha1, list[middle].notes_sha1);
+ i = middle;
+ ret = 0;
+ break;
+ }
+ }
+ if (i == 0 && !hashcmp(commit_sha1, list[i].commit_sha1)) {
+ hashcpy(notes_sha1, list[i].notes_sha1);
+ ret = 0;
+ }
+
+ return ret;
+}
+
+static int get_notes_hash(const unsigned char *commit_sha1,
+ unsigned char *notes_sha1)
+{
+ /*
+ * Header is assumed to be read. fd is still open.
+ *
+ * Seek to hash, read until lower or equal (0000... is lower...)
+ */
+ int i = get_hash_index(commit_sha1);
+ struct notes_entry entry;
+
+ lseek(notes_index_fd,
+ sizeof(notes_index) + i * sizeof(entry), SEEK_SET);
+ while (!read_in_full(notes_index_fd, &entry, sizeof(entry)) &&
+ !is_null_sha1(entry.commit_sha1)) {
+ int cmp = hashcmp(commit_sha1, entry.commit_sha1);
+ if (!cmp) {
+ hashcpy(notes_sha1, entry.notes_sha1);
+ return 0;
+ } else if (cmp < 0)
+ break;
+ }
+ return -1;
+}
+
+static inline void init_notes_index(void)
+{
+ const char *env;
+ struct commit *notes_ref;
+ unsigned char sha1[20];
+
+ if (initialized)
+ return;
+
+ initialized = 1;
+ env = getenv(GIT_NOTES_REF);
+ if (env) {
+ if (notes_ref_name)
+ free(notes_ref_name);
+ notes_ref_name = xstrdup(env);
+ } else if (!notes_ref_name)
+ notes_ref_name = xstrdup("refs/notes/commits");
+
+ if (!notes_ref_name)
+ return;
+ if (read_ref(notes_ref_name, sha1)) {
+ free(notes_ref_name);
+ notes_ref_name = NULL;
+ return;
+ }
+ env = getenv("GIT_NOTES_MODE");
+ if (env && !strcmp("HASH", env)) {
+ use_hash = 1;
+ get_notes = get_notes_hash;
+ } else if (env && !strcmp("FANO", env))
+ get_notes = get_notes_fan_out;
+ if (get_notes && !get_sha1(notes_ref_name, sha1) &&
+ (notes_ref = (struct commit *)parse_object(sha1)) &&
+ notes_ref->object.type == OBJ_COMMIT)
+ if (index_uptodate_check(notes_ref->tree))
+ if (update_index(notes_ref->tree))
+ get_notes = NULL; /* disable notes-index */
+}
+
void get_commit_notes(const struct commit *commit,
char **buf_p, unsigned long *offset_p, unsigned long *space_p)
{
@@ -15,28 +375,20 @@ void get_commit_notes(const struct commit *commit,
unsigned long msgoffset, msglen;
enum object_type type;
- if (!initialized) {
- const char *env = getenv(GIT_NOTES_REF);
- if (env) {
- if (notes_ref_name)
- free(notes_ref_name);
- notes_ref_name = xstrdup(getenv(GIT_NOTES_REF));
- } else if (!notes_ref_name)
- notes_ref_name = xstrdup("refs/notes/commits");
- if (notes_ref_name && read_ref(notes_ref_name, sha1)) {
- free(notes_ref_name);
- notes_ref_name = NULL;
- }
- initialized = 1;
- }
- if (!notes_ref_name)
- return;
+ init_notes_index();
- hex = sha1_to_hex(commit->object.sha1);
- snprintf(name, sizeof(name), "%s:%.*s/%.*s",
- notes_ref_name, 2, hex, 38, hex + 2);
- if (get_sha1(name, sha1))
+ if (!notes_ref_name)
return;
+ if (get_notes) {
+ if (get_notes(commit->object.sha1, sha1))
+ return;
+ } else {
+ hex = sha1_to_hex(commit->object.sha1);
+ snprintf(name, sizeof(name), "%s:%.*s/%.*s",
+ notes_ref_name, 2, hex, 38, hex + 2);
+ if (get_sha1(name, sha1))
+ return;
+ }
if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen)
return;
diff --git a/t/t3302-notes-index-expensive.sh b/t/t3302-notes-index-expensive.sh
new file mode 100755
index 0000000..075b8e2
--- /dev/null
+++ b/t/t3302-notes-index-expensive.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes index (expensive!)'
+
+. ./test-lib.sh
+
+test -z "$GIT_NOTES_TIMING_TESTS" && {
+ say Skipping timing tests
+ test_done
+ exit
+}
+
+create_repo () {
+ number_of_commits=$1
+ nr=0
+ parent=
+ test -d .git || {
+ git init &&
+ tree=$(git write-tree) &&
+ while [ $nr -lt $number_of_commits ]; do
+ test_tick &&
+ commit=$(echo $nr | git commit-tree $tree $parent) ||
+ return
+ parent="-p $commit"
+ nr=$(($nr+1))
+ done &&
+ git update-ref refs/heads/master $commit &&
+ {
+ export GIT_INDEX_FILE=.git/temp;
+ git rev-list HEAD | cat -n | sed "s/^[ ][ ]*/ /g" |
+ while read nr sha1; do
+ blob=$(echo note $nr | git hash-object -w --stdin) &&
+ echo $sha1 | sed "s/^../0644 $blob 0 &\//"
+ done | git update-index --index-info &&
+ tree=$(git write-tree) &&
+ test_tick &&
+ commit=$(echo notes | git commit-tree $tree) &&
+ git update-ref refs/notes/commits $commit
+ } &&
+ git config core.notesRef refs/notes/commits
+ }
+}
+
+test_notes () {
+ count=$1 &&
+ git config core.notesRef refs/notes/commits &&
+ git log | grep "^ " > output &&
+ i=1 &&
+ while [ $i -le $count ]; do
+ echo " $(($count-$i))" &&
+ echo " note $i" &&
+ i=$(($i+1));
+ done > expect &&
+ git diff expect output
+}
+
+cat > time_notes << EOF
+ mode=\$1
+ i=1
+ while [ \$i -lt \$2 ]; do
+ case \$1 in
+ no-notes)
+ export GIT_NOTES_REF=non-existing
+ ;;
+ no-cash)
+ unset GIT_NOTES_REF
+ export GIT_NOTES_MODE=NONE
+ ;;
+ hash-cache-create)
+ unset GIT_NOTES_REF
+ export GIT_NOTES_MODE=HASH
+ rm .git/notes-index
+ ;;
+ hash-cache)
+ unset GIT_NOTES_REF
+ export GIT_NOTES_MODE=HASH
+ ;;
+ sorted-list-cache-create)
+ unset GIT_NOTES_REF
+ export GIT_NOTES_MODE=FANO
+ rm .git/notes-index
+ ;;
+ sorted-list-cache)
+ unset GIT_NOTES_REF
+ export GIT_NOTES_MODE=FANO
+ ;;
+ esac
+ git log >/dev/null
+ i=\$((\$i+1))
+ done
+EOF
+
+time_notes () {
+ for mode in no-notes no-cash \
+ hash-cache-create hash-cache \
+ sorted-list-cache-create sorted-list-cache; do
+ echo $mode
+ /usr/bin/time sh ../trash/time_notes $mode $1
+ done
+}
+
+for count in 10 100 1000; do
+
+ test -d ../trash-$count || mkdir ../trash-$count
+ (cd ../trash-$count;
+
+ test_expect_success "setup $count" "create_repo $count"
+
+ test_expect_success 'notes work' "test_notes $count"
+
+ test_expect_success 'notes timing' "time_notes 100"
+ )
+done
+
+test_done
--
1.5.3.rc1.2718.gd2dc9-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [WIP PATCH 6/6] notes: add notes-index for a substantial speedup.
2007-07-15 23:26 ` [WIP PATCH 6/6] notes: add notes-index for a substantial speedup Johannes Schindelin
@ 2007-07-15 23:33 ` Johannes Schindelin
2007-07-16 6:01 ` Shawn O. Pearce
1 sibling, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:33 UTC (permalink / raw)
To: Alberto Bertogli, git, gitster, Johan Herland
Hi,
[this explains what Patch 6/6 is all about:]
If GIT_NOTES_TIMING_TESTS is set, t3302 will output some timing data.
It will create three repositories, the first with 10 commits and a
commit note for each, the second with 100, the third with 1000.
For each repository, it times "git log" 100 times in several modes:
- with GIT_NOTES_REF set to a non-existing ref (should be equivalent to
the timings without this patch series),
- with no .git/notes-index,
- recreating .git/notes-index as a hash map _every_ time,
- creating .git/notes-index as a hash map, and using it the rest of the time,
- recreating .git/notes-index as a sorted list _every_ time, and
- creating .git/notes-index as a sorted list only the first time, and then
using it to find the notes by binary search.
Here is the output:
* expecting success: create_repo 10
* ok 1: setup 10
* expecting success: test_notes 10
diff --git a/expect b/output
* ok 2: notes work
* expecting success: time_notes 100
no-notes
0.13user 0.08system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+45271minor)pagefaults 0swaps
no-cash
0.14user 0.13system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+49421minor)pagefaults 0swaps
hash-cache-create
0.16user 0.24system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+73111minor)pagefaults 0swaps
hash-cache
0.10user 0.08system 0:00.18elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+45660minor)pagefaults 0swaps
sorted-list-cache-create
0.23user 0.17system 0:00.40elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+72056minor)pagefaults 0swaps
sorted-list-cache
0.12user 0.08system 0:00.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+46854minor)pagefaults 0swaps
* ok 3: notes timing
* expecting success: create_repo 100
* ok 1: setup 100
* expecting success: test_notes 100
diff --git a/expect b/output
* ok 2: notes work
* expecting success: time_notes 100
no-notes
0.38user 0.18system 0:00.56elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+54386minor)pagefaults 0swaps
no-cash
1.45user 0.66system 0:02.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+93980minor)pagefaults 0swaps
hash-cache-create
1.56user 0.98system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+132604minor)pagefaults 0swaps
hash-cache
0.38user 0.17system 0:00.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+54785minor)pagefaults 0swaps
sorted-list-cache-create
1.56user 0.88system 0:02.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+124215minor)pagefaults 0swaps
sorted-list-cache
0.44user 0.23system 0:00.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+64952minor)pagefaults 0swaps
* ok 3: notes timing
* expecting success: create_repo 1000
* ok 1: setup 1000
* expecting success: test_notes 1000
diff --git a/expect b/output
* ok 2: notes work
* expecting success: time_notes 100
no-notes
2.95user 1.19system 0:04.18elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+144766minor)pagefaults 0swaps
no-cash
23.05user 5.86system 0:33.06elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+639774minor)pagefaults 0swaps
hash-cache-create
23.86user 7.21system 0:32.67elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+710958minor)pagefaults 0swaps
hash-cache
3.16user 1.18system 0:04.35elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+145160minor)pagefaults 0swaps
sorted-list-cache-create
23.22user 7.32system 0:31.66elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+686007minor)pagefaults 0swaps
sorted-list-cache
3.74user 1.81system 0:05.77elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+241987minor)pagefaults 0swaps
* ok 3: notes timing
Results:
These timings were taken from a desktop machine with a few background
processes running, so take them with a grain of salt.
As expected, without a .git/notes-index, it scales pretty badly. Creating
.git/notes-index is slightly worse than that, but it typically happens
much less often than looking at a commit message. Therefore the work is
worth it, since the lookup _with_ .git/notes-index is in the same ball park
as no notes at all, with the hash map being better than the sorted
list lookup.
Therefore I will go with the hash map approach, when cleaning up patch 6/6.
But not tonight.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 2/6] Introduce commit notes
2007-07-15 23:23 ` [PATCH 2/6] Introduce commit notes Johannes Schindelin
@ 2007-07-15 23:36 ` Junio C Hamano
2007-07-15 23:52 ` Johannes Schindelin
2007-07-16 0:05 ` Junio C Hamano
2007-07-16 5:11 ` Junio C Hamano
1 sibling, 2 replies; 42+ messages in thread
From: Junio C Hamano @ 2007-07-15 23:36 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> The notes ref is a branch which contains trees much like the
> loose object trees in .git/objects/. In other words, to get
> at the commit notes for a given SHA-1, take the first two
> hex characters as directory name, and the remaining 38 hex
> characters as base name, and look that up in the notes ref.
> ...
> However, a remedy is near: in a later commit, a .git/notes-index
> will be introduced, a cached mapping from commits to commit notes,
> to be written when the tree name of the notes ref changes. In
> case that notes-index cannot be written, the current (possibly
> slow) code will come into effect again.
I wonder if it is worth using the fan-out tree structure for the
underlying "note" trees, as the notes-index would be the primary
way to access them.
Not that I've looked at the code too deeply with an intention of
possibly including it early. I was hoping to see fixes to d/f
code in merge-recursive from either you or Alex instead ;-)
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 2/6] Introduce commit notes
2007-07-15 23:36 ` Junio C Hamano
@ 2007-07-15 23:52 ` Johannes Schindelin
2007-07-16 0:05 ` Junio C Hamano
1 sibling, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-15 23:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Alberto Bertogli, git, Johan Herland
Hi,
On Sun, 15 Jul 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > The notes ref is a branch which contains trees much like the
> > loose object trees in .git/objects/. In other words, to get
> > at the commit notes for a given SHA-1, take the first two
> > hex characters as directory name, and the remaining 38 hex
> > characters as base name, and look that up in the notes ref.
> > ...
> > However, a remedy is near: in a later commit, a .git/notes-index
> > will be introduced, a cached mapping from commits to commit notes,
> > to be written when the tree name of the notes ref changes. In
> > case that notes-index cannot be written, the current (possibly
> > slow) code will come into effect again.
>
> I wonder if it is worth using the fan-out tree structure for the
> underlying "note" trees, as the notes-index would be the primary
> way to access them.
The fan-out tree is a nice fallback solution when you cannot write the
notes-index.
> Not that I've looked at the code too deeply with an intention of
> possibly including it early. I was hoping to see fixes to d/f code in
> merge-recursive from either you or Alex instead ;-)
Well, yeah. I was kind of trying to cool off from my unpleasant
unpack_trees() experience.
But I'll look into the issue again this week. Promise.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 2/6] Introduce commit notes
2007-07-15 23:36 ` Junio C Hamano
2007-07-15 23:52 ` Johannes Schindelin
@ 2007-07-16 0:05 ` Junio C Hamano
1 sibling, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 0:05 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, Johan Herland
Junio C Hamano <gitster@pobox.com> writes:
> I wonder if it is worth using the fan-out tree structure for the
> underlying "note" trees, as the notes-index would be the primary
> way to access them.
Actually now I think about it I think this was a stupid
suggestion. Creation of a new note in a reasonably well
populated note tree would be made 256-fold more efficient by
having the fan-out, as write-tree does not have to recompute the
other 255 tree objects thanks to the cache-tree data being
fresh.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 2/6] Introduce commit notes
2007-07-15 23:23 ` [PATCH 2/6] Introduce commit notes Johannes Schindelin
2007-07-15 23:36 ` Junio C Hamano
@ 2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:30 ` [REVISED PATCH " Johannes Schindelin
1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 5:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, gitster, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> +core.notesRef::
> + When showing commit messages, also show notes which are stored in
> + the given ref. This ref is expected to contain paths of the form
> + ??/*, where the directory name consists of the first two
> + characters of the commit name, and the base name consists of
> + the remaining 38 characters.
> ++
> +If such a path exists in the given ref, the referenced blob is read, and
> +appended to the commit message, separated by a "Notes:" line. If the
> +given ref itself does not exist, it is not an error, but means that no
> +notes should be print.
> ++
> +This setting defaults to "refs/notes/commits", and can be overridden by
> +the `GIT_NOTES_REF` environment variable.
> +
This design forces "one blob and only one blob decorates a
commit". It certainly makes the implementation and semantics
simpler -- if I have this note and you have that note on the
same commit, comparing notes eventually should result in a merge
of our notes. But is it sufficient in real life usage scenarios
(what's the use case)? One example that was raised on the list
is to collect "Acked-by", "Tested-by", etc., and in that case
perhaps one set "refs/notes/acks" may hold the former while
"refs/notes/tests" the latter. If we wanted to show both at the
same time, is it the only option to put them in the same "note"
blob and not use "refs/notes/{acks,tests}"?
> diff --git a/commit.c b/commit.c
> index 0c350bc..3529b6a 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -6,6 +6,7 @@
> #include "interpolate.h"
> #include "diff.h"
> #include "revision.h"
> +#include "notes.h"
>
> int save_commit_buffer = 1;
>
> @@ -1254,6 +1255,10 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
> */
> if (fmt == CMIT_FMT_EMAIL && offset <= beginning_of_body)
> buf[offset++] = '\n';
> +
> + if (fmt != CMIT_FMT_ONELINE)
> + get_commit_notes(commit, buf_p, &offset, space_p);
> +
> buf[offset] = '\0';
> free(reencoded);
> return offset;
This makes me wonder if there are cases where "notes" need to be
reencoded to honor log_output_encoding.
Since more and more people live in UTF-8 only world, and this is
a _new_ feature anyway, we could declare that "notes" blobs MUST
be encoded in UTF-8 upfront, but even if we did so we would need
reencoding to log_output_encoding, I suspect.
> diff --git a/notes.c b/notes.c
> new file mode 100644
> index 0000000..5d1bb1a
> --- /dev/null
> +++ b/notes.c
> @@ -0,0 +1,64 @@
> +#include "cache.h"
> +#include "commit.h"
> +#include "notes.h"
> +#include "refs.h"
> +
> +static int initialized;
> +
> +void get_commit_notes(const struct commit *commit,
> + char **buf_p, unsigned long *offset_p, unsigned long *space_p)
> +{
> + char name[80];
> + const char *hex;
> + unsigned char sha1[20];
> + char *msg;
> + unsigned long msgoffset, msglen;
> + enum object_type type;
> +
> + if (!initialized) {
> + const char *env = getenv(GIT_NOTES_REF);
> + if (env) {
> + if (notes_ref_name)
> + free(notes_ref_name);
> + notes_ref_name = xstrdup(getenv(GIT_NOTES_REF));
xstrdup(env)?
> + } else if (!notes_ref_name)
> + notes_ref_name = xstrdup("refs/notes/commits");
We would probably want to give another preprocessor constant for
this hardcoded string, next to GIT_NOTES_REF definition in cache.h.
> + if (notes_ref_name && read_ref(notes_ref_name, sha1)) {
> + free(notes_ref_name);
> + notes_ref_name = NULL;
> + }
> + initialized = 1;
> + }
> + if (!notes_ref_name)
> + return;
> +
> + hex = sha1_to_hex(commit->object.sha1);
> + snprintf(name, sizeof(name), "%s:%.*s/%.*s",
> + notes_ref_name, 2, hex, 38, hex + 2);
Too long a notes_ref_name and it won't overrun the buffer but
the failure is not detected, and ...
> + if (get_sha1(name, sha1))
> + return;
... this would fail silently, leaving the user scratching his head.
> +
> + if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen)
> + return;
What's in "type" at this point? Having a tree there is not an
error?
> + /* we will end the annotation by a newline anyway. */
> + if (msg[msglen - 1] == '\n')
> + msglen--;
> + ALLOC_GROW(*buf_p, *offset_p + 14 + msglen, *space_p);
> + *offset_p += sprintf(*buf_p + *offset_p, "\nNotes:\n");
Fourteen is because...
> +
> + for (msgoffset = 0; msgoffset < msglen;) {
> + int linelen = get_line_length(msg + msgoffset, msglen);
> +
> + ALLOC_GROW(*buf_p, *offset_p + linelen + 6, *space_p);
> + *offset_p += sprintf(*buf_p + *offset_p,
> + " %.*s", linelen, msg + msgoffset);
Six is because...
> + msgoffset += linelen;
> + }
> + ALLOC_GROW(*buf_p, *offset_p + 1, *space_p);
> + (*buf_p)[*offset_p] = '\n';
> + (*offset_p)++;
> + free(msg);
> +}
> +
> +
> diff --git a/notes.h b/notes.h
> new file mode 100644
> index 0000000..aed80e7
> --- /dev/null
> +++ b/notes.h
> @@ -0,0 +1,9 @@
> +#ifndef NOTES_H
> +#define NOTES_H
> +
> +void get_commit_notes(const struct commit *commit,
> + char **buf_p, unsigned long *offset_p, unsigned long *space_p);
> +
> +#define GIT_NOTES_REF "GIT_NOTES_REF"
Judging from the existing entries in cache.h, it seems that
GIT_NOTES_REF_ENVIRONMENT would be more appropriate preprocessor
symbol for this. Also let's have this in cache.h next to
GIT_DIR_ENVIRONMENT and friends, with another definition for
"refs/notes/commits".
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 3/6] Add git-notes
2007-07-15 23:23 ` [PATCH 3/6] Add git-notes Johannes Schindelin
@ 2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:31 ` [REVISED PATCH " Johannes Schindelin
0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 5:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, gitster, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> This script allows you to edit and show commit notes easily.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> .gitignore | 1 +
> Makefile | 2 +-
> git-notes.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 63 insertions(+), 1 deletions(-)
> create mode 100755 git-notes.sh
>
> diff --git a/git-notes.sh b/git-notes.sh
> new file mode 100755
> index 0000000..e0ad0b9
> --- /dev/null
> +++ b/git-notes.sh
> @@ -0,0 +1,61 @@
> +#!/bin/sh
> +
> +USAGE="(edit | show) [commit]"
> +. git-sh-setup
> +
> +test -n "$3" && usage
> +
> +test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
> +test -z "$GIT_NOTES_REF" &&
> + die "No notes ref set."
test -n "${GIT_NOTES_REF=$(git config core.notesref)}" || die
> +COMMIT=$(git rev-parse --verify --default HEAD "$2")
This silently annotates the HEAD commit if $2 is misspelled, I
suspect. Also if HEAD does not exist, COMMIT will be empty and
this whole command will exit with non-zero status, which you
would want to catch here...
> +NAME=$(echo $COMMIT | sed "s/^../&\//")
... or here.
> +case "$1" in
> +edit)
> + MESSAGE="$GIT_DIR"/new-notes
> + GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
$MESSAGE and its associated temporary file needs to be cleaned
up upon command exit; perhaps a trap is in order.
> + GIT_INDEX_FILE="$MESSAGE".idx
> + export GIT_INDEX_FILE
> +
> + CURRENT_HEAD=$(git show-ref $GIT_NOTES_REF | cut -f 1 -d ' ')
> + if [ -z "$CURRENT_HEAD" ]; then
> + PARENT=
> + else
> + PARENT="-p $OLDTIP"
> + git read-tree $GIT_NOTES_REF || die "Could not read index"
> + git cat-file blob :$NAME >> "$MESSAGE" 2> /dev/null
> + fi
I take that OLDTIP is a typo.
if CURRENT_HEAD=$(git show-ref -s "$GIT_NOTES_REF")
then
PARENT="-p $CURRENT_HEAD"
...
else
PARENT=
fi
> +
> + ${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
> +
> + grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
Makes us wonder if we would want to teach hash-stripping to
git-stripspace, doesn't it?
> + mv "$MESSAGE".processed "$MESSAGE"
> + if [ -z "$(cat "$MESSAGE")" ]; then
Make this 'if test -s "$MESSAGE"' and swap then/else clause
around; no reason to slurp the value into your shell.
> + test -z "$CURRENT_HEAD" &&
> + die "Will not initialise with empty tree"
> + git update-index --force-remove $NAME ||
> + die "Could not update index"
> + else
> + BLOB=$(git hash-object -w "$MESSAGE") ||
> + die "Could not write into object database"
> + git update-index --add --cacheinfo 0644 $BLOB $NAME ||
> + die "Could not write index"
> + fi
> +
> + TREE=$(git write-tree) || die "Could not write tree"
> + NEW_HEAD=$(: | git commit-tree $TREE $PARENT) ||
> + die "Could not annotate"
Hmph. How about "echo Annotate $COMMIT | git commit-tree..."?
> + case "$CURRENT_HEAD" in
> + '') git update-ref $GIT_NOTES_REF $NEW_HEAD ;;
> + *) git update-ref $GIT_NOTES_REF $NEW_HEAD $CURRENT_HEAD;;
> + esac
> +;;
There are some places that have "$GIT_NOTES_REF" in dq and some
places you don't. I think GIT_NOTES_REF begins with refs/ and
consists only of valid refname characters, so unless the user
wants to shoot himself in the foot it should be Ok, but we
probably would want to quote it.
Also, as unquoted $CURRENT_HEAD will not even count as a
parameter to update-ref, you do not have to do that case/esac,
but simply do:
git update-ref "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
Would we have reflog for this ref? What would we want to see as
the message if we do?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 4/6] Add a test script for "git notes"
2007-07-15 23:24 ` [PATCH 4/6] Add a test script for "git notes" Johannes Schindelin
@ 2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:32 ` [REVISED PATCH " Johannes Schindelin
0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 5:11 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, gitster, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
> new file mode 100755
> index 0000000..eb50191
> --- /dev/null
> +++ b/t/t3301-notes.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2007 Johannes E. Schindelin
> +#
> +
> +test_description='Test commit notes'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> + : > a1 &&
> + git add a1 &&
> + test_tick &&
> + git commit -m 1st &&
> + : > a2 &&
> + git add a2 &&
> + test_tick &&
> + git commit -m 2nd
> +'
Does not test the failure mode of not having a HEAD yet.
> +cat > fake_editor.sh << EOF
> +echo "\$MSG" > "\$1"
> +echo "\$MSG" >& 2
> +EOF
You can avoid all these backslashes by saying:
cat >fake_editor.sh <<\EOF
echo "$MSG" >"$1"
echo "$MSG" >&2
EOF
> +chmod a+x fake_editor.sh
> +VISUAL="$(pwd)"/fake_editor.sh
> +export VISUAL
Not that it hurts anybody, but do you really need that $(pwd),
instead of "./fake_editor.sh"?
> +
> +test_expect_success 'need notes ref' '
> + ! MSG=1 git notes edit &&
> + ! MSG=2 git notes show
> +'
> +
> +test_expect_success 'create notes' '
> + git config core.notesRef refs/notes/commits &&
> + MSG=b1 git notes edit &&
> +cat .git/new-notes &&
> +test b1 = "$(cat .git/new-notes)" &&
> + test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
> + test b1 = $(git notes show) &&
> + git show HEAD^ &&
> + ! git notes show HEAD^
> +'
Is there particular reason for that (lack of) indentation for
the two lines among them?
I think it is a bug to leave ".git/new-notes" and friends
behind.
> +
> +cat > expect << EOF
> +commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
> +Author: A U Thor <author@example.com>
> +Date: Thu Apr 7 15:14:13 2005 -0700
> +
> + 2nd
> +
> +Notes:
> + b1
> +EOF
> +
> +test_expect_success 'show notes' '
> + ! (git cat-file commit HEAD | grep b1) &&
> + git log -1 > output &&
> + git diff expect output
> +'
> +
> +test_done
Hmph. This makes the reader wonder why this is not optional,
perhaps linked to --decorate option somehow.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [WIP PATCH 6/6] notes: add notes-index for a substantial speedup.
2007-07-15 23:26 ` [WIP PATCH 6/6] notes: add notes-index for a substantial speedup Johannes Schindelin
2007-07-15 23:33 ` Johannes Schindelin
@ 2007-07-16 6:01 ` Shawn O. Pearce
2007-07-16 16:29 ` Johannes Schindelin
1 sibling, 1 reply; 42+ messages in thread
From: Shawn O. Pearce @ 2007-07-16 6:01 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Alberto Bertogli, git, gitster, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> Actually, this commit adds two methods for a notes index:
>
> - a sorted list with a fan out to help binary search, and
> - a modified hash table.
>
> It also adds a test which is used to determine the best algorithm.
I know this is a nice backwards compatible way to organize notes,
and to make them reasonably efficiently found, but I'd almost
rather just see them crammed into the packfile alongside of the
commit it annotates, so that the packfile reader can quickly find
the annotation at the same time it finds the commit.
aka packv4.
Ok, enough dreaming for today.
--
Shawn.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/6] Introduce commit notes
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
` (5 preceding siblings ...)
2007-07-15 23:26 ` [WIP PATCH 6/6] notes: add notes-index for a substantial speedup Johannes Schindelin
@ 2007-07-16 7:57 ` Andy Parkins
2007-07-16 8:11 ` Junio C Hamano
6 siblings, 1 reply; 42+ messages in thread
From: Andy Parkins @ 2007-07-16 7:57 UTC (permalink / raw)
To: git; +Cc: Johannes Schindelin, Alberto Bertogli, gitster, Johan Herland
On Monday 2007 July 16, Johannes Schindelin wrote:
> The biggest obstacle was a thinko about the scalability. Tree objects
> take free form name entries, and therefore a binary search by name is not
> possible.
I might be misunderstanding, but in the case of the notes tree objects isn't
it true that the name entries aren't free form, but are guaranteed to be of a
fixed length form:
XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
In which case you can binary search?
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/6] Introduce commit notes
2007-07-16 7:57 ` [PATCH 0/6] Introduce commit notes Andy Parkins
@ 2007-07-16 8:11 ` Junio C Hamano
2007-07-16 16:26 ` Johannes Schindelin
0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 8:11 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Johannes Schindelin, Alberto Bertogli, Johan Herland
Andy Parkins <andyparkins@gmail.com> writes:
> On Monday 2007 July 16, Johannes Schindelin wrote:
>
>> The biggest obstacle was a thinko about the scalability. Tree objects
>> take free form name entries, and therefore a binary search by name is not
>> possible.
>
> I might be misunderstanding, but in the case of the notes tree objects isn't
> it true that the name entries aren't free form, but are guaranteed to be of a
> fixed length form:
>
> XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>
> In which case you can binary search?
Hmph, you are right. In this sequence:
hex = sha1_to_hex(commit->object.sha1);
snprintf(name, sizeof(name), "%s:%.*s/%.*s",
notes_ref_name, 2, hex, 38, hex + 2);
if (get_sha1(name, sha1))
return;
Instead, we could read the tree object by hand in the commit
that is referenced by notes_ref_name, which has uniform two
letter names for subtrees which can be binary searched, open the
tree for that entry, again by hand, and do another binary search
because that tree has uniform 38-letter names. That certainly
could be done.
Sounds like a "fun" project for some definition of the word.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/6] Introduce commit notes
2007-07-16 8:11 ` Junio C Hamano
@ 2007-07-16 16:26 ` Johannes Schindelin
2007-07-16 17:56 ` Junio C Hamano
0 siblings, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-16 16:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Andy Parkins, git, Alberto Bertogli, Johan Herland
Hi,
On Mon, 16 Jul 2007, Junio C Hamano wrote:
> Andy Parkins <andyparkins@gmail.com> writes:
>
> > On Monday 2007 July 16, Johannes Schindelin wrote:
> >
> >> The biggest obstacle was a thinko about the scalability. Tree
> >> objects take free form name entries, and therefore a binary search by
> >> name is not possible.
> >
> > I might be misunderstanding, but in the case of the notes tree objects
> > isn't it true that the name entries aren't free form, but are
> > guaranteed to be of a fixed length form:
> >
> > XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> >
> > In which case you can binary search?
>
> Hmph, you are right. In this sequence:
>
> hex = sha1_to_hex(commit->object.sha1);
> snprintf(name, sizeof(name), "%s:%.*s/%.*s",
> notes_ref_name, 2, hex, 38, hex + 2);
> if (get_sha1(name, sha1))
> return;
>
> Instead, we could read the tree object by hand in the commit that is
> referenced by notes_ref_name, which has uniform two letter names for
> subtrees which can be binary searched, open the tree for that entry,
> again by hand, and do another binary search because that tree has
> uniform 38-letter names. That certainly could be done.
>
> Sounds like a "fun" project for some definition of the word.
I disagree. One disadvantage to using tree objects is that it is much
easier to have pilot errors. You could even make a new working tree
checking out refs/notes/commits and change/add/remove files.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [WIP PATCH 6/6] notes: add notes-index for a substantial speedup.
2007-07-16 6:01 ` Shawn O. Pearce
@ 2007-07-16 16:29 ` Johannes Schindelin
0 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-16 16:29 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Alberto Bertogli, git, gitster, Johan Herland
Hi,
On Mon, 16 Jul 2007, Shawn O. Pearce wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >
> > Actually, this commit adds two methods for a notes index:
> >
> > - a sorted list with a fan out to help binary search, and
> > - a modified hash table.
> >
> > It also adds a test which is used to determine the best algorithm.
>
> I know this is a nice backwards compatible way to organize notes,
> and to make them reasonably efficiently found, but I'd almost
> rather just see them crammed into the packfile alongside of the
> commit it annotates, so that the packfile reader can quickly find
> the annotation at the same time it finds the commit.
>
> aka packv4.
>
> Ok, enough dreaming for today.
Yes, I also dream of having the time to play with packv4. If you read my
comments in the commit-annotation thread, you'll see that I stated that
packv4 would solve the problem, too.
The reason I did this series was not to push commit notes, but to make
good for stalling Johan's efforts. Including a proof that the commit
notes as I introduced them can be relatively cheap, too.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/6] Introduce commit notes
2007-07-16 16:26 ` Johannes Schindelin
@ 2007-07-16 17:56 ` Junio C Hamano
2007-07-19 1:34 ` Johannes Schindelin
0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2007-07-16 17:56 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Andy Parkins, git, Alberto Bertogli, Johan Herland
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> Hmph, you are right. In this sequence:
>>
>> hex = sha1_to_hex(commit->object.sha1);
>> snprintf(name, sizeof(name), "%s:%.*s/%.*s",
>> notes_ref_name, 2, hex, 38, hex + 2);
>> if (get_sha1(name, sha1))
>> return;
>>
>> Instead, we could read the tree object by hand in the commit that is
>> referenced by notes_ref_name, which has uniform two letter names for
>> subtrees which can be binary searched, open the tree for that entry,
>> again by hand, and do another binary search because that tree has
>> uniform 38-letter names. That certainly could be done.
>>
>> Sounds like a "fun" project for some definition of the word.
>
> I disagree. One disadvantage to using tree objects is that it is much
> easier to have pilot errors. You could even make a new working tree
> checking out refs/notes/commits and change/add/remove files.
I suspect you read me wrong. I was saying that it is possible
to use a specialized tree object parser in place of get_sha1()
only in the above code to read the tree objects that represents
a 'note'. You obviously would want to do a sanity check such
as:
- The size of the tree object your customized tree parser is
fed is multiple of expected entry size (mode word + 20 SHA1 +
2 + NUL for fan-out, replace 2 with 38 for lower level);
- mode word for the entry is sane (an entry in the fan-out tree
would point at a tree object, an entry in lower level would
point at a blob);
- The name part (2 or 38) are lowercase hexadecimal strings;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 0/6] Introduce commit notes
2007-07-16 17:56 ` Junio C Hamano
@ 2007-07-19 1:34 ` Johannes Schindelin
0 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 1:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Andy Parkins, git, Alberto Bertogli, Johan Herland
Hi,
On Mon, 16 Jul 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> Hmph, you are right. In this sequence:
> >>
> >> hex = sha1_to_hex(commit->object.sha1);
> >> snprintf(name, sizeof(name), "%s:%.*s/%.*s",
> >> notes_ref_name, 2, hex, 38, hex + 2);
> >> if (get_sha1(name, sha1))
> >> return;
> >>
> >> Instead, we could read the tree object by hand in the commit that is
> >> referenced by notes_ref_name, which has uniform two letter names for
> >> subtrees which can be binary searched, open the tree for that entry,
> >> again by hand, and do another binary search because that tree has
> >> uniform 38-letter names. That certainly could be done.
> >>
> >> Sounds like a "fun" project for some definition of the word.
> >
> > I disagree. One disadvantage to using tree objects is that it is much
> > easier to have pilot errors. You could even make a new working tree
> > checking out refs/notes/commits and change/add/remove files.
>
> I suspect you read me wrong. I was saying that it is possible to use a
> specialized tree object parser in place of get_sha1() only in the above
> code to read the tree objects that represents a 'note'. You obviously
> would want to do a sanity check such as:
>
> - The size of the tree object your customized tree parser is
> fed is multiple of expected entry size (mode word + 20 SHA1 +
> 2 + NUL for fan-out, replace 2 with 38 for lower level);
>
> - mode word for the entry is sane (an entry in the fan-out tree
> would point at a tree object, an entry in lower level would
> point at a blob);
>
> - The name part (2 or 38) are lowercase hexadecimal strings;
In which case it is not _that_ attractive any more, since you
- have to have a fallback anyway, and
- have a relatively complex thing.
Instead, I want to go with the hash map approach, if only to have a O(1)
behaviour instead of O(log N).
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* [REVISED PATCH 2/6] Introduce commit notes
2007-07-16 5:11 ` Junio C Hamano
@ 2007-07-19 2:30 ` Johannes Schindelin
2007-07-19 3:28 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 2:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Alberto Bertogli, git, Johan Herland
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains trees much like the
loose object trees in .git/objects/. In other words, to get
at the commit notes for a given SHA-1, take the first two
hex characters as directory name, and the remaining 38 hex
characters as base name, and look that up in the notes ref.
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
There is one severe shortcoming, though. Since tree objects can
contain file names of a variable length, it is not possible to
do a binary search for the correct base name in the tree object's
contents. Therefore this approach does not scale well, because
the average lookup time will be proportional to the number of
commit objects, and therefore the slowdown will be quadratic in
that number.
However, a remedy is near: in a later commit, a .git/notes-index
will be introduced, a cached mapping from commits to commit notes,
to be written when the tree name of the notes ref changes. In
case that notes-index cannot be written, the current (possibly
slow) code will come into effect again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
On Sun, 15 Jul 2007, Junio C Hamano wrote:
> This design forces "one blob and only one blob decorates a
> commit". It certainly makes the implementation and semantics
> simpler -- if I have this note and you have that note on the
> same commit, comparing notes eventually should result in a merge
> of our notes. But is it sufficient in real life usage scenarios
> (what's the use case)? One example that was raised on the list
> is to collect "Acked-by", "Tested-by", etc., and in that case
> perhaps one set "refs/notes/acks" may hold the former while
> "refs/notes/tests" the latter. If we wanted to show both at the
> same time, is it the only option to put them in the same "note"
> blob and not use "refs/notes/{acks,tests}"?
Would that not make things even slower? I am hesitant.
All other concerns should be addressed, here and in the two
upcoming revised patches.
Documentation/config.txt | 15 +++++++++
Makefile | 3 +-
cache.h | 3 ++
commit.c | 5 +++
config.c | 5 +++
environment.c | 1 +
notes.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++
notes.h | 8 +++++
8 files changed, 116 insertions(+), 1 deletions(-)
create mode 100644 notes.c
create mode 100644 notes.h
diff --git a/Documentation/config.txt b/Documentation/config.txt
index d0e9a17..5fe833d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -285,6 +285,21 @@ core.pager::
The command that git will use to paginate output. Can be overridden
with the `GIT_PAGER` environment variable.
+core.notesRef::
+ When showing commit messages, also show notes which are stored in
+ the given ref. This ref is expected to contain paths of the form
+ ??/*, where the directory name consists of the first two
+ characters of the commit name, and the base name consists of
+ the remaining 38 characters.
++
+If such a path exists in the given ref, the referenced blob is read, and
+appended to the commit message, separated by a "Notes:" line. If the
+given ref itself does not exist, it is not an error, but means that no
+notes should be print.
++
+This setting defaults to "refs/notes/commits", and can be overridden by
+the `GIT_NOTES_REF` environment variable.
+
alias.*::
Command aliases for the gitlink:git[1] command wrapper - e.g.
after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/Makefile b/Makefile
index d7541b4..119d949 100644
--- a/Makefile
+++ b/Makefile
@@ -322,7 +322,8 @@ LIB_OBJS = \
write_or_die.o trace.o list-objects.o grep.o match-trees.o \
alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o \
- convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o
+ convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o \
+ notes.o
BUILTIN_OBJS = \
builtin-add.o \
diff --git a/cache.h b/cache.h
index 328c1ad..df45336 100644
--- a/cache.h
+++ b/cache.h
@@ -204,6 +204,8 @@ enum object_type {
#define GITATTRIBUTES_FILE ".gitattributes"
#define INFOATTRIBUTES_FILE "info/attributes"
#define ATTRIBUTE_MACRO_PREFIX "[attr]"
+#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
+#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
extern int is_bare_repository_cfg;
extern int is_bare_repository(void);
@@ -309,6 +311,7 @@ extern size_t packed_git_window_size;
extern size_t packed_git_limit;
extern size_t delta_base_cache_limit;
extern int auto_crlf;
+extern char *notes_ref_name;
#define GIT_REPO_VERSION 0
extern int repository_format_version;
diff --git a/commit.c b/commit.c
index 0c350bc..8911a18 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
#include "interpolate.h"
#include "diff.h"
#include "revision.h"
+#include "notes.h"
int save_commit_buffer = 1;
@@ -1254,6 +1255,10 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
*/
if (fmt == CMIT_FMT_EMAIL && offset <= beginning_of_body)
buf[offset++] = '\n';
+
+ if (fmt != CMIT_FMT_ONELINE)
+ get_commit_notes(commit, buf_p, &offset, space_p, encoding);
+
buf[offset] = '\0';
free(reencoded);
return offset;
diff --git a/config.c b/config.c
index f89a611..05d2ad6 100644
--- a/config.c
+++ b/config.c
@@ -395,6 +395,11 @@ int git_default_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.notesref")) {
+ notes_ref_name = xstrdup(value);
+ return 0;
+ }
+
if (!strcmp(var, "user.name")) {
strlcpy(git_default_name, value, sizeof(git_default_name));
return 0;
diff --git a/environment.c b/environment.c
index f83fb9e..2e677d3 100644
--- a/environment.c
+++ b/environment.c
@@ -34,6 +34,7 @@ char *pager_program;
int pager_in_use;
int pager_use_color = 1;
int auto_crlf = 0; /* 1: both ways, -1: only when adding git objects */
+char *notes_ref_name;
static const char *git_dir;
static char *git_object_dir, *git_index_file, *git_refs_dir, *git_graft_file;
diff --git a/notes.c b/notes.c
new file mode 100644
index 0000000..6207f95
--- /dev/null
+++ b/notes.c
@@ -0,0 +1,77 @@
+#include "cache.h"
+#include "commit.h"
+#include "notes.h"
+#include "refs.h"
+#include "utf8.h"
+
+static int initialized;
+
+void get_commit_notes(const struct commit *commit,
+ char **buf_p, unsigned long *offset_p, unsigned long *space_p,
+ const char *output_encoding)
+{
+ static const char *utf8 = "utf-8";
+ char name[80];
+ const char *hex;
+ unsigned char sha1[20];
+ char *msg;
+ unsigned long msgoffset, msglen;
+ enum object_type type;
+
+ if (!initialized) {
+ const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ if (env)
+ notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
+ else if (!notes_ref_name)
+ notes_ref_name = GIT_NOTES_DEFAULT_REF;
+ if (notes_ref_name && read_ref(notes_ref_name, sha1))
+ notes_ref_name = NULL;
+ initialized = 1;
+ }
+ if (!notes_ref_name)
+ return;
+
+ hex = sha1_to_hex(commit->object.sha1);
+ if (snprintf(name, sizeof(name), "%s:%.*s/%.*s",
+ notes_ref_name, 2, hex, 38, hex + 2)
+ >= sizeof(name) - 1) {
+ error("Notes ref name too long: %.*s", 60, notes_ref_name);
+ return;
+ }
+ if (get_sha1(name, sha1))
+ return;
+
+ if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
+ type != OBJ_BLOB)
+ return;
+ if (output_encoding && *output_encoding &&
+ strcmp(utf8, output_encoding)) {
+ char *reencoded = reencode_string(msg, output_encoding, utf8);
+ if (reencoded) {
+ free(msg);
+ msg = reencoded;
+ msglen = strlen(msg);
+ }
+ }
+ /* we will end the annotation by a newline anyway. */
+ if (msg[msglen - 1] == '\n')
+ msglen--;
+
+ ALLOC_GROW(*buf_p, *offset_p + 8 + msglen, *space_p);
+ *offset_p += sprintf(*buf_p + *offset_p, "\nNotes:\n");
+
+ for (msgoffset = 0; msgoffset < msglen;) {
+ int linelen = get_line_length(msg + msgoffset, msglen);
+
+ ALLOC_GROW(*buf_p, *offset_p + linelen + 5, *space_p);
+ *offset_p += sprintf(*buf_p + *offset_p,
+ " %.*s", linelen, msg + msgoffset);
+ msgoffset += linelen;
+ }
+ ALLOC_GROW(*buf_p, *offset_p + 1, *space_p);
+ (*buf_p)[*offset_p] = '\n';
+ (*offset_p)++;
+ free(msg);
+}
+
+
diff --git a/notes.h b/notes.h
new file mode 100644
index 0000000..fe8f209
--- /dev/null
+++ b/notes.h
@@ -0,0 +1,8 @@
+#ifndef NOTES_H
+#define NOTES_H
+
+void get_commit_notes(const struct commit *commit,
+ char **buf_p, unsigned long *offset_p, unsigned long *space_p,
+ const char *output_encoding);
+
+#endif
--
1.5.3.rc1.16.g9d6f-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [REVISED PATCH 3/6] Add git-notes
2007-07-16 5:11 ` Junio C Hamano
@ 2007-07-19 2:31 ` Johannes Schindelin
2007-07-19 2:54 ` Johannes Schindelin
0 siblings, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 2:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Alberto Bertogli, git, Johan Herland
This script allows you to edit and show commit notes easily.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
.gitignore | 1 +
Makefile | 2 +-
git-notes.sh | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 1 deletions(-)
create mode 100755 git-notes.sh
diff --git a/.gitignore b/.gitignore
index 20ee642..125613f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -83,6 +83,7 @@ git-mktag
git-mktree
git-name-rev
git-mv
+git-notes
git-pack-redundant
git-pack-objects
git-pack-refs
diff --git a/Makefile b/Makefile
index 119d949..10a9342 100644
--- a/Makefile
+++ b/Makefile
@@ -213,7 +213,7 @@ SCRIPT_SH = \
git-merge-resolve.sh git-merge-ours.sh \
git-lost-found.sh git-quiltimport.sh git-submodule.sh \
git-filter-branch.sh \
- git-stash.sh
+ git-stash.sh git-notes.sh
SCRIPT_PERL = \
git-add--interactive.perl \
diff --git a/git-notes.sh b/git-notes.sh
new file mode 100755
index 0000000..031e911
--- /dev/null
+++ b/git-notes.sh
@@ -0,0 +1,62 @@
+#!/bin/sh
+
+USAGE="(edit | show) [commit]"
+. git-sh-setup
+
+test -n "$3" && usage
+
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="$(git config core.notesref)"
+test -z "$GIT_NOTES_REF" && GIT_NOTES_REF="refs/notes/commits"
+
+COMMIT=$(git rev-parse --verify --default HEAD "$2") || die "Invalid ref: $2"
+NAME=$(echo $COMMIT | sed "s/^../&\//")
+
+MESSAGE="$GIT_DIR"/new-notes
+trap '
+ test -f "$MESSAGE" && rm "$MESSAGE"
+' 0
+
+case "$1" in
+edit)
+ GIT_NOTES_REF= git log -1 $COMMIT | sed "s/^/#/" > "$MESSAGE"
+
+ GIT_INDEX_FILE="$MESSAGE".idx
+ export GIT_INDEX_FILE
+
+ CURRENT_HEAD=$(git show-ref "$GIT_NOTES_REF" | cut -f 1 -d ' ')
+ if [ -z "$CURRENT_HEAD" ]; then
+ PARENT=
+ else
+ PARENT="-p $CURRENT_HEAD"
+ git read-tree "$GIT_NOTES_REF" || die "Could not read index"
+ git cat-file blob :$NAME >> "$MESSAGE" 2> /dev/null
+ fi
+
+ ${VISUAL:-${EDITOR:-vi}} "$MESSAGE"
+
+ grep -v ^# < "$MESSAGE" | git stripspace > "$MESSAGE".processed
+ mv "$MESSAGE".processed "$MESSAGE"
+ if [ -s "$MESSAGE" ]; then
+ BLOB=$(git hash-object -w "$MESSAGE") ||
+ die "Could not write into object database"
+ git update-index --add --cacheinfo 0644 $BLOB $NAME ||
+ die "Could not write index"
+ else
+ test -z "$CURRENT_HEAD" &&
+ die "Will not initialise with empty tree"
+ git update-index --force-remove $NAME ||
+ die "Could not update index"
+ fi
+
+ TREE=$(git write-tree) || die "Could not write tree"
+ NEW_HEAD=$(echo Annotate $COMMIT | git commit-tree $TREE $PARENT) ||
+ die "Could not annotate"
+ git update-ref -m "Annotate $COMMIT" \
+ "$GIT_NOTES_REF" $NEW_HEAD $CURRENT_HEAD
+;;
+show)
+ git show "$GIT_NOTES_REF":$NAME
+;;
+*)
+ usage
+esac
--
1.5.3.rc1.16.g9d6f-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [REVISED PATCH 4/6] Add a test script for "git notes"
2007-07-16 5:11 ` Junio C Hamano
@ 2007-07-19 2:32 ` Johannes Schindelin
0 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 2:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Alberto Bertogli, git, Johan Herland
Incidentally, a test for "git notes" implies a test for the
whole commit notes machinery.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
t/t3301-notes.sh | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 65 insertions(+), 0 deletions(-)
create mode 100755 t/t3301-notes.sh
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
new file mode 100755
index 0000000..ba42c45
--- /dev/null
+++ b/t/t3301-notes.sh
@@ -0,0 +1,65 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+#
+
+test_description='Test commit notes'
+
+. ./test-lib.sh
+
+cat > fake_editor.sh << \EOF
+echo "$MSG" > "$1"
+echo "$MSG" >& 2
+EOF
+chmod a+x fake_editor.sh
+VISUAL=./fake_editor.sh
+export VISUAL
+
+test_expect_success 'cannot annotate non-existing HEAD' '
+ ! MSG=3 git notes edit
+'
+
+test_expect_success setup '
+ : > a1 &&
+ git add a1 &&
+ test_tick &&
+ git commit -m 1st &&
+ : > a2 &&
+ git add a2 &&
+ test_tick &&
+ git commit -m 2nd
+'
+
+test_expect_success 'need valid notes ref' '
+ ! MSG=1 GIT_NOTES_REF='/' git notes edit &&
+ ! MSG=2 GIT_NOTES_REF='/' git notes show
+'
+
+test_expect_success 'create notes' '
+ git config core.notesRef refs/notes/commits &&
+ MSG=b1 git notes edit &&
+ test ! -f .git/new-notes &&
+ test 1 = $(git ls-tree refs/notes/commits | wc -l) &&
+ test b1 = $(git notes show) &&
+ git show HEAD^ &&
+ ! git notes show HEAD^
+'
+
+cat > expect << EOF
+commit 268048bfb8a1fb38e703baceb8ab235421bf80c5
+Author: A U Thor <author@example.com>
+Date: Thu Apr 7 15:14:13 2005 -0700
+
+ 2nd
+
+Notes:
+ b1
+EOF
+
+test_expect_success 'show notes' '
+ ! (git cat-file commit HEAD | grep b1) &&
+ git log -1 > output &&
+ git diff expect output
+'
+
+test_done
--
1.5.3.rc1.16.g9d6f-dirty
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 3/6] Add git-notes
2007-07-19 2:31 ` [REVISED PATCH " Johannes Schindelin
@ 2007-07-19 2:54 ` Johannes Schindelin
0 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 2:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Alberto Bertogli, git, Johan Herland
Hi,
On Thu, 19 Jul 2007, Johannes Schindelin wrote:
> +MESSAGE="$GIT_DIR"/new-notes
> +trap '
> + test -f "$MESSAGE" && rm "$MESSAGE"
> +' 0
Oh, well. Probably this should use mktemp and should handle
GIT_INDEX_FILE, too.
Will do that tomorrow.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 2:30 ` [REVISED PATCH " Johannes Schindelin
@ 2007-07-19 3:28 ` Linus Torvalds
2007-07-19 5:13 ` Junio C Hamano
` (2 more replies)
2007-07-19 9:05 ` Wincent Colaiuta
2007-07-19 9:54 ` Sven Verdoolaege
2 siblings, 3 replies; 42+ messages in thread
From: Linus Torvalds @ 2007-07-19 3:28 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Alberto Bertogli, git, Johan Herland
On Thu, 19 Jul 2007, Johannes Schindelin wrote:
>
> There is one severe shortcoming, though. Since tree objects can
> contain file names of a variable length, it is not possible to
> do a binary search for the correct base name in the tree object's
> contents.
Well, I've been thinking about this, and that's not really entirely
correct.
It *is* possible to do a binary search, it's just a bit complicated,
because you have to take the "halfway" thing, and find the beginning of
an entry.
But the good news is that the tree entries have a very fixed format, and
one that is actually amenable to finding where they start. It gets a bit
complicated, but:
- SHA1's contain random bytes, so we cannot really depend on their
content. Fair enough. But on the other hand, they are fixed length,
which means..
- Each SHA1 is always preceded by a zero byte (it is what separates the
filename from the SHA1), and while filenames too can have arbitrary
content (and arbitrary length), we know that the *filename* doesn't
have a zero byte in it.
- so finding the beginning of a tree entry should be as easy as finding
two zero bytes that are have at least 20 bytes in between them, and
then you *know* that the second zero byte is the one that starts a new
SHA1 (it cannot be _inside_ a SHA1: if it was, there would be less
than twenty bytes to the previous '\0', and it cannot be inside the
filename either).
- And you know that 20 bytes after that '\0' is the next tree entry!
Now, what does this mean? It means that if we actually know the filename
we're looking for, and we're looking at a large range, we really *could*
start out with binary searching. We would do something like this:
unsigned char *start;
unsigned long size;
while (size > 200) {
/*
* Look halfway, and then back up a bit, because we
* expect it to take us about 20 characters to find
* the zero we look for, and an additional 20
* characters is the subsequent SHA1.
*/
unsigned long guess = size / 2 - 40;
/*
* This is the offset past which a zero means that
* we're good. If we don't find a zero in the first
* twenty bytes, that means that the first zero we
* find must be the beginning of a SHA1!
*/
unsigned long goal_zero = guess + 20;
for (;;) {
unsigned char c;
/*
* We need at least 22 characters more: the
* '\0' and the SHA1, and then the next entry.
* We know the ASCII mode is 4 characters, so
* we migth as well make the rule "within 26 of
* end end".
*/
if (guess >= size-26)
goto fall_back_to_linear_search;
c = start[guess++];
if (c)
continue;
/* Found it? */
if (guess > goal_zero)
break;
/*
* We found a zero that wasn't 20 bytes away,
* that means we have to reset out goal..
*/
last_zero = guess + 20;
}
/*
* "guess" now points to one past the '\0': the SHA1 of
* the previous entry. Add 20, and it points at the start
* of a valid tree entry.
*/
guess = guess + 20;
/* Length of the entry: ascii string + '\0' + SHA1 */
thisentrylen = strlen(start + guess) + 1 + 20;
.. compare the entry we found with
.. the entry we are looking for!
if (found < lookedfor) {
size = guess;
continue;
} else if (found == lookedfor) {
Yay! FOUND!
} else {
guess += thisentry;
size -= guess;
start += guess;
continue;
}
}
fall_back_to_linear_search:
.. linear search in [ start, size ] ..
Anyway, as you can tell, the above is totally untested, but I really think
it should work. Whether it really helps, I dunno. But if somebody is
interested in trying, it might be cool to see.
And yes, the "search for zero bytes" is not *guaranteed* to find any
beginning at all, if you have lots of short names, *and* lots of zero
bytes in the SHA1's. But while short names may be common, zero bytes in
SHA1's are not so much (since you should expect to see a very even
distribution of bytes, and as such most SHA1's by far should have no zero
bytes at all!)
So if you're really really *really* unlucky, you might end up having to
fall back on the linear search. But it still works!
Can anybody see anything wrong in my thinking above?
(And the real question is whether it really helps. I suspect it does
actually help for big directories, and that it is worth doing, but maybe
the magic number in "while (size > 200)" could be tweaked.
The logic of that was that binary searching doesn't work very well for
just a few entries (and "size < 200" implies ~5-6 directory entries), but
also that linear search is actually perfectly good when it's just a couple
of cache-lines, and binary searching - especially with the complication of
having to find the beginning - isn't worth it unless it really means that
we can avoid a cache miss.
Of course, it may well be that the *real* cost of the directories is just
the uncompression thing, and that the search is not the problem. Who
knows?
Linus
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 3:28 ` Linus Torvalds
@ 2007-07-19 5:13 ` Junio C Hamano
2007-07-19 9:34 ` Junio C Hamano
2007-07-19 17:20 ` Linus Torvalds
2007-07-19 9:50 ` Johannes Schindelin
2007-07-19 10:34 ` Olivier Galibert
2 siblings, 2 replies; 42+ messages in thread
From: Junio C Hamano @ 2007-07-19 5:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, Alberto Bertogli, git, Johan Herland
Linus Torvalds <torvalds@linux-foundation.org> writes:
> And yes, the "search for zero bytes" is not *guaranteed* to find any
> beginning at all, if you have lots of short names, *and* lots of zero
> bytes in the SHA1's. But while short names may be common, zero bytes in
> SHA1's are not so much (since you should expect to see a very even
> distribution of bytes, and as such most SHA1's by far should have no zero
> bytes at all!)
>
> So if you're really really *really* unlucky, you might end up having to
> fall back on the linear search. But it still works!
>
> Can anybody see anything wrong in my thinking above?
Another anchoring clue you seem not to be exploiting fully is
that the ASCII part must match "^[1-7][0-7]{4,5} " (mode bytes).
But the real problem of this approach of course is that this is
not reliable and can get a false match. You can find your
beginning NUL in the SHA-1 part of one entry, and terminating
NUL later in the SHA-1 part of next entry, and you will never
notice.
However, in the case of Dscho's "notes" code, I do not think (1)
you do not have to guess like the above, and (2) the problem is
much simpler.
Dcsho's "note" looks like a tree full of two-byte [0-9a-f]{2}
names, each of them points at another tree, with the second
level tree being full of 32-byte [0-9a-f]{38} names, each of
them points at a blob. So it is a much more regular, strict
shape. And in order to look for a note for an object whose name
is ([0-9a-f]{2})([0-9a-f]{38}), you will find the blob that is
at "$1/$2" in a "note".
I was suggesting to have a specialized parser only to read such
tree objects that are "abused" to represent notes. You can
cheaply validate that these trees are of expected shape.
(1) Validate that size of the toplevel tree is multiple of 29 =
(5 + 1 + 2 + 1 + 20); the second level should be multiple
of 66 = (6 + 1 + 38 + 1 + 20). These two levels of trees
are of fixed-entry-length that allows easy binary search.
(2) While binary searching trees of either level, you can
validate that the entry looks like from a note (for the
toplevel, "40000 [0-9a-f]{2}\0", for the second level,
"100644 [0-9a-f]{38}\0").
For an added safety, a "notes" writer could even throw in
signature bytes (say, a symlink whose name is " !" in the
top-level tree, and another symlink " !{37}" in the second-level
tree) to protect the reader.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 2:30 ` [REVISED PATCH " Johannes Schindelin
2007-07-19 3:28 ` Linus Torvalds
@ 2007-07-19 9:05 ` Wincent Colaiuta
2007-07-19 9:24 ` Johannes Schindelin
2007-07-19 9:54 ` Sven Verdoolaege
2 siblings, 1 reply; 42+ messages in thread
From: Wincent Colaiuta @ 2007-07-19 9:05 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Alberto Bertogli, git, Johan Herland
El 19/7/2007, a las 4:30, Johannes Schindelin escribió:
> Commit notes are blobs which are shown together with the commit
> message. These blobs are taken from the notes ref, which you can
> configure by the config variable core.notesRef, which in turn can
> be overridden by the environment variable GIT_NOTES_REF.
I was trying to look back and find out what the rationale/usage
scenario for these commit notes might be but Googling for 'git
"commit notes"' doesn't turn up much other than the original patch
you sent a few days ago.
Is this an evolution of the "git-note: A mechanisim for providing
free-form after-the-fact annotations on commits" first introduced here?:
<http://lists.zerezo.com/git/msg465441.html>
Cheers,
Wincent
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 9:05 ` Wincent Colaiuta
@ 2007-07-19 9:24 ` Johannes Schindelin
0 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 9:24 UTC (permalink / raw)
To: Wincent Colaiuta; +Cc: Junio C Hamano, Alberto Bertogli, git, Johan Herland
Hi,
On Thu, 19 Jul 2007, Wincent Colaiuta wrote:
> El 19/7/2007, a las 4:30, Johannes Schindelin escribi?:
>
> > Commit notes are blobs which are shown together with the commit
> > message. These blobs are taken from the notes ref, which you can
> > configure by the config variable core.notesRef, which in turn can
> > be overridden by the environment variable GIT_NOTES_REF.
>
> I was trying to look back and find out what the rationale/usage scenario for
> these commit notes might be but Googling for 'git "commit notes"' doesn't
> turn up much other than the original patch you sent a few days ago.
>
> Is this an evolution of the "git-note: A mechanisim for providing free-form
> after-the-fact annotations on commits" first introduced here?:
>
> <http://lists.zerezo.com/git/msg465441.html>
Almost. It is an evolution of the evolution of this.
http://thread.gmane.org/gmane.comp.version-control.git/52598/focus=52603
(which started this thread you were replying to) hints at that, but you're
right, I failed to give an explicit reference:
http://article.gmane.org/gmane.comp.version-control.git/49588
Background: It was discussed how to go about storing notes (in the mail
you cited). I was convinced that Johan's 15-strong patch series was not
optimal, in that it tried to introduce a _second_ object store,
_exclusively_ for commit notes, with all kinds of problems like "how to
fetch it?".
After thinking about how to avoid duplicating the object store, I posted
my proposal, in the second link I gave.
It was shot down, because of scalability problems. They were not serious,
but hurt enough that I stalled working on it, until Alberto reminded me.
Since I felt bad about shooting down Johan's patch series, and then not
completing my alternative solution, I ended up working on it some more.
The WIP patch 6/6 hints at what I will submit in the next days, to speed
up in a transparent manner what would otherwise not scale well.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 5:13 ` Junio C Hamano
@ 2007-07-19 9:34 ` Junio C Hamano
2007-07-19 9:57 ` Adam Hayek
` (3 more replies)
2007-07-19 17:20 ` Linus Torvalds
1 sibling, 4 replies; 42+ messages in thread
From: Junio C Hamano @ 2007-07-19 9:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, Alberto Bertogli, git, Johan Herland
Junio C Hamano <gitster@pobox.com> writes:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> ...
>> So if you're really really *really* unlucky, you might end up having to
>> fall back on the linear search. But it still works!
>>
>> Can anybody see anything wrong in my thinking above?
> ...
> But the real problem of this approach of course is that this is
> not reliable and can get a false match. You can find your
> beginning NUL in the SHA-1 part of one entry, and terminating
> NUL later in the SHA-1 part of next entry, and you will never
> notice.
In other words, if you are really really *really* unlucky, not
only you might end up being fooled by random byte sequences in
SHA-1 part of the tree object, you would not even notice that
you have to fall back on the linear search.
I've long time ago concluded that if we care about reliability
(and we do very much), a bisectable tree without breaking
backward compatibility is impossible. I was hoping to find a
"hole" in tree object format so that I can place an extended
section that is invisible to older versions of git, and place a
table that records offsets of each tree entries to help
bisection and/or perhaps a hash table to help look-up, but I do
not think it is possible. In the case of index file, the
original file format had a hole after the cache-entry array
where we can later squeeze an extension section that is
invisible to older versions of git. But the tree object format
is designed so tight that I do not see there is any place to put
an extension section.
Side note: I also think adding "extension section" to tree
object is not a good idea to begin with. The data nor length of
such a section cannot participate in hash computation to derive
the tree's object name so that we can still compare two tree
objects (with and without such extension) that have the same
contents by only looking at their object names. But having
contents that are not counted as parts of the object's name goes
against the reliability and safety of git.
> ...
> I was suggesting to have a specialized parser only to read such
> tree objects that are "abused" to represent notes. You can
> cheaply validate that these trees are of expected shape.
> ...
> For an added safety, a "notes" writer could even throw in
> signature bytes (say, a symlink whose name is " !" in the
> top-level tree, and another symlink " !{37}" in the second-level
> tree) to protect the reader.
Of course, even with the above trick with relatively cheap
validation based on size, entry format, and "signature entries",
the way I outlined to speed up "notes" access really relies on
the tree objects used in "notes" to be well formed. If somebody
throws in a tree that is not really a "note" to refs/notes/, and
if I am really really *really* unlucky, not only I might end up
being fooled by random byte sequences in SHA-1 part of the tree
object, I would not even notice that I am reading garbage and
end up giving garbage as "note" to the object back to the user.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 3:28 ` Linus Torvalds
2007-07-19 5:13 ` Junio C Hamano
@ 2007-07-19 9:50 ` Johannes Schindelin
2007-07-19 10:34 ` Olivier Galibert
2 siblings, 0 replies; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 9:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Alberto Bertogli, git, Johan Herland
Hi,
On Wed, 18 Jul 2007, Linus Torvalds wrote:
> On Thu, 19 Jul 2007, Johannes Schindelin wrote:
> >
> > There is one severe shortcoming, though. Since tree objects can
> > contain file names of a variable length, it is not possible to do a
> > binary search for the correct base name in the tree object's contents.
>
> Well, I've been thinking about this, and that's not really entirely
> correct.
>
> It *is* possible to do a binary search, it's just a bit complicated,
> because you have to take the "halfway" thing, and find the beginning of
> an entry.
I will try to work from your proposal, and do some timings. But for the
notes, I really, really like the average constant running time of the hash
map. As you can see from my timings in
http://thread.gmane.org/gmane.comp.version-control.git/52598/focus=52603
it does make a difference, compared to binary search.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 2:30 ` [REVISED PATCH " Johannes Schindelin
2007-07-19 3:28 ` Linus Torvalds
2007-07-19 9:05 ` Wincent Colaiuta
@ 2007-07-19 9:54 ` Sven Verdoolaege
2 siblings, 0 replies; 42+ messages in thread
From: Sven Verdoolaege @ 2007-07-19 9:54 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Alberto Bertogli, git, Johan Herland
On Thu, Jul 19, 2007 at 03:30:43AM +0100, Johannes Schindelin wrote:
> +If such a path exists in the given ref, the referenced blob is read, and
> +appended to the commit message, separated by a "Notes:" line. If the
> +given ref itself does not exist, it is not an error, but means that no
> +notes should be print.
printed?
skimo
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 9:34 ` Junio C Hamano
@ 2007-07-19 9:57 ` Adam Hayek
2007-07-19 10:58 ` Andy Parkins
` (2 subsequent siblings)
3 siblings, 0 replies; 42+ messages in thread
From: Adam Hayek @ 2007-07-19 9:57 UTC (permalink / raw)
To: Junio C Hamano, git
On 7/19/07, Junio C Hamano <gitster@pobox.com> wrote:
> Side note: I also think adding "extension section" to tree
> object is not a good idea to begin with. The data nor length of
> such a section cannot participate in hash computation to derive
> the tree's object name so that we can still compare two tree
> objects (with and without such extension) that have the same
> contents by only looking at their object names. But having
> contents that are not counted as parts of the object's name goes
> against the reliability and safety of git.
Excuse me for being brand new to this list and to git itself, but if
the issue is where to put "extra" data to go along with a given object
there should be a relatively simple way to do it. If your object's
hash/name is X, take the string ("%s-extra", X), hash that, and use
the resulting hash as the name of the file to store whatever extra
data you have. There would be the issue of when and how to move this
new file when the original file is moved, but old versions of git at
least wouldn't break, they'd just never know about the extra data in
the separate file. Of course you could do endless variations of this
to store whatever classes of extra data separately.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 3:28 ` Linus Torvalds
2007-07-19 5:13 ` Junio C Hamano
2007-07-19 9:50 ` Johannes Schindelin
@ 2007-07-19 10:34 ` Olivier Galibert
2007-07-19 17:50 ` Linus Torvalds
2 siblings, 1 reply; 42+ messages in thread
From: Olivier Galibert @ 2007-07-19 10:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Junio C Hamano, Alberto Bertogli, git,
Johan Herland
On Wed, Jul 18, 2007 at 08:28:27PM -0700, Linus Torvalds wrote:
> And yes, the "search for zero bytes" is not *guaranteed* to find any
> beginning at all, if you have lots of short names, *and* lots of zero
> bytes in the SHA1's. But while short names may be common, zero bytes in
> SHA1's are not so much (since you should expect to see a very even
> distribution of bytes, and as such most SHA1's by far should have no zero
> bytes at all!)
The probability of a sha1 to have a zero is approximatively 0.075.
That's 1 in 13, more or less.
OG.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 9:34 ` Junio C Hamano
2007-07-19 9:57 ` Adam Hayek
@ 2007-07-19 10:58 ` Andy Parkins
2007-07-19 11:10 ` Johannes Schindelin
2007-07-19 17:42 ` Linus Torvalds
2007-07-20 4:59 ` Shawn O. Pearce
3 siblings, 1 reply; 42+ messages in thread
From: Andy Parkins @ 2007-07-19 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Linus Torvalds, Johannes Schindelin,
Alberto Bertogli, Johan Herland
On Thursday 2007 July 19, Junio C Hamano wrote:
> I've long time ago concluded that if we care about reliability
> (and we do very much), a bisectable tree without breaking
> backward compatibility is impossible. I was hoping to find a
> "hole" in tree object format so that I can place an extended
In the case of the notes system, is there not a big hole available because the
layout is under tight control?
100644 blob 24631df5c6fceef7f0859903397d81f99a723197 __notes_index
040000 tree dd3f40129c8731b1bdce1d3939de3cdc24a87783 00
040000 tree 2b25612b5d8ee9ef469e72bbf74eab0ec00ae87f 01
In fact, this technique would work for normal tree objects too, except that
you'd have to be willing to pick some blob name that would always be the
first entry in every tree object, and would never clash with a real file in
the tree. Speaking off the top of my head, anything with "/" in it would be
an invalid name so
100644 blob 24631df5c6fceef7f0859903397d81f99a723197 /tree_index
040000 tree dd3f40129c8731b1bdce1d3939de3cdc24a87783 00
040000 tree 2b25612b5d8ee9ef469e72bbf74eab0ec00ae87f 01
Would be an easy one to special case, and would be guaranteed not to clash
with a file in the tree.
Just an idea. I would imagine it's as daft as all my others :-)
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 10:58 ` Andy Parkins
@ 2007-07-19 11:10 ` Johannes Schindelin
2007-07-19 14:33 ` Andy Parkins
0 siblings, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2007-07-19 11:10 UTC (permalink / raw)
To: Andy Parkins
Cc: git, Junio C Hamano, Linus Torvalds, Alberto Bertogli,
Johan Herland
Hi,
On Thu, 19 Jul 2007, Andy Parkins wrote:
> On Thursday 2007 July 19, Junio C Hamano wrote:
>
> > I've long time ago concluded that if we care about reliability
> > (and we do very much), a bisectable tree without breaking
> > backward compatibility is impossible. I was hoping to find a
> > "hole" in tree object format so that I can place an extended
>
> In the case of the notes system, is there not a big hole available
> because the layout is under tight control?
No. It is a tree object, referenced from a ref. You can always check it
out, modify it, and check it in. If only by mistake.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 11:10 ` Johannes Schindelin
@ 2007-07-19 14:33 ` Andy Parkins
0 siblings, 0 replies; 42+ messages in thread
From: Andy Parkins @ 2007-07-19 14:33 UTC (permalink / raw)
To: git
Cc: Johannes Schindelin, Junio C Hamano, Linus Torvalds,
Alberto Bertogli, Johan Herland
On Thursday 2007 July 19, Johannes Schindelin wrote:
> > In the case of the notes system, is there not a big hole available
> > because the layout is under tight control?
>
> No. It is a tree object, referenced from a ref. You can always check it
> out, modify it, and check it in. If only by mistake.
I was arguing for the tree-index being special cased though (ideally with an
invalid filename), such that it could never actually be checked out or
checked in, but would be maintained automatically "git-side". For backwards
compatibility, it would be optional; and making it an invalid filename
It was only a suggestion to answer Junio's request for a "hole" through which
a tree-object index could be poked.
If we're only talking about the notes tree, then would it matter that it could
be checked out and checked in? If someone chose to do that then it would be
their own fault when the index didn't work. If I wanted I could
edit .git/objects/ directly - I wouldn't expect poor git to work correctly
afterwards though.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 5:13 ` Junio C Hamano
2007-07-19 9:34 ` Junio C Hamano
@ 2007-07-19 17:20 ` Linus Torvalds
1 sibling, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2007-07-19 17:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Alberto Bertogli, git, Johan Herland
On Wed, 18 Jul 2007, Junio C Hamano wrote:
>
> Another anchoring clue you seem not to be exploiting fully is
> that the ASCII part must match "^[1-7][0-7]{4,5} " (mode bytes).
I did that on purpose.
The SHA1 *can* contain those characters too, so that's not really useful
to us, and the only special character really is the NUL character (which
is the only one cannot exists in the ASCII part - old-style trees can
contain '/' too, although that's going away).
Also, the mode bytes may not be visible: if we start in a long filename,
we'll never have looked at the mode bytes, but if we see a NUL character
after having seen 20 non-NUL characters (long filename), we already know
we got it. So I don't think we can even usefully use the other knowledge
of the format of the ASCII part (other than to know it doesn't contain
NUL's).
Of course, we can (and should) verify that the tree entry we find is
valid, and *then* it makes sense to check the rules for the ASCII part.
But that's only after we have already found the place.
> I was suggesting to have a specialized parser only to read such
> tree objects that are "abused" to represent notes. You can
> cheaply validate that these trees are of expected shape.
Sure. That said, I'm less interested in the notes than I am in the cost fo
"git blame", and that could be optimized by having some special code in
"tree_entry_interesting()" to find the tree entries using binary search.
The special code would trigger only for:
- large trees
- "opt->nr_paths == 1"
but the latter case is the one that matters for blame in the first place,
so..
Linus
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 9:34 ` Junio C Hamano
2007-07-19 9:57 ` Adam Hayek
2007-07-19 10:58 ` Andy Parkins
@ 2007-07-19 17:42 ` Linus Torvalds
2007-07-20 0:20 ` Junio C Hamano
2007-07-20 4:59 ` Shawn O. Pearce
3 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2007-07-19 17:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, Alberto Bertogli, git, Johan Herland
On Thu, 19 Jul 2007, Junio C Hamano wrote:
> > ...
> > But the real problem of this approach of course is that this is
> > not reliable and can get a false match. You can find your
> > beginning NUL in the SHA-1 part of one entry, and terminating
> > NUL later in the SHA-1 part of next entry, and you will never
> > notice.
[ I didn't react to this in your first email, because I thought you were
talking about your "use the rules for the ASCII part", and thought you
talked about how *that* was not reliable and can get a false match). But
it seems that you were actually talking about the NUL character test ]
Nope, wrong.
Why? Because there must always be a NUL *between* different SHA1's.
There's *always* a NUL character that precedes a SHA1. So when you have
two NUL characters (with no other NUL's between them), you *know* that
they cannot be from two different SHA1's. If the first one was from an
earlier SHA1, then the second one is *guaranteed* to be the one that
happens *before* the next SHA1.
See?
You really have two, and only two cases:
- NUL's that are within 20 bytes of each other: you don't know anything
about them. It might be that they are both within the *same* SHA1, or
the first one was the one that separated the ASCII part from the SHA1,
or the first one was a NUL in the previous SHA1 and the second one was
the NUL after the ASCII part.
So two NUL's in the same 21-byte region are not reliable (ie less than
20 bytes in *between* them). They tell you nothing, and you must just
ignore them.
- NUL's that are more than 20 bytes apart: the second NUL is *guaranteed*
to be the start of the next SHA1.
They cannot be part of the same "NUL + sha1", and thus the first NUL
*must* be from a previous SHA1 (or the NUL that preceded it). And that
means that the second NUL *must* be the NUL that precedes the next
SHA1.
So there is *no* ambiguity what-so-ever. It's not about guessing, and it's
not about "luck". If you don't find two NUL bytes separated by more than
20 bytes, you start the linear search.
> In other words, if you are really really *really* unlucky, not
> only you might end up being fooled by random byte sequences in
> SHA-1 part of the tree object, you would not even notice that
> you have to fall back on the linear search.
Wrong. Either you find a guanteed rigth place, or you ran out of the
buffer and know you have to fall back on the linear search.
No fooled.
> I've long time ago concluded that if we care about reliability
> (and we do very much), a bisectable tree without breaking
> backward compatibility is impossible.
No. You concluded incorrectly. I'm pretty damn sure the current tree
format is perfectly fine. It's dense, it's nice and linear, and it's
easily bisectable.
Linus
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 10:34 ` Olivier Galibert
@ 2007-07-19 17:50 ` Linus Torvalds
0 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2007-07-19 17:50 UTC (permalink / raw)
To: Olivier Galibert
Cc: Johannes Schindelin, Junio C Hamano, Alberto Bertogli, git,
Johan Herland
On Thu, 19 Jul 2007, Olivier Galibert wrote:
> On Wed, Jul 18, 2007 at 08:28:27PM -0700, Linus Torvalds wrote:
> > And yes, the "search for zero bytes" is not *guaranteed* to find any
> > beginning at all, if you have lots of short names, *and* lots of zero
> > bytes in the SHA1's. But while short names may be common, zero bytes in
> > SHA1's are not so much (since you should expect to see a very even
> > distribution of bytes, and as such most SHA1's by far should have no zero
> > bytes at all!)
>
> The probability of a sha1 to have a zero is approximatively 0.075.
> That's 1 in 13, more or less.
Sure. And since we handle it fine even when it does happen, we don't care.
In fact, since we only need 20 non-zero bytes in between zeroes to know
that it's ok, and since the ASCII part is already 7 bytes of "mode +
space" plus <n> bytes of actual name (let's say that names average to be
about 8 characters - which is low: in the kernel it seems to be about 10.5
characters), we can say that the ASCII part of a tree tends to be about 15
characters.
So in order to be unlucky, it's not enough for the previous SHA1 to have a
NUL character in it, it actually has to be in the last five bytes of the
SHA1 - so now we're talking something like a 1:50 chance.
And with longer names, it matters even less (to the point where it
matters not at all if all filenames are >= 14 characters in length).
So we can be unlucky, but it's fairly rare, and when it happens, at worst
we'll just need to scan to the next entry (and if we're *really* unlucky
and it keeps happening until we scan until the end, we'll have to do the
linear search).
The point being that you always get the right answer, and the likelihood
that you have to do something slow to get that rigth answer is really
really low.
Linus
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 17:42 ` Linus Torvalds
@ 2007-07-20 0:20 ` Junio C Hamano
0 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2007-07-20 0:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, Alberto Bertogli, git, Johan Herland
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, 19 Jul 2007, Junio C Hamano wrote:
>> > ...
>> > But the real problem of this approach of course is that this is
>> > not reliable and can get a false match. You can find your
>> > beginning NUL in the SHA-1 part of one entry, and terminating
>> > NUL later in the SHA-1 part of next entry, and you will never
>> > notice.
>
> [ I didn't react to this in your first email, because I thought you were
> talking about your "use the rules for the ASCII part", and thought you
> talked about how *that* was not reliable and can get a false match). But
> it seems that you were actually talking about the NUL character test ]
>
> Nope, wrong.
>
> Why? Because there must always be a NUL *between* different SHA1's.
> There's *always* a NUL character that precedes a SHA1. So when you have
> two NUL characters (with no other NUL's between them), you *know* that
> they cannot be from two different SHA1's. If the first one was from an
> earlier SHA1, then the second one is *guaranteed* to be the one that
> happens *before* the next SHA1.
>
> See?
Ok. As usual, you are more right than I am ;-).
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [REVISED PATCH 2/6] Introduce commit notes
2007-07-19 9:34 ` Junio C Hamano
` (2 preceding siblings ...)
2007-07-19 17:42 ` Linus Torvalds
@ 2007-07-20 4:59 ` Shawn O. Pearce
3 siblings, 0 replies; 42+ messages in thread
From: Shawn O. Pearce @ 2007-07-20 4:59 UTC (permalink / raw)
To: Junio C Hamano
Cc: Linus Torvalds, Johannes Schindelin, Alberto Bertogli, git,
Johan Herland
Junio C Hamano <gitster@pobox.com> wrote:
> I've long time ago concluded that if we care about reliability
> (and we do very much), a bisectable tree without breaking
> backward compatibility is impossible. I was hoping to find a
> "hole" in tree object format so that I can place an extended
> section that is invisible to older versions of git, and place a
> table that records offsets of each tree entries to help
> bisection and/or perhaps a hash table to help look-up, but I do
> not think it is possible.
...
> But the tree object format
> is designed so tight that I do not see there is any place to put
> an extension section.
I came to the same conclusion the last time I thought about this
problem, for all the same reasons you outlined. And came up with
pack v4. Because the only way I could see that we could produce
a more optimal tree was to just use a different *compression* of
the tree, while still keeping its data the same. Nico seemed to
agree at the time, because he worked on the prototype with me. :-)
Its still hanging around in my fastimport repository. But has not
been merged with any recent Git, and it still needs a lot of work.
--
Shawn.
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2007-07-20 4:59 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-15 23:19 [PATCH 0/6] Introduce commit notes Johannes Schindelin
2007-07-15 23:22 ` [PATCH 1/6] Rename git_one_line() to git_line_length() and export it Johannes Schindelin
2007-07-15 23:23 ` [PATCH 2/6] Introduce commit notes Johannes Schindelin
2007-07-15 23:36 ` Junio C Hamano
2007-07-15 23:52 ` Johannes Schindelin
2007-07-16 0:05 ` Junio C Hamano
2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:30 ` [REVISED PATCH " Johannes Schindelin
2007-07-19 3:28 ` Linus Torvalds
2007-07-19 5:13 ` Junio C Hamano
2007-07-19 9:34 ` Junio C Hamano
2007-07-19 9:57 ` Adam Hayek
2007-07-19 10:58 ` Andy Parkins
2007-07-19 11:10 ` Johannes Schindelin
2007-07-19 14:33 ` Andy Parkins
2007-07-19 17:42 ` Linus Torvalds
2007-07-20 0:20 ` Junio C Hamano
2007-07-20 4:59 ` Shawn O. Pearce
2007-07-19 17:20 ` Linus Torvalds
2007-07-19 9:50 ` Johannes Schindelin
2007-07-19 10:34 ` Olivier Galibert
2007-07-19 17:50 ` Linus Torvalds
2007-07-19 9:05 ` Wincent Colaiuta
2007-07-19 9:24 ` Johannes Schindelin
2007-07-19 9:54 ` Sven Verdoolaege
2007-07-15 23:23 ` [PATCH 3/6] Add git-notes Johannes Schindelin
2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:31 ` [REVISED PATCH " Johannes Schindelin
2007-07-19 2:54 ` Johannes Schindelin
2007-07-15 23:24 ` [PATCH 4/6] Add a test script for "git notes" Johannes Schindelin
2007-07-16 5:11 ` Junio C Hamano
2007-07-19 2:32 ` [REVISED PATCH " Johannes Schindelin
2007-07-15 23:24 ` [PATCH 5/6] Document git-notes Johannes Schindelin
2007-07-15 23:26 ` [WIP PATCH 6/6] notes: add notes-index for a substantial speedup Johannes Schindelin
2007-07-15 23:33 ` Johannes Schindelin
2007-07-16 6:01 ` Shawn O. Pearce
2007-07-16 16:29 ` Johannes Schindelin
2007-07-16 7:57 ` [PATCH 0/6] Introduce commit notes Andy Parkins
2007-07-16 8:11 ` Junio C Hamano
2007-07-16 16:26 ` Johannes Schindelin
2007-07-16 17:56 ` Junio C Hamano
2007-07-19 1:34 ` Johannes Schindelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).