* Re: Why 'git commit --amend' generates different HEAD sha1 each time when no content changes
From: Ping Yin @ 2007-12-30 11:19 UTC (permalink / raw)
To: Matthias Kestenholz; +Cc: Git Mailing List
In-Reply-To: <1199012360.15996.6.camel@futex>
On Dec 30, 2007 6:59 PM, Matthias Kestenholz <mk@spinlock.ch> wrote:
>
> On Sun, 2007-12-30 at 18:56 +0800, Ping Yin wrote:
> > AFAIK, commit sha1 is only determined by commit object content (say
> > parent commit, tree sha1 and so on). So why 'git commit --amend'
> > changes the commit sha1 when no content changes as following shows.
> >
>
> The full commit includes a timestamp too, which changed. Try setting the
> GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, you should
> get the same SHA-1 everytime.
>
--
Ping Yin
^ permalink raw reply
* Re: [PATCH] Force new line at end of commit message
From: Johannes Schindelin @ 2007-12-30 11:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn O. Pearce, Bernt Hansen, git
In-Reply-To: <7v63yga20u.fsf@gitster.siamese.dyndns.org>
Hi,
On Sun, 30 Dec 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > But if I understood the OP correctly, the problem was a missing
> > newline at the end of the commit message, no?
>
> That's why the "echo" was moved out of the conditional, to make sure "#
> This is the $(nth)" begins on a fresh line.
Not that I care too deeply, but does that not add a newline regardless
whether it is needed or not?
Thanks,
Dscho
^ permalink raw reply
* Why 'git commit --amend' generates different HEAD sha1 each time when no content changes
From: Ping Yin @ 2007-12-30 10:56 UTC (permalink / raw)
To: Git Mailing List
AFAIK, commit sha1 is only determined by commit object content (say
parent commit, tree sha1 and so on). So why 'git commit --amend'
changes the commit sha1 when no content changes as following shows.
$ mkdir A && cd A && echo foo >foo
$ git init && git add foo && git commit -m 'add file foo'
Created initial commit 8626800: add file foo
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo
$ git commit --amend
Created commit 3591035: add file foo
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo
$ git commit --amend
Created commit 3927d9a: add file foo
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo
--
Ping Yin
^ permalink raw reply
* Re: How to find a commit containing a given sha1
From: Johannes Schindelin @ 2007-12-30 10:56 UTC (permalink / raw)
To: Ping Yin; +Cc: Git Mailing List
In-Reply-To: <46dff0320712300239m54ab8280w9c1ec6347f095b40@mail.gmail.com>
Hi,
On Sun, 30 Dec 2007, Ping Yin wrote:
> Given a blob sha1, is there a simple way to find which file it
> corresponds to and which commit it is contained?
Yes. "git log --raw --abbrev=40 --all" and then search for the blob name
(SHA-1).
Hth,
Dscho
^ permalink raw reply
* Re: [PATCH] Force new line at end of commit message
From: Junio C Hamano @ 2007-12-30 10:51 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Shawn O. Pearce, Bernt Hansen, git
In-Reply-To: <Pine.LNX.4.64.0712301124510.14355@wbgn129.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> But if I understood the OP correctly, the problem was a missing newline at
> the end of the commit message, no?
That's why the "echo" was moved out of the conditional, to make
sure "# This is the $(nth)" begins on a fresh line.
^ permalink raw reply
* Re: [PATCH] git-filter-branch could be confused by similar names
From: Johannes Schindelin @ 2007-12-30 10:46 UTC (permalink / raw)
To: Dmitry Potapov; +Cc: git
In-Reply-To: <20071230103146.GU13968@dpotapov.dyndns.org>
Hi,
On Sun, 30 Dec 2007, Dmitry Potapov wrote:
> On Sat, Dec 29, 2007 at 11:36:51PM +0100, Johannes Schindelin wrote:
> >
> > On Tue, 25 Dec 2007, Dmitry Potapov wrote:
> >
> > > 'git-filter-branch branch' could fail producing the error: "Which
> > > ref do you want to rewrite?" if existed another branch or tag, which
> > > name was 'branch-something' or 'something/branch'.
> > >
> > > Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
> > > ---
> > > git-filter-branch.sh | 2 +-
> > > t/t7003-filter-branch.sh | 10 ++++++++++
> > > 2 files changed, 11 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > > index dbab1a9..b89a720 100755
> > > --- a/git-filter-branch.sh
> > > +++ b/git-filter-branch.sh
> > > @@ -219,7 +219,7 @@ do
> > > ;;
> > > *)
> > > ref="$(git for-each-ref --format='%(refname)' |
> > > - grep /"$ref")"
> > > + grep '^refs/[^/]\+/'"$ref"'$')"
> >
> > Hmm. I wonder if this is a proper solution. It still does not error
> > out when you have a tag and a branch of the same name.
>
> Are you sure? I had created a tag and a branch with the same name, and
> then tried git filter-branch on it, and it did error out:
> ===
> warning: refname 'test1' is ambiguous.
> Which ref do you want to rewrite?
> ===
Okay, bad example. But try "heads/master". Or "origin" in a repository
which has "refs/remotes/origin/HEAD".
Ciao,
Dscho
^ permalink raw reply
* How to find a commit containing a given sha1
From: Ping Yin @ 2007-12-30 10:39 UTC (permalink / raw)
To: Git Mailing List
In-Reply-To: <46dff0320712300238p67e82c66r5bf49900dd21aefa@mail.gmail.com>
Given a blob sha1, is there a simple way to find which file it
corresponds to and which commit it is contained?
--
Ping Yin
^ permalink raw reply
* Re: [PATCH] Teach revision walker about reflog ranges
From: Johannes Schindelin @ 2007-12-30 10:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vve6h9gvz.fsf@gitster.siamese.dyndns.org>
Hi,
On Sat, 29 Dec 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Now you can ask for a revision range
> >
> > master@{2.weeks.ago..1.day.ago}
> >
> > or even something like
> >
> > HEAD@{20..yesterday}
>
> You can _ask_ all you want, but it is not clear what it does from this
> description. I guess you are rewriting master@{A..B} to
> master@{A}..master@{B}, but that is not clear from the commit log nor
> documentation (did I even see a documentation patch?).
Oh, sorry, I meant to mark this as RFC-after-1.5.4. It's just that I had
a need for it, and hacked it.
> Also, I am not convinced that the rewrite gives the semantics the users
> naturally expect from @{A..B}. I would even suspect that people would
> expect "git log master@{0..2}" to behave more like "git show master@{0}
> master@{1} master@{2}".
Is that so? I would have expected "git log -g master@{2..0}" like that.
It would be relatively easy to accomodate your wish (almost: it would not
handle 0..2, but only 2..0) by calling
init_reflog_walk(&revs->reflog_info);
in the case that ".." was found inside "@{[...]}".
But my use case was to make it easy to see what changed in a multi-branch
remote without much typing: "git log origin/master@{1..}", which would not
be helped by that change.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH] git-filter-branch could be confused by similar names
From: Dmitry Potapov @ 2007-12-30 10:31 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0712292334080.14355@wbgn129.biozentrum.uni-wuerzburg.de>
Hi,
On Sat, Dec 29, 2007 at 11:36:51PM +0100, Johannes Schindelin wrote:
>
> On Tue, 25 Dec 2007, Dmitry Potapov wrote:
>
> > 'git-filter-branch branch' could fail producing the error:
> > "Which ref do you want to rewrite?" if existed another branch
> > or tag, which name was 'branch-something' or 'something/branch'.
> >
> > Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
> > ---
> > git-filter-branch.sh | 2 +-
> > t/t7003-filter-branch.sh | 10 ++++++++++
> > 2 files changed, 11 insertions(+), 1 deletions(-)
> >
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index dbab1a9..b89a720 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -219,7 +219,7 @@ do
> > ;;
> > *)
> > ref="$(git for-each-ref --format='%(refname)' |
> > - grep /"$ref")"
> > + grep '^refs/[^/]\+/'"$ref"'$')"
>
> Hmm. I wonder if this is a proper solution. It still does not error out
> when you have a tag and a branch of the same name.
>
Are you sure? I had created a tag and a branch with the same name, and
then tried git filter-branch on it, and it did error out:
===
warning: refname 'test1' is ambiguous.
Which ref do you want to rewrite?
===
Maybe, my fix is not a perfect solution, but it works correctly in all
known to me situations, while the original code is clearly broken in
most common cases, like when you have created a tag with a name that
consists of the name of a branch plus some arbitrary suffix. When you
run git-filter-branch on that branch, you only get: "Which ref do you
want to rewrite?", which is very confusing, because you have only one
reference with the given name.
Dmitry
^ permalink raw reply
* Re: [PATCH] Force new line at end of commit message
From: Johannes Schindelin @ 2007-12-30 10:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Shawn O. Pearce, Bernt Hansen, git
In-Reply-To: <7vodc99gpy.fsf@gitster.siamese.dyndns.org>
Hi,
On Sat, 29 Dec 2007, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> echo "# This is a combination of $COUNT commits."
> >> - sed -n "2,\$p" < "$SQUASH_MSG"
> >> + sed -e 1d -e '2,/^./{
> >> + /^$/d
> >> + }' <"$SQUASH_MSG"
> >
> > If I read this correctly (haven't tested), then _all_ empty lines are
> > removed from the SQUASH_MSG, right? This is not what I want.
>
> That is no what I wanted either.
>
> "1d" removes the "# This is a combination of <$n> commits." from
> the previous round, "2,/^./{ ... }" says apply what's in {} to
> lines from second line to the first non-empty line, and what's
> in {} is to remove empty ones.
Okay, so it removes only the empty lines between the first line and the
next non-empty line.
But if I understood the OP correctly, the problem was a missing newline at
the end of the commit message, no?
Thanks,
Dscho
^ permalink raw reply
* [PATCH WIP] sha1-lookup: more memory efficient search in sorted list of SHA-1
From: Junio C Hamano @ 2007-12-30 10:22 UTC (permalink / raw)
To: git
Currently, when looking for a packed object from the pack idx, a
simple binary search is used.
A conventional binary search loop looks like this:
unsigned lo, hi;
do {
unsigned mi = (lo + hi) / 2;
int cmp = "entry pointed at by mi" minus "target";
if (!cmp)
return (mi is the wanted one)
if (cmp > 0)
hi = mi; "mi is larger than target"
else
lo = mi+1; "mi is smaller than target"
} while (lo < hi);
The invariants are:
- When entering the loop, 'lo' points at a slot that is never
above the target (it could be at the target), 'hi' points at
a slot that is guaranteed to be above the target (it can
never be at the target).
- We find a point 'mi' between 'lo' and 'hi' ('mi' could be
the same as 'lo', but never can be the same as 'hi'), and
check if 'mi' hits the target. There are three cases:
- if it is a hit, we have found the SHA-1;
- if it is strictly higher than the target, we set it to
'hi', and repeat the search.
- if it is strictly lower than the target, we update 'lo'
to one slot after it, because we allow 'lo' to be at the
target.
If the loop exits, there is no matching entry.
When choosing 'mi', we do not have to take the "middle" but
anywhere in between 'lo' and 'hi', as long as lo <= mi < hi is
satisfied. When we somehow know that the distance between the
target and 'lo' is much shorter than the target and 'hi', we
could pick 'mi' that is much closer to 'lo' than (hi+lo)/2,
which a conventional binary search would pick.
This patch takes advantage of the fact that the SHA-1 is a good
hash function, and as long as there are enough entries in the
table, we can expect uniform distribution. An entry that begins
with for example "deadbeef..." is much likely to appear much
later than in the midway of a reasonably populated table. In
fact, it can be expected to be near 87% (222/256) from the top
of the table.
This is a work-in-progress and has switches to allow easier
experiments and debugging. Exporting GIT_USE_LOOKUP environment
variable enables this code.
On my admittedly memory starved machine, with a partial KDE
repository (3.0G pack with 95M idx):
$ GIT_USE_LOOKUP=t git log -800 --stat HEAD >/dev/null
3.93user 0.16system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+55588minor)pagefaults 0swaps
Without the patch, the numbers are:
$ git log -800 --stat HEAD >/dev/null
4.00user 0.15system 0:04.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+60258minor)pagefaults 0swaps
In the same repository:
$ GIT_USE_LOOKUP=t git log -2000 HEAD >/dev/null
0.12user 0.00system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4241minor)pagefaults 0swaps
Without the patch, the numbers are:
$ git log -2000 HEAD >/dev/null
0.05user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8506minor)pagefaults 0swaps
There isn't much time difference, but the number of minor faults
seems to show that we are touching much smaller number of pages,
which is expected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Makefile | 6 ++-
sha1-lookup.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sha1-lookup.h | 9 +++
sha1_file.c | 35 ++++++++++++-
4 files changed, 198 insertions(+), 5 deletions(-)
create mode 100644 sha1-lookup.c
create mode 100644 sha1-lookup.h
diff --git a/Makefile b/Makefile
index 21c80e6..9cff3ca 100644
--- a/Makefile
+++ b/Makefile
@@ -293,7 +293,8 @@ LIB_H = \
run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \
tree-walk.h log-tree.h dir.h path-list.h unpack-trees.h builtin.h \
utf8.h reflog-walk.h patch-ids.h attr.h decorate.h progress.h \
- mailmap.h remote.h parse-options.h transport.h diffcore.h hash.h
+ mailmap.h remote.h parse-options.h transport.h diffcore.h hash.h \
+ sha1-lookup.h
DIFF_OBJS = \
diff.o diff-lib.o diffcore-break.o diffcore-order.o \
@@ -316,7 +317,8 @@ LIB_OBJS = \
alloc.o merge-file.o path-list.o help.o unpack-trees.o $(DIFF_OBJS) \
color.o wt-status.o archive-zip.o archive-tar.o shallow.o utf8.o \
convert.o attr.o decorate.o progress.o mailmap.o symlinks.o remote.o \
- transport.o bundle.o walker.o parse-options.o ws.o
+ transport.o bundle.o walker.o parse-options.o ws.o \
+ sha1-lookup.o
BUILTIN_OBJS = \
builtin-add.o \
diff --git a/sha1-lookup.c b/sha1-lookup.c
new file mode 100644
index 0000000..f5c9094
--- /dev/null
+++ b/sha1-lookup.c
@@ -0,0 +1,153 @@
+#include "cache.h"
+#include "sha1-lookup.h"
+
+/*
+ * Conventional binary search loop looks like this:
+ *
+ * unsigned lo, hi;
+ * do {
+ * unsigned mi = (lo + hi) / 2;
+ * int cmp = "entry pointed at by mi" minus "target";
+ * if (!cmp)
+ * return (mi is the wanted one)
+ * if (cmp > 0)
+ * hi = mi; "mi is larger than target"
+ * else
+ * lo = mi+1; "mi is smaller than target"
+ * } while (lo < hi);
+ *
+ * The invariants are:
+ *
+ * - When entering the loop, lo points at a slot that is never
+ * above the target (it could be at the target), hi points at a
+ * slot that is guaranteed to be above the target (it can never
+ * be at the target).
+ *
+ * - We find a point 'mi' between lo and hi (mi could be the same
+ * as lo, but never can be as same as hi), and check if it hits
+ * the target. There are three cases:
+ *
+ * - if it is a hit, we are happy.
+ *
+ * - if it is strictly higher than the target, we set it to hi,
+ * and repeat the search.
+ *
+ * - if it is strictly lower than the target, we update lo to
+ * one slot after it, because we allow lo to be at the target.
+ *
+ * If the loop exits, there is no matching entry.
+ *
+ * When choosing 'mi', we do not have to take the "middle" but
+ * anywhere in between lo and hi, as long as lo <= mi < hi is
+ * satisfied. When we somehow know that the distance between the
+ * target and lo is much shorter than the target and hi, we could
+ * pick mi that is much closer to lo than the midway.
+ *
+ * Now, we can take advantage of the fact that SHA-1 is a good hash
+ * function, and as long as there are enough entries in the table, we
+ * can expect uniform distribution. An entry that begins with for
+ * example "deadbeef..." is much likely to appear much later than in
+ * the midway of the table. It can reasonably be expected to be near
+ * 87% (222/256) from the top of the table.
+ *
+ * The table at "table" holds at least "nr" entries of "elem_size"
+ * bytes each. Each entry has the SHA-1 key at "key_offset". The
+ * table is sorted by the SHA-1 key of the entries. The caller wants
+ * to find the entry with "key", and knows that the entry at "lo" is
+ * not higher than the entry it is looking for, and that the entry at
+ * "hi" is higher than the entry it is looking for.
+ */
+int sha1_entry_pos(const void *table,
+ size_t elem_size,
+ size_t key_offset,
+ unsigned lo, unsigned hi, unsigned nr,
+ const unsigned char *key)
+{
+ const unsigned char *base = table;
+ const unsigned char *hi_key, *lo_key;
+ unsigned ofs_0;
+ static int debug_lookup = -1;
+
+ if (debug_lookup < 0)
+ debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
+
+ if (!nr || lo >= hi)
+ return -1;
+
+ if (nr == hi)
+ hi_key = NULL;
+ else
+ hi_key = base + elem_size * hi + key_offset;
+ lo_key = base + elem_size * lo + key_offset;
+
+ ofs_0 = 0;
+ do {
+ int cmp;
+ unsigned ofs, mi, range;
+ unsigned lov, hiv, kyv;
+ const unsigned char *mi_key;
+
+ range = hi - lo;
+ if (hi_key) {
+ for (ofs = ofs_0; ofs < 20; ofs++)
+ if (lo_key[ofs] != hi_key[ofs])
+ break;
+ ofs_0 = ofs;
+ /*
+ * byte 0 thru (ofs-1) are the same between
+ * lo and hi; ofs is the first byte that is
+ * different.
+ */
+ hiv = hi_key[ofs_0];
+ if (ofs_0 < 19)
+ hiv = (hiv << 8) | hi_key[ofs_0+1];
+ } else {
+ hiv = 256;
+ if (ofs_0 < 19)
+ hiv <<= 8;
+ }
+ lov = lo_key[ofs_0];
+ kyv = key[ofs_0];
+ if (ofs_0 < 19) {
+ lov = (lov << 8) | lo_key[ofs_0+1];
+ kyv = (kyv << 8) | key[ofs_0+1];
+ }
+ assert(lov < hiv);
+
+ if (kyv < lov)
+ return -1 - lo;
+ if (hiv < kyv)
+ return -1 - hi;
+
+ if (kyv == lov && lov < hiv - 1)
+ kyv++;
+ else if (kyv == hiv - 1 && lov < kyv)
+ kyv--;
+
+ mi = (range - 1) * (kyv - lov) / (hiv - lov) + lo;
+
+ if (debug_lookup) {
+ printf("lo %u hi %u rg %u mi %u ", lo, hi, range, mi);
+ printf("ofs %u lov %x, hiv %x, kyv %x\n",
+ ofs_0, lov, hiv, kyv);
+ }
+ if (!(lo <= mi && mi < hi))
+ die("assertion failure lo %u mi %u hi %u %s",
+ lo, mi, hi, sha1_to_hex(key));
+
+ mi_key = base + elem_size * mi + key_offset;
+ cmp = memcmp(mi_key + ofs_0, key + ofs_0, 20 - ofs_0);
+ if (!cmp)
+ return mi;
+ if (cmp > 0) {
+ hi = mi;
+ hi_key = mi_key;
+ }
+ else {
+ lo = mi + 1;
+ lo_key = mi_key + elem_size;
+ }
+ } while (lo < hi);
+ return -lo-1;
+}
+
diff --git a/sha1-lookup.h b/sha1-lookup.h
new file mode 100644
index 0000000..3249a81
--- /dev/null
+++ b/sha1-lookup.h
@@ -0,0 +1,9 @@
+#ifndef SHA1_LOOKUP_H
+#define SHA1_LOOKUP_H
+
+extern int sha1_entry_pos(const void *table,
+ size_t elem_size,
+ size_t key_offset,
+ unsigned lo, unsigned hi, unsigned nr,
+ const unsigned char *key);
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index b0c2435..e99136a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -14,6 +14,7 @@
#include "tag.h"
#include "tree.h"
#include "refs.h"
+#include "sha1-lookup.h"
#ifndef O_NOATIME
#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1655,7 +1656,12 @@ off_t find_pack_entry_one(const unsigned char *sha1,
{
const uint32_t *level1_ofs = p->index_data;
const unsigned char *index = p->index_data;
- unsigned hi, lo;
+ unsigned hi, lo, stride;
+ static int use_lookup = -1;
+ static int debug_lookup = -1;
+
+ if (debug_lookup < 0)
+ debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
if (!index) {
if (open_pack_index(p))
@@ -1670,11 +1676,34 @@ off_t find_pack_entry_one(const unsigned char *sha1,
index += 4 * 256;
hi = ntohl(level1_ofs[*sha1]);
lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
+ if (p->index_version > 1) {
+ stride = 20;
+ } else {
+ stride = 24;
+ index += 4;
+ }
+
+ if (debug_lookup)
+ printf("%02x%02x%02x... lo %u hi %u nr %u\n",
+ sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects);
+
+ if (use_lookup < 0)
+ use_lookup = !!getenv("GIT_USE_LOOKUP");
+ if (use_lookup) {
+ int pos = sha1_entry_pos(index, stride, 0,
+ lo, hi, p->num_objects, sha1);
+ if (pos < 0)
+ return 0;
+ return nth_packed_object_offset(p, pos);
+ }
do {
unsigned mi = (lo + hi) / 2;
- unsigned x = (p->index_version > 1) ? (mi * 20) : (mi * 24 + 4);
- int cmp = hashcmp(index + x, sha1);
+ int cmp = hashcmp(index + mi * stride, sha1);
+
+ if (debug_lookup)
+ printf("lo %u hi %u rg %u mi %u\n",
+ lo, hi, hi - lo, mi);
if (!cmp)
return nth_packed_object_offset(p, mi);
if (cmp > 0)
--
1.5.4.rc2.3.g441ed
^ permalink raw reply related
* Re: [PATCH] Optimize prefixcmp()
From: Junio C Hamano @ 2007-12-30 0:44 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712292019450.14355@wbgn129.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Certain codepaths (notably "git log --pretty=format...") use
> prefixcmp() extensively, with very short prefixes. In those cases,
> calling strlen() is a wasteful operation, so avoid it.
>
> Initial patch by Marco Costalba.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
> On Sat, 29 Dec 2007, Marco Costalba wrote:
>
> > In case the prefix string is a single char avoid a costly call
> > to strlen() + strncmp()
>
> Could you test this patch, please?
>
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
>
> git-compat-util.h | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 79eb10e..7059cbd 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -398,7 +398,11 @@ static inline int sane_case(int x, int high)
>
> static inline int prefixcmp(const char *str, const char *prefix)
> {
> - return strncmp(str, prefix, strlen(prefix));
> + for (; ; str++, prefix++)
> + if (!*prefix)
> + return 0;
> + else if (*str != *prefix)
> + return (unsigned char)*prefix - (unsigned char)*str;
> }
Losing the unnecessary check for !str || !prefix is a good
change.
While I think, for the readability's sake, Marco's original
without the unnecessary check would be the way to go, a profile
from your totally inlined version would also be interesting, as
it may or may not beat the underlying strncmp(), which could be
highly optimized.
^ permalink raw reply
* Re: [PATCH] Force new line at end of commit message
From: Junio C Hamano @ 2007-12-30 0:19 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Shawn O. Pearce, Bernt Hansen, git
In-Reply-To: <Pine.LNX.4.64.0712291426500.14355@wbgn129.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> echo "# This is a combination of $COUNT commits."
>> - sed -n "2,\$p" < "$SQUASH_MSG"
>> + sed -e 1d -e '2,/^./{
>> + /^$/d
>> + }' <"$SQUASH_MSG"
>
> If I read this correctly (haven't tested), then _all_ empty lines are
> removed from the SQUASH_MSG, right? This is not what I want.
That is no what I wanted either.
"1d" removes the "# This is a combination of <$n> commits." from
the previous round, "2,/^./{ ... }" says apply what's in {} to
lines from second line to the first non-empty line, and what's
in {} is to remove empty ones.
> Unfortunately, I am unable to provide a proper patch with proper testing
> right now,...
That's Ok. Seeing what the patch already in front of you does
would be less time consuming.
^ permalink raw reply
* Re: [PATCH] Teach revision walker about reflog ranges
From: Junio C Hamano @ 2007-12-30 0:16 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0712300000170.14355@wbgn129.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Now you can ask for a revision range
>
> master@{2.weeks.ago..1.day.ago}
>
> or even something like
>
> HEAD@{20..yesterday}
You can _ask_ all you want, but it is not clear what it does
from this description. I guess you are rewriting master@{A..B}
to master@{A}..master@{B}, but that is not clear from the commit
log nor documentation (did I even see a documentation patch?).
Also, I am not convinced that the rewrite gives the semantics
the users naturally expect from @{A..B}. I would even suspect
that people would expect "git log master@{0..2}" to behave more
like "git show master@{0} master@{1} master@{2}".
^ permalink raw reply
* Re: [PATCH] Speedup prefixcmp() common case
From: Junio C Hamano @ 2007-12-30 0:05 UTC (permalink / raw)
To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <e5bfff550712291214p7554ea52o875667906ce1a22d@mail.gmail.com>
"Marco Costalba" <mcostalba@gmail.com> writes:
> On Dec 29, 2007 8:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Why isn't it like this?
>>
>> if (!prefix[1])
>>
>
> well, what about if prefix == NULL ?
What about it? Do not trim what's relevant when your quote,
please.
Your slow path does this:
> return strncmp(str, prefix, strlen(prefix));
> }
So it will barf when prefix == NULL anyway due to strlen(). I
think passing NULL as prefix to prefixcmp() is a caller-error.
I think my version is also buggy. Passing "" as prefix to
prefixcmp() is nonsense but is supported, and checking prefix[1]
without looking at prefix[0] reads past the end of the string.
So, in summary, I think the following is what we would want.
static inline int prefixcmp(const char *str, const char *prefix)
{
+ // shortcut common case of a single char prefix
+ if (prefix[0] && !prefix[1])
+ return *str - *prefix;
+
return strncmp(str, prefix, strlen(prefix));
}
^ permalink raw reply
* [PATCH] Teach revision walker about reflog ranges
From: Johannes Schindelin @ 2007-12-29 23:02 UTC (permalink / raw)
To: git, gitster
Now you can ask for a revision range
master@{2.weeks.ago..1.day.ago}
or even something like
HEAD@{20..yesterday}
It does this by allocating an strbuf to construct the second ref string
(in the above examples "master@{1.day.ago}" and "HEAD@{yesterday}",
respectively), which is never free()d.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Another nail in the coffin of libification, but for the common
one-shot command, it is the easiest way to support reflog ranges.
revision.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/revision.c b/revision.c
index 6e85aaa..3e7a834 100644
--- a/revision.c
+++ b/revision.c
@@ -794,10 +794,24 @@ int handle_revision_arg(const char *arg, struct rev_info *revs,
const char *this = arg;
int symmetric = *next == '.';
unsigned int flags_exclude = flags ^ UNINTERESTING;
+ const char *at;
*dotdot = 0;
next += symmetric;
+ at = strstr(arg, "@{");
+ if (at && !strchr(at + 2, '}')) {
+ struct strbuf buf;
+ strcpy(dotdot, "}");
+ strbuf_init(&buf, 0);
+ strbuf_insert(&buf, 0, arg, at + 2 - arg);
+ if (!strcmp(next, "}"))
+ strbuf_addch(&buf, '0');
+ strbuf_addstr(&buf, next);
+ /* we will not free() this buffer */
+ next = buf.buf;
+ }
+
if (!*next)
next = "HEAD";
if (dotdot == arg)
--
1.5.4.rc2.5.g44b6d-dirty
^ permalink raw reply related
* Re: [PATCH] Optimize prefixcmp()
From: Marco Costalba @ 2007-12-29 22:44 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712292307210.14355@wbgn129.biozentrum.uni-wuerzburg.de>
On Dec 29, 2007 11:15 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> However, since you already seem to have a profiling setup ready, I would
> be interested in some numbers, i.e. if this patch is faster for you or
> slower, or shows no effect at all.
>
Ok. I will do some tests with your patch and I'll let you know.
Marco
^ permalink raw reply
* Re: [PATCH] git-filter-branch could be confused by similar names
From: Johannes Schindelin @ 2007-12-29 22:36 UTC (permalink / raw)
To: Dmitry Potapov; +Cc: git
In-Reply-To: <1198593316-7712-1-git-send-email-dpotapov@gmail.com>
Hi,
On Tue, 25 Dec 2007, Dmitry Potapov wrote:
> 'git-filter-branch branch' could fail producing the error:
> "Which ref do you want to rewrite?" if existed another branch
> or tag, which name was 'branch-something' or 'something/branch'.
>
> Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
> ---
> git-filter-branch.sh | 2 +-
> t/t7003-filter-branch.sh | 10 ++++++++++
> 2 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index dbab1a9..b89a720 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -219,7 +219,7 @@ do
> ;;
> *)
> ref="$(git for-each-ref --format='%(refname)' |
> - grep /"$ref")"
> + grep '^refs/[^/]\+/'"$ref"'$')"
Hmm. I wonder if this is a proper solution. It still does not error out
when you have a tag and a branch of the same name.
I kinda hoped that by 1.5.4, rewrite-commits would be finished, but it
seems that nothing happened in that area after 1.5.3-rcX.
It would be so much easier to have checks like this -- returning the real
refname for short but unique short refnames -- in C.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH] Optimize prefixcmp()
From: Johannes Schindelin @ 2007-12-29 22:15 UTC (permalink / raw)
To: Marco Costalba; +Cc: Git Mailing List
In-Reply-To: <e5bfff550712291239y5648b923y8d332d9c40a8c97b@mail.gmail.com>
Hi,
On Sat, 29 Dec 2007, Marco Costalba wrote:
> What your patch does not seem to avoid is a segfault if prefix or str
> are NULL pointers.
I am quite certain that it is not allowed to pass NULL pointers to strcmp,
and even if it was, I maintain that it is bad style.
FWIW the test suite seems to agree with me, as it passes with my patch.
However, since you already seem to have a profiling setup ready, I would
be interested in some numbers, i.e. if this patch is faster for you or
slower, or shows no effect at all.
Ciao,
Dscho
^ permalink raw reply
* Re: [PATCH] Optimize prefixcmp()
From: Andy Parkins @ 2007-12-29 21:54 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Marco Costalba, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712292019450.14355@wbgn129.biozentrum.uni-wuerzburg.de>
On Saturday 2007, December 29, Johannes Schindelin wrote:
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
I'm sure it doesn't matter; but they're allowed in C99. So it's not a C++
comment any more :-)
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply
* Re: [PATCH] Speedup prefixcmp() common case
From: Marco Costalba @ 2007-12-29 20:43 UTC (permalink / raw)
To: Git Mailing List
In-Reply-To: <e5bfff550712291001q5f246ceah6700b98308fb96f1@mail.gmail.com>
In case the prefix string is a single char avoid a
costly call to strlen() + strncmp()
With this patch git log with --pretty=format option
is 10% faster
With suggestions by Junio C Hamano
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
---
git-compat-util.h | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/git-compat-util.h b/git-compat-util.h
index 79eb10e..e26b684 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -398,6 +398,10 @@ static inline int sane_case
static inline int prefixcmp(const char *str, const char *prefix)
{
+ /* shortcut common case of a single char prefix */
+ if (prefix && !prefix[1] && str)
+ return *str - *prefix;
+
return strncmp(str, prefix, strlen(prefix));
}
--
1.5.4.rc2-dirty
^ permalink raw reply related
* Re: [PATCH] Optimize prefixcmp()
From: Marco Costalba @ 2007-12-29 20:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712292019450.14355@wbgn129.biozentrum.uni-wuerzburg.de>
On Dec 29, 2007 8:22 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> Not only does it avoid the strlen() call also for longer prefixes;
> it also avoids a C++ comment.
>
Avoiding a C++ comment is good ;-) sorry, it slipped to me.
What your patch does not seem to avoid is a segfault if prefix or str
are NULL pointers.
Marco
^ permalink raw reply
* Re: [PATCH] Speedup prefixcmp() common case
From: Marco Costalba @ 2007-12-29 20:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7v63yhb8kf.fsf@gitster.siamese.dyndns.org>
On Dec 29, 2007 8:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Why isn't it like this?
>
> if (!prefix[1])
>
well, what about if prefix == NULL ?
Actually I didn't checked if strncmp() checks for NULL pointers before
to proceed, if this is the case I managed to keep the same semantic.
You could say "Why, lazy you, didn't you checked if strncmp() checks
for NULL pointers? "...but I hope you are foregiving ;-)
Thanks
Marco
^ permalink raw reply
* Re: [RFH] git-log vs git-rev-list performance
From: Marco Costalba @ 2007-12-29 20:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <alpine.LFD.0.9999.0712291038560.2778@woody.linux-foundation.org>
On Dec 29, 2007 7:51 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Sat, 29 Dec 2007, Marco Costalba wrote:
> >
> > [marco@localhost linux-2.6]$ time git log --topo-order --no-color
> > --parents --boundary -z --log-size
> > --pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
>
> Don't compare "--pretty=format" to the pre-formatted versions.
>
> Use "--pretty=raw" for "git log" if you want to approximate "git
> rev-list --header".
>
I have switched to --pretty=format instead of preformatted one to save
RAM, becuase needed memory is about 35% less with a custom format, the
preformatted ones give me additional info that is not shown on qgit so
it's just a waste.
As example a full Linux tree loaded with qgit takes less then 80MB,
with gitk as comparison we are above 400MB although of course the
optimized format is not the whole reason for this difference.
What I have seen looking expecially at the pretty.c sources with a
profiler is that the custom format is continuosly reparsed _for each
revision_ also if it never changes during the whole git-log run. This
could explain why the custom format although cheaper in terms of
quantity of outputted data is slower then a preformatted one.
A caching of the parsed custom --pretty=format at the beginning of
git-log could help...
Marco
^ permalink raw reply
* Re: [PATCH] Speedup prefixcmp() common case
From: Junio C Hamano @ 2007-12-29 19:32 UTC (permalink / raw)
To: Marco Costalba; +Cc: Git Mailing List
In-Reply-To: <e5bfff550712291001q5f246ceah6700b98308fb96f1@mail.gmail.com>
"Marco Costalba" <mcostalba@gmail.com> writes:
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 79eb10e..e26b684 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -398,6 +398,10 @@ static inline int sane_case
>
> static inline int prefixcmp(const char *str, const char *prefix)
> {
> + // shortcut common case of a single char prefix
> + if (prefix && *(prefix + 1) == '\0' && str)
> + return *str - *prefix;
> +
Why isn't it like this?
if (!prefix[1])
return *str - *prefix;
> return strncmp(str, prefix, strlen(prefix));
> }
>
> --
> 1.5.4.rc2-dirty
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox