* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-07 8:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <7vaba3bken.fsf@gitster.siamese.dyndns.org>
On Wed, 2009-01-07 at 00:16 -0800, Junio C Hamano wrote:
> "R. Tyler Ballance" <tyler@slide.com> writes:
>
> > Unfortunately it doesn't, what I did notice was this when I did a `git
> > status` in the directory right after untarring:
> > tyler@grapefruit:~/jburgess_main> git status
> > #
> > # ---impressive amount of file names fly by---
> > # ----snip---
> > ...
> > Basically, somehow Git thinks that *every* file in the repository is
> > deleted at this point.
>
> That makes me suspect that your .git/index file is corrupt.
Would this be tied to the corrupted pack file issue, or is it separate?
Either way, how could I verify your assumptions? (I'll be lurking in
#git for a while if you want to interactively help ;))
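
One possible way to test the corrupt-index suspicion, as a rough sketch
rather than anything proposed in this thread: the helper names
check_index and rebuild_index are made up for illustration, and the
sketch assumes it is run inside a non-bare repository whose HEAD is
intact.

```shell
# Hypothetical helpers, not part of Git.

check_index() {
	# "git ls-files --stage" has to parse the whole index, so a damaged
	# index file normally makes it exit with a non-zero status.
	if git ls-files --stage >/dev/null 2>&1; then
		echo "index readable"
	else
		echo "index unreadable"
		return 1
	fi
}

rebuild_index() {
	# Keep the suspect index for later inspection, then let "git reset"
	# write a fresh one from HEAD. Files in the work tree are not touched.
	git_dir=$(git rev-parse --git-dir) || return 1
	if [ -f "$git_dir/index" ]; then
		mv "$git_dir/index" "$git_dir/index.suspect" || return 1
	fi
	git reset --quiet
}
```

If check_index fails and "git status" looks sane again after
rebuild_index, the index was most likely the damaged part; that says
nothing about the pack files, which need a separate check.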
Cheers
--
-R. Tyler Ballance
Slide, Inc.
* Re: Problems getting rid of large files using git-filter-branch
From: Øyvind Harboe @ 2009-01-07 8:26 UTC (permalink / raw)
To: git
In-Reply-To: <c09652430901061359q7a02291fk656ab23e54b19f5e@mail.gmail.com>
Here is a summary of the solution I used. I'm a beginner in git
and just summarizing what others told me and what I did. Use at
your own risk!
1. Remove anything you know should be removed, e.g.:
git filter-branch --tree-filter 'find . -regex ".*toolchain\..*" -exec rm -f {} \;' HEAD
2. Expire the log:
git reflog expire --all
3. Delete stuff from .git that should be manually "verified" to be
correct. I don't actually know how to "verify" that at this point...
Use backups, Luke!
rm -rf .git/refs/original
# delete lines w/"refs/original" from .git/packed-refs
vi .git/packed-refs
# for good measure...
git reflog expire --all
git gc
4. Your repository is still huge. By creating a new repository and pulling from
this one, the garbage will stay in the old one...
mkdir newrep
cd newrep
git init
git pull file:///oldrep
5. Check size of .git. If it is still too big, try figuring out which
files are big by looking at the packs (.git/objects/pack/xxx):
$ git verify-pack -v $PACK | grep -v "^chain " | sort -n -k 4
and then for the last few lines do a
$ git rev-list --all --objects | grep $SHA1
6. Go back to #1 until done.
Your repository should now be of reasonable size...
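
Steps 1 to 3 above could be collected into one function, roughly like
this. It is a sketch under a few assumptions, not the exact procedure
from the thread: the name purge_paths is made up; it uses
--index-filter with "git rm --cached" instead of --tree-filter with
find; it deletes refs/original/* through "git update-ref -d" instead
of editing packed-refs by hand; and it passes --expire=now, because a
plain "git reflog expire --all" only drops entries older than the
default expiry time. The pattern must not contain single quotes.

```shell
# Hypothetical helper. Try it on a backup copy first: it rewrites
# history and deletes the safety refs on purpose.

purge_paths() {
	# $1: a pathspec understood by "git rm", e.g. '*toolchain.*'
	pattern=$1

	# Silence the deprecation notice that newer Git versions print.
	FILTER_BRANCH_SQUELCH_WARNING=1
	export FILTER_BRANCH_SQUELCH_WARNING

	# --index-filter edits the index directly instead of checking out
	# every commit, which is usually much faster than --tree-filter.
	git filter-branch -f --index-filter \
		"git rm -r -q --cached --ignore-unmatch -- '$pattern'" HEAD || return 1

	# Drop the refs/original/* backups that filter-branch leaves behind.
	git for-each-ref --format='%(refname)' refs/original/ |
	while read -r ref; do
		git update-ref -d "$ref"
	done

	# Expire every reflog entry, then prune unreachable objects right away.
	git reflog expire --expire=now --all &&
	git gc --quiet --prune=now
}
```

Usage would be something like: purge_paths '*toolchain.*'
Only the branch that HEAD points to is rewritten, as in step 1.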
I've found some great scripts for converting from svn/cvs, but really
the above procedure is necessary to run when converting nasty old
repositories...
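
For step 5, a sketch that lists the largest objects together with
their path names in one go. Instead of "git verify-pack", which only
sees packed objects, it feeds "git rev-list --objects" into
"git cat-file --batch-check" with a custom format; that form needs a
much newer Git than the one discussed here (the format string and
%(rest) are assumed to be available). The name largest_objects is
made up, and the sizes shown are uncompressed object sizes, not the
space used inside the pack.

```shell
# Hypothetical helper: print "<type> <size> <sha1> <path>" for the
# biggest reachable objects, largest first.

largest_objects() {
	# $1: how many objects to list (default 10)
	count=${1:-10}
	git rev-list --objects --all |
	git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
	sort -k 2,2nr |
	head -n "$count"
}
```

The paths in the last column are what you would then feed back into
the filter in step 1.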
--
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Junio C Hamano @ 2009-01-07 8:16 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231314099.8870.415.camel@starfruit>
"R. Tyler Ballance" <tyler@slide.com> writes:
> Unfortunately it doesn't, what I did notice was this when I did a `git
> status` in the directory right after untarring:
> tyler@grapefruit:~/jburgess_main> git status
> #
> # ---impressive amount of file names fly by---
> # ----snip---
> ...
> Basically, somehow Git thinks that *every* file in the repository is
> deleted at this point.
That makes me suspect that your .git/index file is corrupt.
* [RFC/PATCH 3/3] replace_object: add a test case
From: Christian Couder @ 2009-01-07 7:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In this patch the setup code is very big, but this will be used in
test cases that will be added later.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
t/t6050-replace.sh | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 75 insertions(+), 0 deletions(-)
create mode 100755 t/t6050-replace.sh
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
new file mode 100755
index 0000000..0412659
--- /dev/null
+++ b/t/t6050-replace.sh
@@ -0,0 +1,75 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Christian Couder
+#
+test_description='Tests replace refs functionality'
+
+exec </dev/null
+
+. ./test-lib.sh
+
+add_and_commit_file()
+{
+ _file="$1"
+ _msg="$2"
+
+ git add $_file || return $?
+ test_tick || return $?
+ git commit --quiet -m "$_file: $_msg"
+}
+
+HASH1=
+HASH2=
+HASH3=
+HASH4=
+HASH5=
+HASH6=
+HASH7=
+
+test_expect_success 'set up buggy branch' '
+ echo "line 1" >> hello &&
+ echo "line 2" >> hello &&
+ echo "line 3" >> hello &&
+ echo "line 4" >> hello &&
+ add_and_commit_file hello "4 lines" &&
+ HASH1=$(git rev-parse --verify HEAD) &&
+ echo "line BUG" >> hello &&
+ echo "line 6" >> hello &&
+ echo "line 7" >> hello &&
+ echo "line 8" >> hello &&
+ add_and_commit_file hello "4 more lines with a BUG" &&
+ HASH2=$(git rev-parse --verify HEAD) &&
+ echo "line 9" >> hello &&
+ echo "line 10" >> hello &&
+ add_and_commit_file hello "2 more lines" &&
+ HASH3=$(git rev-parse --verify HEAD) &&
+ echo "line 11" >> hello &&
+ add_and_commit_file hello "1 more line" &&
+ HASH4=$(git rev-parse --verify HEAD) &&
+ sed -e "s/BUG/5/" hello > hello.new &&
+ mv hello.new hello &&
+ add_and_commit_file hello "BUG fixed" &&
+ HASH5=$(git rev-parse --verify HEAD) &&
+ echo "line 12" >> hello &&
+ echo "line 13" >> hello &&
+ add_and_commit_file hello "2 more lines" &&
+ HASH6=$(git rev-parse --verify HEAD) &&
+ echo "line 14" >> hello &&
+ echo "line 15" >> hello &&
+ echo "line 16" >> hello &&
+ add_and_commit_file hello "again 3 more lines" &&
+ HASH7=$(git rev-parse --verify HEAD)
+'
+
+test_expect_success 'replace the author' '
+ git cat-file commit $HASH2 | grep "author A U Thor" &&
+ R=$(git cat-file commit $HASH2 | sed -e "s/A U/O/" | git hash-object -t commit --stdin -w) &&
+ git cat-file commit $R | grep "author O Thor" &&
+ git update-ref refs/replace/$HASH2 $R &&
+ git show HEAD~5 | grep "O Thor" &&
+ git show $HASH2 | grep "A U Thor"
+'
+
+#
+#
+test_done
--
1.6.1.162.g1cd53
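
Outside the test suite, the "replace the author" test above boils down
to the following sketch. The function name replace_author is made up,
and the names passed in must not contain "/" because they end up in a
sed expression. Note one assumption about behaviour: released Git
versions that ship refs/replace/ apply the replacement whenever an
object is read, so "git show <original sha1>" shows the replacement
there, which differs from what this RFC series describes;
"git --no-replace-objects" turns the lookup off.

```shell
# Hypothetical helper: create a copy of a commit with a different
# author name and register it under refs/replace/.

replace_author() {
	# $1: commit to replace, $2: old author name, $3: new author name
	orig=$(git rev-parse --verify "$1^{commit}") || return 1

	# Rewrite only the "author" header (everything before the first
	# empty line) of the raw commit object and store the result.
	new=$(git cat-file commit "$orig" |
		sed -e "1,/^\$/ s/^author $2 /author $3 /" |
		git hash-object -t commit --stdin -w) || return 1

	# Point refs/replace/<original sha1> at the rewritten commit.
	git update-ref "refs/replace/$orig" "$new" &&
	echo "$new"
}
```

"git for-each-ref refs/replace/" lists what has been registered, and
deleting the ref with "git update-ref -d" undoes the replacement.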
* [RFC/PATCH 2/3] replace_object: add mechanism to replace objects found in "refs/replace/"
From: Christian Couder @ 2009-01-07 7:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The code of this mechanism has been copied from the commit graft code.
Currently this mechanism is only used from the "parse_commit_buffer"
function in "commit.c". It should probably be used from "fsck.c" too.
(For information, grafts are looked up only from "parse_commit_buffer"
function in "commit.c" and from "fsck_commit" in "fsck.c".)
In "parse_commit_buffer", the parent sha1s from the original commit
or from a commit graft that match a ref name in "refs/replace/" are
replaced by the commit sha1 that has been read in the ref.
This means that for example "git show <original commit sha1>" will
display information about the original commit. If the mechanism
had been called from "read_sha1_file" instead of when parents
are read, then "git show <original commit sha1>" would display
information about the commit that replaces the original one.
This may be seen as a feature or as a bug depending on the point
of view.
Anyway this implementation makes sure that the mechanism is
triggered only when commit graft could be triggered, so hopefully the
object reachability traverser will ignore this mechanism as it
ignores the graft mechanism.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
Makefile | 1 +
commit.c | 7 +++-
commit.h | 2 +
replace_object.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 110 insertions(+), 2 deletions(-)
create mode 100644 replace_object.c
diff --git a/Makefile b/Makefile
index aabf013..f355e63 100644
--- a/Makefile
+++ b/Makefile
@@ -471,6 +471,7 @@ LIB_OBJS += read-cache.o
LIB_OBJS += reflog-walk.o
LIB_OBJS += refs.o
LIB_OBJS += remote.o
+LIB_OBJS += replace_object.o
LIB_OBJS += rerere.o
LIB_OBJS += revision.o
LIB_OBJS += run-command.o
diff --git a/commit.c b/commit.c
index c99db16..0014174 100644
--- a/commit.c
+++ b/commit.c
@@ -241,6 +241,7 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
char *tail = buffer;
char *bufptr = buffer;
unsigned char parent[20];
+ const unsigned char *parent_sha1;
struct commit_list **pptr;
struct commit_graft *graft;
@@ -268,7 +269,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
bufptr += 48;
if (graft)
continue;
- new_parent = lookup_commit(parent);
+ parent_sha1 = lookup_replace_object(parent);
+ new_parent = lookup_commit(parent_sha1);
if (new_parent)
pptr = &commit_list_insert(new_parent, pptr)->next;
}
@@ -276,7 +278,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
int i;
struct commit *new_parent;
for (i = 0; i < graft->nr_parent; i++) {
- new_parent = lookup_commit(graft->parent[i]);
+ parent_sha1 = lookup_replace_object(graft->parent[i]);
+ new_parent = lookup_commit(parent_sha1);
if (!new_parent)
continue;
pptr = &commit_list_insert(new_parent, pptr)->next;
diff --git a/commit.h b/commit.h
index 3a7b06a..37aa763 100644
--- a/commit.h
+++ b/commit.h
@@ -122,6 +122,8 @@ struct commit_graft *read_graft_line(char *buf, int len);
int register_commit_graft(struct commit_graft *, int);
struct commit_graft *lookup_commit_graft(const unsigned char *sha1);
+const unsigned char *lookup_replace_object(const unsigned char *sha1);
+
extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2, int cleanup);
extern struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos, int cleanup);
extern struct commit_list *get_octopus_merge_bases(struct commit_list *in);
diff --git a/replace_object.c b/replace_object.c
new file mode 100644
index 0000000..b50890d
--- /dev/null
+++ b/replace_object.c
@@ -0,0 +1,102 @@
+#include "cache.h"
+#include "refs.h"
+
+static struct replace_object {
+ unsigned char sha1[2][20];
+} **replace_object;
+
+static int replace_object_alloc, replace_object_nr;
+
+static int replace_object_pos(const unsigned char *sha1)
+{
+ int lo, hi;
+ lo = 0;
+ hi = replace_object_nr;
+ while (lo < hi) {
+ int mi = (lo + hi) / 2;
+ struct replace_object *rep = replace_object[mi];
+ int cmp = hashcmp(sha1, rep->sha1[0]);
+ if (!cmp)
+ return mi;
+ if (cmp < 0)
+ hi = mi;
+ else
+ lo = mi + 1;
+ }
+ return -lo - 1;
+}
+
+static int register_replace_object(struct replace_object *replace,
+ int ignore_dups)
+{
+ int pos = replace_object_pos(replace->sha1[0]);
+
+ if (0 <= pos) {
+ if (ignore_dups)
+ free(replace);
+ else {
+ free(replace_object[pos]);
+ replace_object[pos] = replace;
+ }
+ return 1;
+ }
+ pos = -pos - 1;
+ if (replace_object_alloc <= ++replace_object_nr) {
+ replace_object_alloc = alloc_nr(replace_object_alloc);
+ replace_object = xrealloc(replace_object,
+ sizeof(*replace_object) *
+ replace_object_alloc);
+ }
+ if (pos < replace_object_nr)
+ memmove(replace_object + pos + 1,
+ replace_object + pos,
+ (replace_object_nr - pos - 1) *
+ sizeof(*replace_object));
+ replace_object[pos] = replace;
+ return 0;
+}
+
+static int register_replace_ref(const char *refname,
+ const unsigned char *sha1,
+ int flag, void *cb_data)
+{
+ /* Get sha1 from refname */
+ const char *slash = strrchr(refname, '/');
+ const char *hash = slash ? slash + 1 : refname;
+ struct replace_object * repl_obj = xmalloc(sizeof(*repl_obj));
+
+ if (strlen(hash) != 40 || get_sha1_hex(hash, repl_obj->sha1[0])) {
+ free(repl_obj);
+ warning("bad replace ref name: %s", refname);
+ }
+
+ /* Copy sha1 from the read ref */
+ hashcpy(repl_obj->sha1[1], sha1);
+
+ /* Register new object */
+ if (register_replace_object(repl_obj, 1))
+ warning("duplicate replace ref: %s", refname);
+
+ return 0;
+}
+
+static void prepare_replace_object(void)
+{
+ static int replace_object_prepared;
+
+ if (replace_object_prepared)
+ return;
+
+ for_each_replace_ref(register_replace_ref, NULL);
+ replace_object_prepared = 1;
+}
+
+const unsigned char *lookup_replace_object(const unsigned char *sha1)
+{
+ int pos;
+
+ prepare_replace_object();
+ pos = replace_object_pos(sha1);
+
+ return (0 <= pos) ? replace_object[pos]->sha1[1] : sha1;
+}
--
1.6.1.162.g1cd53
* [RFC/PATCH 1/3] refs: add a "for_each_replace_ref" function
From: Christian Couder @ 2009-01-07 7:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
This is some preparation work for the following patches that are using
the "refs/replace/" ref namespace.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
refs.c | 5 +++++
refs.h | 1 +
2 files changed, 6 insertions(+), 0 deletions(-)
Junio wrote:
> What I thought we
> discussed during GitTogether was to write out the object name of the
> replacement object in refs/replace/<sha1>.
>
> When the caller asks read_sha1_file() for an object whose object name is
> <sha1>, you see if there is refs/replace/<sha1> in the repository, and
> read the ref to learn the object name of the object that replaces it.
> And you return that as if it is the original object.
Patch 2/3 in this series implements the new mechanism. As you can see, I
preferred it to be called when reading parent commits rather than from
"read_sha1_file", because it seems to simplify things. I hope you still like
it.
Regards,
Christian.
diff --git a/refs.c b/refs.c
index 33ced65..042106d 100644
--- a/refs.c
+++ b/refs.c
@@ -632,6 +632,11 @@ int for_each_remote_ref(each_ref_fn fn, void *cb_data)
return do_for_each_ref("refs/remotes/", fn, 13, cb_data);
}
+int for_each_replace_ref(each_ref_fn fn, void *cb_data)
+{
+ return do_for_each_ref("refs/replace/", fn, 13, cb_data);
+}
+
/*
* Make sure "ref" is something reasonable to have under ".git/refs/";
* We do not like it if:
diff --git a/refs.h b/refs.h
index 06ad260..8d2ee5a 100644
--- a/refs.h
+++ b/refs.h
@@ -23,6 +23,7 @@ extern int for_each_ref(each_ref_fn, void *);
extern int for_each_tag_ref(each_ref_fn, void *);
extern int for_each_branch_ref(each_ref_fn, void *);
extern int for_each_remote_ref(each_ref_fn, void *);
+extern int for_each_replace_ref(each_ref_fn, void *);
/*
* Extra refs will be listed by for_each_ref() before any actual refs
--
1.6.1.162.g1cd53
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-07 7:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901062026500.3057@localhost.localdomain>
On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote:
>
> On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> >
> > I'll back the patch out and redeploy, it's worth mentioning that a
> > coworker of mine just got the issue as well (on 1.6.1). He was able to
> > `git pull` and the error went away, but I doubt that it "magically fixed
> > itself"
>
> Quite frankly, that behaviour sounds like a disk _cache_ corruption issue.
> The fact that some corruption "comes and goes" and sometimes magically
> heals itself sounds very much like some disk cache problem, and then that
> particular part of the cache gets replaced and then when re-populated it
> is magically correct.
>
> We had that in one case with a Linux NFS client, where a rename across
> directories caused problems.
>
> This was a networked filesystem on OS X, right? File caching is much more
> "interesting" in networked filesystems than it is in normal private
> on-disk ones.
Not quite, what I meant was that some users (not all) who've experienced
this issue are using Samba to copy files over directly into the Git
repository. I was mentioning this in case somewhere between Finder,
Samba, ext3 and Git, some file system change events were pissing Git off
and causing it. I don't think this is the case as the coworker that I
mentioned earlier doesn't use Samba and neither do I (we both experience
the issue today, mine disappeared by upgrading to 1.6.1, his by `git
pull`).
>
> > I've tarred one of the repositories that had it in a reproducible state
> > so I can create a build and extract the tar and run against that to
> > verify any patches anybody might have, but unfortunately at 7GB of
> > company code and assets, I can't exactly share ;)
>
> The thing to do is
>
> - untar it on some trusted machine with a local disk and a known-good
> filesystem.
>
> IOW, not that networked samba share.
>
> - verify that it really does happen on that machine, with that untarred
> image. Because maybe it doesn't.
Unfortunately it doesn't, what I did notice was this when I did a `git
status` in the directory right after untarring:
tyler@grapefruit:~/jburgess_main> git status
#
# ---impressive amount of file names fly by---
# ----snip---
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# artwork/
# bt/
# flash/
tyler@grapefruit:~/jburgess_main>
Basically, somehow Git thinks that *every* file in the repository is
deleted at this point. I went ahead and performed a `git reset --hard`
to see if the issue would manifest itself thereafter, but it did not.
I did try to do a git-fsck(1), and this is what I got:
tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
[1] 19381 segmentation fault /usr/local/bin/git fsck --full
tyler@grapefruit:~/jburgess_main>
>
> The hope is that you caught the corruption in the cache, and it
> actually got written out to the tar-file. But if it _is_ a disk cache
> (well, network cache) issue, maybe the IO required to tar everything up
> was enough to flush it, and the tar-file actually _works_ because it
> got repopulated correctly.
When I was working through this with Jan, one of the things that we did
was move the actual object file in .git/objects, they existed so maybe I
could look into those to check?
>
> So that's why you should double-check that it really ends up being
> corrupt after being untarred again.
>
> - go back and test the original git repo on the network share, preferably
> on another client. See if the error has gone away.
Unfortunately the repository is being used by the original developer I
tarred from with our 1.6.1 build, he hasn't reported any issues, but I
can't exactly steal it back (that's why I made the tar)
> The fact that you seem to get a _lot_ of these errors really does make it
> sound like something in your environment. It's actually really hard to get
> git to corrupt anything. Especially objects that got packed. They've been
> quiescent for a long time, they got repacked in a very simple way, they
> are totally read-only.
I checked with our operations team, and contrary to my suspicion (your
NFS comment piqued my curiosity), these disks that are actually on the
machines are not NFS mounts but rather local disk arrays.
--> is it NFSd? or all local storage
<== all local
<== df -h
<== mount
<== /dev/sda5 705G 247G 423G 37% /nail
--> hm, there goes that theory
<== git corruption?
--> yeah, looking into it
<== sucks
--> Linus had a theory about NFS/etc corruption of the disk cache
<== when the company folds we can all blame you... and your silly git games
<== (think positive, joel)
--> thanks
;)
Anything else I can do to help debug this? :-/
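
Since "git fsck --full" crashes here, one narrower check that might
still be worth running is to verify each pack on its own, so that a
single bad pack can be identified. A sketch only: check_packs is a
made-up name, and the function assumes the packs live under the
repository's own objects/pack directory (no alternates, no
GIT_OBJECT_DIRECTORY).

```shell
# Hypothetical helper: run "git verify-pack" on every pack index and
# report which packs pass and which fail.

check_packs() {
	git_dir=$(git rev-parse --git-dir) || return 1
	status=0
	for idx in "$git_dir"/objects/pack/*.idx; do
		# The glob stays unexpanded when there are no packs at all.
		[ -f "$idx" ] || continue
		if git verify-pack "$idx" >/dev/null 2>&1; then
			echo "ok   $idx"
		else
			echo "BAD  $idx"
			status=1
		fi
	done
	return $status
}
```

Running it both on the untarred copy and on the original repository
would show whether the two actually differ.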
Cheers
--
-R. Tyler Ballance
Slide, Inc.
* Re: Comments on Presentation Notes Request.
From: david @ 2009-01-07 8:33 UTC (permalink / raw)
To: Tim Visher; +Cc: git
In-Reply-To: <c115fd3c0901061433i78bf3b26v77e5981aada6728e@mail.gmail.com>
On Tue, 6 Jan 2009, Tim Visher wrote:
> *** Natural Backup
>
> Because every developer has a copy of the repository, every developer
> you add adds an extra failure point. The more developers you have,
> the more backups you have of the repository.
this needs to be re-worded. 'extra failure point' can be read to mean
redundancy in what would otherwise be a single point of failure, but it
can also mean another point where things can fail.
something like 'every developer adds an extra layer of redundancy' would
be much less ambiguous.
David Lang
* Re: [PATCH] Fix sourcing "test-lib.sh" using dash shell in "t3003-ls-files-narrow-match.sh"
From: Christian Couder @ 2009-01-07 7:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git
In-Reply-To: <7vzli4kftt.fsf@gitster.siamese.dyndns.org>
Le mardi 6 janvier 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@tuxfamily.org> writes:
> > dash barfs, on my old Ubuntu box, when "test-lib.sh" is sourced
> > without "./".
> >
> > Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> > ---
> > t/t3003-ls-files-narrow-match.sh | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > This patch applies to "pu".
>
> Thanks; I hope you don't mind squashing this in to 'Introduce "sparse
> patterns"'.
No problem.
Thanks,
Christian.
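
For background on why the "./" matters: POSIX specifies that the "."
command looks up a file name without a slash in $PATH, so
". test-lib.sh" only works in shells that additionally try the current
directory. A small sketch of a wrapper that always makes the lookup
explicit (source_local is a made-up name, not something in test-lib.sh):

```shell
# Hypothetical helper: source a file from the current directory in a
# way that does not depend on $PATH.

source_local() {
	case $1 in
	*/*) . "$1" ;;      # already contains a slash, use as given
	*)   . "./$1" ;;    # force lookup in the current directory
	esac
}
```

In the test scripts themselves, writing ". ./test-lib.sh" directly is
of course simpler.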
* Re: [RFC PATCH] diff --no-index: test for pager after option parsing
From: Junio C Hamano @ 2009-01-07 7:02 UTC (permalink / raw)
To: Miklos Vajna; +Cc: Thomas Rast, git
In-Reply-To: <20090107032013.GO21154@genesis.frugalware.org>
Miklos Vajna <vmiklos@frugalware.org> writes:
> On Tue, Jan 06, 2009 at 04:09:18PM -0800, Junio C Hamano <gitster@pobox.com> wrote:
>> But I wonder if it still makes a difference in real life. Didn't we stop
>> reporting the exit status from the pager some time ago?
>
> I just wanted to write this, I think that code could be just removed
> since ea27a18 (spawn pager via run_command interface, 2008-07-22).
I think we shouldn't.
People may already have got used to "git diff --exit-code" to disable the
pager, and doing the same for "git diff --exit-code --no-index" should
cause fewer surprises.
I'll queue the "--" fix, "-q" fix and this pager fix. Thanks.
* Re: How make "git checkout <commit> <file>" *not* alter index?
From: chris @ 2009-01-07 6:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vk599ne6a.fsf@gitster.siamese.dyndns.org>
On Mon, Jan 05, 2009 at 10:26:05PM -0800, Junio C Hamano wrote:
> $ git checkout HEAD~43 Makefile
> $ git reset Makefile
Thank you very much. git reset looks like just what I need.
cs
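
An alternative to the checkout-then-reset pair quoted above, as a
sketch: "git show <commit>:<path>" prints the blob, and redirecting it
into the file changes only the work tree, so the index is never
touched. The name restore_worktree_only is made up; the path is
assumed to be relative to the top of the repository, with the function
run from there. Unlike "git checkout", this does not apply any
checkout-time conversions (line endings, smudge filters).

```shell
# Hypothetical helper: put the version of a file from some commit into
# the work tree without changing the index.

restore_worktree_only() {
	# $1: commit-ish, $2: path relative to the repository top level
	# Write to a temporary file first so a failed "git show" does not
	# truncate the existing file.
	git show "$1:$2" > "$2.tmp.$$" && mv "$2.tmp.$$" "$2"
}
```

For example: restore_worktree_only HEAD~43 Makefile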
* Re: [RFC PATCH] diff --no-index: test for pager after option parsing
From: Jeff King @ 2009-01-07 6:42 UTC (permalink / raw)
To: Miklos Vajna; +Cc: Junio C Hamano, Thomas Rast, git
In-Reply-To: <20090107032013.GO21154@genesis.frugalware.org>
On Wed, Jan 07, 2009 at 04:20:13AM +0100, Miklos Vajna wrote:
> On Tue, Jan 06, 2009 at 04:09:18PM -0800, Junio C Hamano <gitster@pobox.com> wrote:
> > But I wonder if it still makes a difference in real life. Didn't we stop
> > reporting the exit status from the pager some time ago?
>
> I just wanted to write this, I think that code could be just removed
> since ea27a18 (spawn pager via run_command interface, 2008-07-22).
I don't think just removing it is right. You would also need to put
SETUP_PAGER into the flags for calling cmd_diff.
We do pass along the error code properly these days, but I think it is
nice that --exit-code always just suppresses the pager. Otherwise a
script like this:
if git diff --exit-code $x $y; then
do something
fi
will invoke the pager (and not everybody's setup immediately exits if
there is no output, either because they have different LESS options or
because they use a different pager). Of course one might argue that the
script should not be using "git diff" porcelain at all, but I don't
think there is another way to get a --no-index diff.
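
A script that wants to be independent of the user's pager setup could
also be explicit about it, along these lines. This is a sketch;
files_identical is a made-up name, and it assumes a Git version in
which --quiet and "--" work together with --no-index.

```shell
# Hypothetical helper: compare two arbitrary paths with git's diff
# machinery and report the result through the exit status only.

files_identical() {
	# --no-pager: never start a pager, whatever core.pager or LESS say
	# --no-index: compare two paths, ignoring the index and repository
	# --quiet:    no output; implies --exit-code (0 same, 1 different)
	git --no-pager diff --no-index --quiet -- "$1" "$2"
}
```

It can then be used as: if files_identical "$x" "$y"; then ...; fi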
-Peff
* Re: Comments on Presentation Notes Request.
From: Jeff King @ 2009-01-07 6:36 UTC (permalink / raw)
To: Tim Visher; +Cc: git
In-Reply-To: <c115fd3c0901061433i78bf3b26v77e5981aada6728e@mail.gmail.com>
On Tue, Jan 06, 2009 at 05:33:02PM -0500, Tim Visher wrote:
> ** Advantages of SCM
> *** One Source to Rule Them All.
> *** Unlimited Undo/Redo.
> *** Safe Concurrent Editing.
> *** Diff Debugging
I would add to this metadata and "software archeology": finding the
author of a change or piece of code, the motivation behind it, related
changes (by position within history, by content, or by commit message),
etc.
I think people who have not used an SCM before, and people coming from
SCMs where it is painful to look at history (like CVS) undervalue this
because it's not part of their workflow. But having used git for a few
years now, it is an integral part of how I develop (especially when
doing maintenance or bugfixes).
You touch on this in "Diff Debugging", but I think bisection is just a
part of it.
> * SCM Best Practices
>
> ** Commit Early, Commit Often
> ** Don't Commit Broken Code (To the Public Tree)
People talk a lot about using their SCM on a plane, but I think these
two seemingly opposite commands highlight the _real_ useful thing about
a distributed system for most people: commit and publish are two
separate actions.
So I think it might be better to say "Commit Early, Commit Often" but
"Don't _Publish_ Broken Code". Which is what you end up saying in the
discussion, but I think using that terminology makes clear the important
distinction between two actions that are convoluted in centralized
systems.
> *** Backup Becomes A Separate Process
> Because there is only a single repository, you need a back-up strategy
> or else you are exposing yourself to a single point of failure.
> [...]
> *** Natural Backup
> Because every developer has a copy of the repository, every developer
> you add adds an extra failure point. The more developers you have,
> the more backups you have of the repository.
The "natural backup" thing gets brought out a lot for DVCS. And it is
sort of true: instead of each developer having a backup of the latest
version (or some recent version which they checked out), they have a
backup of the whole history. But they still might not have everything.
Developers might not clone all branches. They might not be up to date
with some "master" repository. Useful work might be unpublished in the
master repo (e.g., I am working on feature X which is 99% complete, but
not ready for me to merge into master and push).
So yes, you are much more likely to salvage useful (if not all) data
from developer repositories in the event of a crash. But I still think
it's crazy not to have a backup strategy for your DVCS repo.
> ** Fast
>
> Git's implementation just happens to be wickedly fast. It's faster
> than mercurial, it's faster than bazaar, etc. Everything, committing,
> merging, viewing history, branching, and even updating and pushing
> are all faster.
A lot of people say "So what? System X is fast enough for me already."
And I used to be one of them. But one point I have made in similar talks
is that it isn't just about shaving a few seconds off your task. It's
about being able to ask fundamentally different questions because they
can be answered in seconds, not minutes or hours. I haven't benchmarked,
but I shudder at the thought of pickaxe (git log -S), code movement in
blame, or bisecting in CVS.
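
To make the pickaxe concrete for a talk, something like the following
sketch could be shown; commits_touching_string is a made-up name and
the output format is an arbitrary choice.

```shell
# Hypothetical helper: list the commits that add or remove a string.

commits_touching_string() {
	# $1: the string to look for, $2: optional path to limit the search
	# -S<string> selects commits that change the number of occurrences
	# of the string, i.e. commits that introduce or delete it.
	git log --format='%h %s' -S"$1" -- ${2:+"$2"}
}
```

For example: commits_touching_string lookup_replace_object commit.c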
> ** Excellent Merge algorithms
>
> Git has excellent merge algorithms. This is widely attributed and
> doesn't require much explanation. It was one of Git's original design
> goals, and it has been proven by Git's implementation. Merging in Git
> is _much_ less painful than in other systems.
Actually, git has a really _stupid_ merge algorithm that has been around
forever: the 3-way merge. And by stupid I don't mean bad, but just
simple and predictable. I think the git philosophy is more about making
it easy to merge often, and about making sure conflicts are simple to
understand and fix, than it is about being clever.
Which isn't to say there aren't systems with less clever merge
algorithms. CVS doesn't even do a 3-way merge, since it doesn't bother
to remember where the last branch intersection was.
BTW, I think Junio's 2006 OLS talk has some nice pictures of a 3-way
merge which help to explain it (see slides 23-32):
http://members.cox.net/junkio/200607-ols.pdf
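
A 3-way merge can also be demonstrated on plain files with
"git merge-file", without any repository. The sketch below is only an
illustration (three_way_demo and the file contents are made up): one
side changes the first line, the other side changes the last line, and
the common base lets both changes be combined without a conflict.

```shell
# Hypothetical demo of a file-level 3-way merge.

three_way_demo() {
	dir=$(mktemp -d) || return 1
	printf '%s\n' one two three four five six seven > "$dir/base"
	# "ours" changes the first line, "theirs" changes the last one
	sed -e '1s/one/ONE/' "$dir/base" > "$dir/ours"
	sed -e '7s/seven/SEVEN/' "$dir/base" > "$dir/theirs"
	# Argument order is <current> <base> <other>; -p prints the result
	# instead of overwriting the first file.
	git merge-file -p "$dir/ours" "$dir/base" "$dir/theirs"
	status=$?
	rm -rf "$dir"
	return $status
}
```

If both sides changed the same line instead, merge-file would emit
conflict markers and return a non-zero status.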
That's just my two cents from skimming over your notes. Hope it helps.
-Peff
* Re: Error: unable to unlink ... when using "git gc"
From: Boyd Stephen Smith Jr. @ 2009-01-07 6:27 UTC (permalink / raw)
To: git; +Cc: Sitaram Chamarty
In-Reply-To: <slrngm6hoj.n4a.sitaramc@sitaramc.homelinux.net>
On Tuesday 06 January 2009, Sitaram Chamarty <sitaramc@gmail.com> wrote
about 'Re: Error: unable to unlink ... when using "git gc"':
>On 2009-01-06, Jeff King <peff@peff.net> wrote:
>> If you are going to have multiple users sharing a repository, generally
>> they should be in the same group and the core.sharedrepository config
>> option should be set (see "git help config", or the "shared" option to
>> git-init).
>If you're not worried about the finer-grained access control
>that acl(5) gives you, just do what "git init
>--shared=group" does:
>
> git config core.sharedrepository 1 # as mentioned above
> chmod g+ws .git
>
>Now set the group to something (I use "gitpushers" ;-)
>
> chgrp -R gitpushers .git
>
>and make sure all your users are part of that group.
>
>Works fine for small teams...
ISTR this breaking here when someone on the team had a umask like 077 and
was using file:// or ssh:// to push. I ended up "fixing" things with a
cronjob, (which is a bit of a hack) IIRC.
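
For reference, the setup quoted above could be scripted roughly as
follows. A sketch with some assumptions: make_shared_repo is a made-up
name, the repository is created bare (the quoted commands operate on
an existing .git instead), and the group has to exist already with the
caller as a member. With core.sharedrepository set, Git is meant to
adjust the permissions of the files it creates itself, which is the
part a restrictive umask would otherwise break.

```shell
# Hypothetical helper: create a bare repository that a whole group
# may push into.

make_shared_repo() {
	# $1: directory for the new repository, $2: group that may push
	repo=$1
	group=$2
	# --shared=group sets core.sharedrepository in the new repository
	git init --quiet --bare --shared=group "$repo" || return 1
	chgrp -R "$group" "$repo" || return 1
	# Group write plus setgid on every directory, so that new files
	# and subdirectories inherit the group.
	find "$repo" -type d -exec chmod g+ws {} +
}
```

For example: make_shared_repo /srv/git/project.git gitpushers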
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
* Re: [PATCH] tutorial.txt renamed
From: Junio C Hamano @ 2009-01-07 6:27 UTC (permalink / raw)
To: Brian Gernhardt; +Cc: Christian Couder, Joey Hess, git
In-Reply-To: <DA6E1A83-FFBA-46BC-9BCF-ED8A2D8F12E7@silverinsanity.com>
Brian Gernhardt <benji@silverinsanity.com> writes:
> This is the README file for the project, so it should advise looking
> at the Documentation directory as neither the man pages nor the git command
> are likely installed at this point.
I think that is a sane suggestion. It is better to keep the number of
prerequisites to the minimum for the user in order to follow README (and
INSTALL, of course).
* Re: [PATCH] tutorial.txt renamed
From: Christian Couder @ 2009-01-07 6:07 UTC (permalink / raw)
To: Brian Gernhardt; +Cc: Joey Hess, git
In-Reply-To: <DA6E1A83-FFBA-46BC-9BCF-ED8A2D8F12E7@silverinsanity.com>
Le mercredi 7 janvier 2009, Brian Gernhardt a écrit :
> On Jan 7, 2009, at 12:28 AM, Christian Couder wrote:
> > Le mercredi 7 janvier 2009, Joey Hess a écrit :
> >> diff --git a/README b/README
> >> index 548142c..5fa41b7 100644
> >> --- a/README
> >> +++ b/README
> >> @@ -24,7 +24,7 @@ It was originally written by Linus Torvalds with help
> >> of a group of hackers around the net. It is currently maintained by Junio
> >> C Hamano.
> >>
> >> Please read the file INSTALL for installation instructions.
> >> -See Documentation/tutorial.txt to get started, then see
> >> +See Documentation/gittutorial.txt to get started,
> >
> > "man gittutorial" and "git help tutorial" should work to display the
> > tutorial, so perhaps we should advise to use them instead of the source,
> > since we are advising to use "man git-commandname" below to get help on
> > each command.
>
> This is the README file for the project, so it should advise looking
> at the Documentation directory as neither the man pages nor the git command
> are likely installed at this point.
Well, this is debatable, because we first ask the user to read the INSTALL
file, and a tutorial for git may not be very useful if you don't have it
installed to try out the tutorial commands.
Regards,
Christian.
^ permalink raw reply
* Re: Error: unable to unlink ... when using "git gc"
From: Jeff King @ 2009-01-07 5:59 UTC (permalink / raw)
To: Sitaram Chamarty; +Cc: git
In-Reply-To: <slrngm6uf5.vuo.sitaramc@sitaramc.homelinux.net>
On Tue, Jan 06, 2009 at 03:33:57PM +0000, Sitaram Chamarty wrote:
> > We also plan to do it in this way, just a small wondering that it
> > looks a kind of workaround instead of a more graceful solution.
>
> I wouldn't consider it a workaround. It uses normal Unix
> permissions the way they were designed to, including setgid
> for directories.
Yes, I think core.sharedrepository is the "official" way to do this, so
it is definitely not a workaround.
> Actually, I am yet to come up with a situation where I
> actually needed ACLs, though they are more generalised, and
> fine-grained.
I like ACLs mainly because you don't have to bug root to change
permissions (like you do to get them to create or modify a group).
-Peff
^ permalink raw reply
* Re: [PATCH] tutorial.txt renamed
From: Brian Gernhardt @ 2009-01-07 5:36 UTC (permalink / raw)
To: Christian Couder; +Cc: Joey Hess, git
In-Reply-To: <200901070628.38019.chriscool@tuxfamily.org>
On Jan 7, 2009, at 12:28 AM, Christian Couder wrote:
> On Wednesday 7 January 2009, Joey Hess wrote:
>> diff --git a/README b/README
>> index 548142c..5fa41b7 100644
>> --- a/README
>> +++ b/README
>> @@ -24,7 +24,7 @@ It was originally written by Linus Torvalds with
>> help
>> of a group of hackers around the net. It is currently maintained by
>> Junio
>> C Hamano.
>>
>> Please read the file INSTALL for installation instructions.
>> -See Documentation/tutorial.txt to get started, then see
>> +See Documentation/gittutorial.txt to get started,
>
> "man gittutorial" and "git help tutorial" should work to display the
> tutorial, so perhaps we should advise to use them instead of the
> source,
> since we are advising to use "man git-commandname" below to get help
> on
> each command.
This is the README file for the project, so it should advise looking
at the Documentation directory as neither the man pages nor the git command
are likely installed at this point.
>> CVS users may also want to read Documentation/cvs-migration.txt.
>
> The "cvs-migration.txt" was also renamed "gitcvs-migration.txt". It
> should
> be available with "man gitcvs-migration" and "git help cvs-migration".
This however is a valid point.
I would also suggest that the patch have a different name, as I
expected it to be renaming tutorial.txt and I was going to ask why.
Perhaps "README: correct for renamed files"?
~~ Brian G.
^ permalink raw reply
* Re: [PATCH] tutorial.txt renamed
From: Christian Couder @ 2009-01-07 5:28 UTC (permalink / raw)
To: Joey Hess; +Cc: git
In-Reply-To: <20090107042337.GA24735@gnu.kitenet.net>
On Wednesday 7 January 2009, Joey Hess wrote:
> The tutorial.txt file was renamed to gittutorial.txt some time ago,
> update README.
>
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
> README | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/README b/README
> index 548142c..5fa41b7 100644
> --- a/README
> +++ b/README
> @@ -24,7 +24,7 @@ It was originally written by Linus Torvalds with help
> of a group of hackers around the net. It is currently maintained by Junio
> C Hamano.
>
> Please read the file INSTALL for installation instructions.
> -See Documentation/tutorial.txt to get started, then see
> +See Documentation/gittutorial.txt to get started,
"man gittutorial" and "git help tutorial" should work to display the
tutorial, so perhaps we should advise to use them instead of the source,
since we are advising to use "man git-commandname" below to get help on
each command.
> then see
> Documentation/everyday.txt for a useful minimum set of commands,
(But the everyday.txt file has not been converted to a man page, so we
cannot advise to use "man giteveryday".)
> and "man git-commandname" for documentation of each command.
Yeah "man git-commandname" and "git help commandname" should work.
> CVS users may also want to read Documentation/cvs-migration.txt.
The "cvs-migration.txt" was also renamed "gitcvs-migration.txt". It should
be available with "man gitcvs-migration" and "git help cvs-migration".
Thanks,
Christian.
> --
> 1.5.6.5
^ permalink raw reply
* Re: JGit vs. Git
From: Vagmi Mudumbai @ 2009-01-07 5:08 UTC (permalink / raw)
To: git
In-Reply-To: <alpine.DEB.1.00.0901062240240.30769@pacific.mpi-cbg.de>
Hi,
@Stephen
>> Or think about extending the Ruby gem grit to also use JGit. Which would certainly improve grit and probably help improve JGit also.
I just started working on that. It will be close to Grit; let me see
how far I get with it. If you have an existing repo that you have
already worked on, please feel free to share it.
The JGit code, both the tests and the UI, is quite readable.
Thanks a ton for all your help. :-)
Regards,
Vagmi
http://blog.vagmim.com
"Teaching children to use Windows is like teaching them to smoke
tobacco—in a world where only one company sells tobacco." - Richard
Stallman
^ permalink raw reply
* Re: git-rev-parse --symbolic-abbrev-name
From: Arnaud Lacombe @ 2009-01-07 4:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karl Chen, Miklos Vajna, David Aguilar, Git mailing list
In-Reply-To: <7vocykkftg.fsf@gitster.siamese.dyndns.org>
On Tue, Jan 6, 2009 at 3:18 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> diff --git a/builtin-rev-parse.c b/builtin-rev-parse.c
>> index 81d5a6f..70f4a33 100644
>> --- a/builtin-rev-parse.c
>> +++ b/builtin-rev-parse.c
>> @@ -24,6 +24,7 @@ static int show_type = NORMAL;
>>
>> #define SHOW_SYMBOLIC_ASIS 1
>> #define SHOW_SYMBOLIC_FULL 2
>> +#define SHOW_SYMBOLIC_SHORT 3
>> static int symbolic;
>> static int abbrev;
>> static int output_sq;
>
> I think --symbolic-short makes the most sense.
>
ok, thanks.
>> @@ -125,13 +129,20 @@ static void show_rev(int type, const unsigned char *sha1, const char *name)
>> */
>> break;
>> case 1: /* happy */
>> + if (symbolic == SHOW_SYMBOLIC_SHORT) {
>> + char *p;
>> + p = strrchr(full, (int)'/');
>> + if (p != NULL)
>> + full = p + 1;
>> + }
>
> However, this is not a good way to do it, I suspect. This patch most
> likely will be queued to the al/symbolic-short topic branch, but you are
> losing information here. You'd probably want to try substrings from the
> tail of the full name (e.g. symbolic-short, al/symbolic-short,
> heads/al/symbolic-short, and finally refs/heads/al/symbolic-short) and
> feed them to dwim_ref() and pick the shortest one that yields the same ref
> unambiguously, or something like that.
>
ok, I see what you mean, I'll rework the patch to fix this. I was
about to do a proper patch submission when I saw your reply, so it will
be for next time!
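
The suffix search suggested above could be sketched roughly as follows. This is only an illustration in plain C, not the reworked patch: ref_exists(), count_matches() and shortest_unambiguous() are hypothetical names, the array of ref names stands in for the real ref database, and the prefix list is a simplified assumption about the lookup rules that dwim_ref() applies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified, assumed subset of the prefixes tried when resolving a short name. */
static const char *const prefixes[] = {
    "", "refs/", "refs/tags/", "refs/heads/", "refs/remotes/",
};
#define NUM_PREFIXES (sizeof(prefixes) / sizeof(prefixes[0]))

/* Hypothetical stand-in for the ref database: a linear search over known refs. */
static int ref_exists(const char *const *refs, size_t nr, const char *name)
{
    size_t i;

    for (i = 0; i < nr; i++)
        if (!strcmp(refs[i], name))
            return 1;
    return 0;
}

/*
 * Count how many prefixes expand `abbrev` to an existing ref, and record
 * in *hits_full whether one of those expansions is exactly `full`.
 */
static int count_matches(const char *const *refs, size_t nr,
                         const char *abbrev, const char *full, int *hits_full)
{
    char buf[512];
    int count = 0;
    size_t i;

    *hits_full = 0;
    for (i = 0; i < NUM_PREFIXES; i++) {
        int len = snprintf(buf, sizeof(buf), "%s%s", prefixes[i], abbrev);

        if (len < 0 || (size_t)len >= sizeof(buf))
            continue; /* does not fit, so it cannot name a known ref */
        if (!ref_exists(refs, nr, buf))
            continue;
        count++;
        if (!strcmp(buf, full))
            *hits_full = 1;
    }
    return count;
}

/*
 * Try the suffixes of `full` at '/' boundaries, shortest first, and return
 * the first one that resolves to `full` and to nothing else. Falls back to
 * `full` itself when every shorter form is ambiguous.
 */
const char *shortest_unambiguous(const char *const *refs, size_t nr,
                                 const char *full)
{
    const char *p;

    assert(full != NULL);
    p = full + strlen(full);
    while (p > full) {
        int hits_full;

        /* step back to the start of the previous path component */
        p--;
        while (p > full && p[-1] != '/')
            p--;
        if (count_matches(refs, nr, p, full, &hits_full) == 1 && hits_full)
            return p;
    }
    return full;
}
```

With both refs/heads/al/symbolic-short and refs/tags/symbolic-short present, this returns al/symbolic-short for the branch, whereas cutting at the last '/' would give symbolic-short, which resolves to the tag instead.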
> By the way, I do not see why you need to cast '/'.
>
overzealous type casting due to lack of caffeine in blood :-)
regards,
- Arnaud
^ permalink raw reply
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 4:54 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231292360.8870.61.camel@starfruit>
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
>
> I'll back the patch out and redeploy, it's worth mentioning that a
> coworker of mine just got the issue as well (on 1.6.1). He was able to
> `git pull` and the error went away, but I doubt that it "magically fixed
> itself"
Quite frankly, that behaviour sounds like a disk _cache_ corruption issue.
The fact that some corruption "comes and goes" and sometimes magically
heals itself sounds very much like some disk cache problem, and then that
particular part of the cache gets replaced and then when re-populated it
is magically correct.
We had that in one case with a Linux NFS client, where a rename across
directories caused problems.
This was a networked filesystem on OS X, right? File caching is much more
"interesting" in networked filesystems than it is in normal private
on-disk ones.
> I've tarred one of the repositories that had it in a reproducible state
> so I can create a build and extract the tar and run against that to
> verify any patches anybody might have, but unfortunately at 7GB of
> company code and assets, I can't exactly share ;)
The thing to do is
- untar it on some trusted machine with a local disk and a known-good
filesystem.
IOW, not that networked samba share.
- verify that it really does happen on that machine, with that untarred
image. Because maybe it doesn't.
The hope is that you caught the corruption in the cache, and it
actually got written out to the tar-file. But if it _is_ a disk cache
(well, network cache) issue, maybe the IO required to tar everything up
was enough to flush it, and the tar-file actually _works_ because it
got repopulated correctly.
So that's why you should double-check that it really ends up being
corrupt after being untarred again.
- go back and test the original git repo on the network share, preferably
on another client. See if the error has gone away.
- If so, try to compare that known-corrupt filesystem with the original
one: and preferably do this on another machine over the network mount.
See if they differ. They obviously should *not* differ, since it's a
tar/untar of the same files, but ...
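
For that last comparison step, a recursive "diff -r" or "cmp" across the two trees is normally all that is needed. Purely as an illustration of what such a comparison does, here is a minimal C sketch. compare_files() and compare_trees() are made-up names, a POSIX system is assumed, and only entries found under the first tree are checked, so files that exist only in the second tree go unnoticed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 0 if both files have identical contents, 1 if they differ,
 * and -1 if either file cannot be opened or read. */
int compare_files(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = 0;

    if (!fa || !fb) {
        result = -1;
    } else {
        char bufa[8192], bufb[8192];
        size_t na, nb;

        do {
            na = fread(bufa, 1, sizeof(bufa), fa);
            nb = fread(bufb, 1, sizeof(bufb), fb);
            if (na != nb || memcmp(bufa, bufb, na)) {
                result = 1; /* different length or different bytes */
                break;
            }
        } while (na > 0);
        if (!result && (ferror(fa) || ferror(fb)))
            result = -1;
    }
    if (fa)
        fclose(fa);
    if (fb)
        fclose(fb);
    return result;
}

/* Walk tree `a` and compare every regular file with the file of the same
 * relative name under `b`. Returns the number of files that differ, are
 * missing from `b`, or could not be read. */
int compare_trees(const char *a, const char *b)
{
    DIR *dir;
    struct dirent *de;
    int bad = 0;

    assert(a != NULL && b != NULL);
    dir = opendir(a);
    if (!dir)
        return 1;
    while ((de = readdir(dir)) != NULL) {
        char pa[4096], pb[4096];
        struct stat st;
        int la, lb;

        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        la = snprintf(pa, sizeof(pa), "%s/%s", a, de->d_name);
        lb = snprintf(pb, sizeof(pb), "%s/%s", b, de->d_name);
        if (la < 0 || lb < 0 || (size_t)la >= sizeof(pa) ||
            (size_t)lb >= sizeof(pb) || lstat(pa, &st) < 0) {
            bad++; /* path too long or entry vanished */
            continue;
        }
        if (S_ISDIR(st.st_mode)) {
            bad += compare_trees(pa, pb);
        } else if (S_ISREG(st.st_mode) && compare_files(pa, pb) != 0) {
            printf("mismatch: %s\n", pa);
            bad++;
        }
    }
    closedir(dir);
    return bad;
}
```

Running it on a second client over the network mount, as suggested above, reads the data through a different cache than the one that may have been corrupted.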
The fact that you seem to get a _lot_ of these errors really does make it
sound like something in your environment. It's actually really hard to get
git to corrupt anything. Especially objects that got packed. They've been
quiescent for a long time, they got repacked in a very simple way, they
are totally read-only.
But it is _not_ hard to corrupt network filesystems. It's downright
trivial with some of them, especially with some hardware (eg there's no
end-to-end checksumming except for the _extremely_ weak 16-bit IP csum,
and even that has been known to be disabled, or screwed up by ethernet
cards that do IP packet offloading and thus compute the csum not on the
data that the user actually wrote, but the data that the card received,
which is not necessarily at all the same thing).
And while ethernet uses a stronger CRC, that one is not end-to-end, so
corruption on the card or in a switch in between easily defeats that too.
Just google for something like
"OS X" SMB "file corruption"
and you'll find quite a few hits. Not all that unusual.
Linus
^ permalink raw reply
* [PATCH] gitweb: support the rel=vcs microformat
From: Joey Hess @ 2009-01-07 4:25 UTC (permalink / raw)
To: git
The rel=vcs microformat allows a web page to indicate the locations of
repositories related to it in a machine-parseable manner.
(See http://kitenet.net/~joey/rfc/rel-vcs/)
Make gitweb use the microformat in the header of pages it generates,
if it has been configured with project url information in any of the usual
ways.
Since getting the urls can require hitting disk, I avoided putting the
microformat on *every* page gitweb generates. Just put it on the project
summary page, the project list page, and the forks list page.
The first of these already looks up the urls, so adding the microformat was
free. There is a small overhead in including the microformat on the
latter two pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.
This changes git_get_project_url_list() to not check wantarray, and only
return in list context -- the only way it is used AFAICS.
Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
gitweb/gitweb.perl | 38 ++++++++++++++++++++++++++------------
1 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99f71b4..3f8a228 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
our @snapshot_fmts = gitweb_get_feature('snapshot');
@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
+# populated later with git urls for the project
+our @git_url_list;
+
# dispatch
if (!defined $action) {
if (defined $hash) {
@@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
}
sub git_get_project_url_list {
+ # use per project git URL list in $projectroot/$path/cloneurl
+ # or make project git URL from git base URL and project name
my $path = shift;
+ my @ret;
+
$git_dir = "$projectroot/$path";
- open my $fd, "$git_dir/cloneurl"
- or return wantarray ?
- @{ config_to_multi(git_get_project_config('url')) } :
- config_to_multi(git_get_project_config('url'));
- my @git_project_url_list = map { chomp; $_ } <$fd>;
- close $fd;
+ if (open my $fd, "$git_dir/cloneurl") {
+ @ret = map { chomp; $_ } <$fd>;
+ close $fd;
+ }
+ else {
+ @ret = @{ config_to_multi(git_get_project_config('url')) };
+ }
- return wantarray ? @git_project_url_list : \@git_project_url_list;
+ return @ret ? @ret : map { "$_/$path" } @git_base_url_list;
}
sub git_get_projects_list {
@@ -2953,6 +2961,10 @@ EOF
print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
}
+ foreach my $url (@git_url_list) {
+ print qq{<link rel="vcs" type="git" href="$url" />\n};
+ }
+
print "</head>\n" .
"<body>\n";
@@ -4380,6 +4392,8 @@ sub git_project_list {
die_error(404, "No projects found");
}
+ @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
+
git_header_html();
if (-f $home_text) {
print "<div class=\"index_include\">\n";
@@ -4400,6 +4414,8 @@ sub git_forks {
if (defined $order && $order !~ m/none|project|descr|owner|age/) {
die_error(400, "Unknown order parameter");
}
my @list = git_get_projects_list($project);
+ @git_url_list = map { git_get_project_url_list($_->{path}) } @list;
+
if (!@list) {
@@ -4457,6 +4473,8 @@ sub git_summary {
@forklist = git_get_projects_list($project);
}
+ @git_url_list = git_get_project_url_list($project);
+
git_header_html();
git_print_page_nav('summary','', $head);
@@ -4468,12 +4486,8 @@ sub git_summary {
print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
}
- # use per project git URL list in $projectroot/$project/cloneurl
- # or make project git URL from git base URL and project name
my $url_tag = "URL";
- my @url_list = git_get_project_url_list($project);
- @url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
- foreach my $git_url (@url_list) {
+ foreach my $git_url (@git_url_list) {
next unless $git_url;
print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
$url_tag = "";
--
1.5.6.5
^ permalink raw reply related
* [PATCH] tutorial.txt renamed
From: Joey Hess @ 2009-01-07 4:23 UTC (permalink / raw)
To: git
The tutorial.txt file was renamed to gittutorial.txt some time ago, update
README.
Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
README | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/README b/README
index 548142c..5fa41b7 100644
--- a/README
+++ b/README
@@ -24,7 +24,7 @@ It was originally written by Linus Torvalds with help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.
Please read the file INSTALL for installation instructions.
-See Documentation/tutorial.txt to get started, then see
+See Documentation/gittutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.
--
1.5.6.5
--
see shy jo
^ permalink raw reply related
* Re: [JGIT RFC] How read versions of a specific object
From: Shawn O. Pearce @ 2009-01-07 4:04 UTC (permalink / raw)
To: Imran M Yousuf; +Cc: Git Mailing List
In-Reply-To: <7bfdc29a0901061944x454a9t1d01e6744f08cf78@mail.gmail.com>
Imran M Yousuf <imyousuf@gmail.com> wrote:
> I am trying to read all or n-th version of an object. Currently to do
> this I am using the following piece of code, which has to walk to
> every commit that is present and from there prepare a set of its object ids,
> it is definitely expensive if the commit history is huge, is there a
> faster/better way to achieve it?
Not really. You can more efficiently use JGit and reduce some of
the overheads, but that's about it.
> for (int i = 0; i < App.OBJECT_COUNT;
> ++i) {
> ObjectWalk objectWalk = new ObjectWalk(repo);
Don't use ObjectWalk, use a RevWalk. You don't need it to keep
track of tree or blob identities. The ObjectWalk code has more
overhead to do that bookkeeping.
> Commit revision = repo.mapCommit(revObject.getId());
> Tree versionTree = repo.mapTree(revision.getTreeId());
> if (versionTree.existsBlob(isbn)) {
> revisions.add(versionTree.findBlobMember(isbn).getId());
Use a TreeWalk to do this. It's quicker because it doesn't
have to parse as much data to come up with the same result.
More specifically there's a static factory method that sets up for
a path limited walk and returns the TreeWalk pointing at that entry.
You can use the fact that RevWalk.next() returns a RevCommit to get
you the RevTree, which is the tree you need to give to the TreeWalk
constructor (it's the root level tree of the commit).
But if App.OBJECT_COUNT is quite large and covers most of your
objects, you are probably better off using a loop over the commits
and diff'ing against the ancestor:
    final HashMap<String, Set<ObjectId>> versions = ...;
    final RevWalk rw = new RevWalk(repo);
    final TreeWalk tw = new TreeWalk(repo);
    rw.markStart(rw.parseCommit(repo.parse(HEAD)));
    tw.setFilter(TreeFilter.ANY_DIFF);
    RevCommit c;
    while ((c = rw.next()) != null) {
        final ObjectId[] p = new ObjectId[c.getParentCount() + 1];
        for (int i = 0; i < c.getParentCount(); i++) {
            rw.parse(c.getParent(i));
            p[i] = c.getParent(i).getTree();
        }
        final int me = p.length - 1;
        p[me] = c.getTree();
        tw.reset(p);
        while (tw.next()) {
            if (tw.getFileMode(me).getObjectType() == Constants.OBJ_BLOB) {
                // This path was modified relative to the ancestor(s).
                //
                String s = tw.getPathString();
                Set<ObjectId> i = versions.get(s);
                if (i == null)
                    versions.put(s, i = new HashSet<ObjectId>());
                i.add(tw.getObjectId(me));
            }
            if (tw.isSubtree()) {
                // make sure we recurse into modified directories
                tw.enterSubtree();
            }
        }
    }
--
Shawn.
^ permalink raw reply