Git development

Git development
 help / color / mirror / Atom feed

* Re: git behaviour question regarding SHA-1 and commits
From: Jeff King @ 2011-11-14 11:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, vinassa vinassa, git
In-Reply-To: <7vwrb3l6v2.fsf@alter.siamese.dyndns.org>

On Sun, Nov 13, 2011 at 07:29:05PM -0800, Junio C Hamano wrote:

> If the collision is between commit objects, for example, we would write
> the (old) commit object name to the tip of the current branch. Most
> likely, the tree object recorded in the (old) commit would not match the
> tree object your "git commit" wanted to record (otherwise you have hit
> SHA-1 collision twice in a row ;-), which would mean "git status" would
> show that a whole bunch of paths have changed between the HEAD and the
> index. Also "git log" would show the history leading to the (old) commit
> that is likely to be very different from what you would expect immediately
> after committing the collided commit. Of course, you could recover from it
> with "git reset --soft" after finding out what the previous HEAD was from
> the reflog, but it won't be a pleasant experience.
> 
> There can be other kinds of collisions (e.g. your latest commit might have
> collided with an existing blob or tree, in which case it is likely that
> almost nothing would work after finding a blob or tree in HEAD).

You are more likely to just have blobs collide, since we generate many
more blobs than commits (each commit should have at least one changed
blob, but typically has more).

And in that case, I expect git would silently lose that state. We would
fail to write the new blob to the object db, but "git diff" would report
nothing, as it would see that the index entry's sha1 is the same as what
is in HEAD, and that the file is up to date with respect to the stat
information in the index. So if you were to "git checkout", your content
would be lost forever. However, if you instead modify the file further,
the new content will be kept (and you will get a very confusing diff).

-Peff

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Jeff King @ 2011-11-14 11:32 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: vinassa vinassa, git, Ævar Arnfjörð Bjarmason
In-Reply-To: <20111113182757.GA15194@elie.hsd1.il.comcast.net>

On Sun, Nov 13, 2011 at 12:27:57PM -0600, Jonathan Nieder wrote:

> Though I haven't tested.  It would be nice to have an md5git (or even
> truncated-sha1-git) program to test this kind of thing with.

Fortunately we have such a thing:

  http://article.gmane.org/gmane.comp.version-control.git/184243

That one actually has 40 bits of hash entropy, so you'd expect to
generate 2^20 (about a million) commits before accidentally colliding.
If you want an easier experiment, you could truncate it even further.

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Jeff King @ 2011-11-14 11:24 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git
In-Reply-To: <CACsJy8BnqoPVJiM6mbq7p3gKtLh-KGUuTshcukGokC3istTxMQ@mail.gmail.com>

On Mon, Nov 14, 2011 at 10:32:11AM +0700, Nguyen Thai Ngoc Duy wrote:

> 2011/11/14 Junio C Hamano <gitster@pobox.com>:
> >> diff --git a/builtin/branch.c b/builtin/branch.c
> >> index 0fe9c4d..5b6d839 100644
> >> --- a/builtin/branch.c
> >> +++ b/builtin/branch.c
> >> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
> >>                   branch->merge[0] &&
> >>                   branch->merge[0]->dst &&
> >>                   (reference_name =
> >> -                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
> >> +                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
> >> +                     reference_name = xstrdup(reference_name);
> >>                       reference_rev = lookup_commit_reference(sha1);
> >> +             }
> >>       }
> >>       if (!reference_rev)
> >>               reference_rev = head_rev;
> >> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
> >>                               "         '%s', even though it is merged to HEAD."),
> >>                               name, reference_name);
> >>       }
> >> +     free((char*)reference_name);
> >>       return merged;
> >>  }
> >
> > Now reference_name stores the result of xstrdup(), it does not have reason
> > to be of type "const char *". It is preferable to lose the cast here, I
> > think. The same comment applies to the remainder of the patch.
> 
> But resolve_ref() returns "const char *", we need to type cast at
> least once, either at resolve_ref() assignment or at free(), until we
> change resolve_ref(). Or should we change resolve_ref() to return
> "char *" now?

Your problem is that you are using the same variable for two different
things: storing the pointer to non-owned memory returned from
resolve_ref, and then storing the owned memory that comes from xstrdup.
Those two things have different types, since we use "const" on non-owned
memory. Thus you end up casting.

So your code isn't wrong, but I do think it would be more obviously
correct to a reader if it used two variables and dropped the cast.

-Peff

^ permalink raw reply

* Re: Git shouldn't allow to push a new branch called HEAD
From: Jeff King @ 2011-11-14 11:16 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Daniele Segato, Git Mailing List
In-Reply-To: <4EC0F15A.9010502@alum.mit.edu>

On Mon, Nov 14, 2011 at 11:45:46AM +0100, Michael Haggerty wrote:

> The whole time, victim's .git/HEAD contains "ref: refs/heads/master",
> .git/refs/remotes/origin/HEAD contains "ref:
> refs/remotes/origin/master", and its packed-refs file contains
> 
> # pack-refs with: peeled
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b refs/remotes/origin/master
> 
> In "remote.git", refs/heads/HEAD contains not a symbolic reference but
> the explicit SHA1 "4c9ebba...".  This is of course not affected by
> running "git fetch" in the "victim" tree.  Deleting this file makes the
> problem go away.
> 
> 
> Given that this problem seems to be in the remote protocol rather than
> in the refs API, I think I'll stop working on this.  I hope that my
> observations are helpful to somebody.

I didn't recreate the test situation and look closely, but my impression
is that this isn't a code bug at all, but rather a design problem in the
way we store remote namespaces. That is, we make "refs/remotes/foo/HEAD"
a symbolic ref with special meaning, but then fetch into it from the
remote's refs/heads namespace, writing remote's HEAD branch into
whatever our HEAD symref points to.

So one solution is to block fetching of remote branches called HEAD
(which I would be OK with). But another is to use a more sensible layout
for representing the remote refs, like:

   refs/remotes/origin/HEAD            (a symbolic ref)
   refs/remotes/origin/heads/master
   refs/remotes/origin/tags/v1.0

etc. Then the namespaces are properly separated, and the magic remote
"HEAD" symref is not in the way.

Obviously there's a lot more to it than just tweaking the default fetch
refspecs. The ref lookup rules need to be changed to take this into
account. There was some discussion about this over the summer (under the
subject of possible "1.8.0" changes), but I don't think any work has
been done.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Jeff King @ 2011-11-14 11:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmxbzl5ch.fsf@alter.siamese.dyndns.org>

On Sun, Nov 13, 2011 at 08:01:50PM -0800, Junio C Hamano wrote:

> * jc/lookup-object-hash (2011-08-11) 6 commits
>  - object hash: replace linear probing with 4-way cuckoo hashing
>  - object hash: we know the table size is a power of two
>  - object hash: next_size() helper for readability
>  - pack-objects --count-only
>  - object.c: remove duplicated code for object hashing
>  - object.c: code movement for readability
> 
> I do not think there is anything fundamentally wrong with this series, but
> the risk of breakage far outweighs observed performance gain in one
> particular workload.

FWIW, I finally got a chance to read through this series. It was fun, as
I had not looked at cuckoo hashing before. However, the performance
results were a bit underwhelming, and the code is more complex, which
left me a bit negative. I also took a quick try at quadratic probing,
which is only a few extra lines of code.  I wasn't able to show any real
performance improvement, though.

I suspect it is because our hash table is not all that big, and we keep
it pretty sparse, so linear probing does well. Googling around, it seems
that linear probing performs well up to about 70% load factor, but
there's surprisingly little theory behind it.

I notice that the decorate.c hash keeps us below 2/3 full, but the
object.c hash keeps us at 1/2. From my reading, that's just wasting
space. Pushing the boundary up to 2/3 and trying your "--count-objects"
on git.git, I don't see a big performance difference (with my change,
the best-of-5 was a little better, but well within the noise). It does
drop the maxresident by a few percent.

So I don't think it's a big deal either way, but the code change is
pretty trivial.

-Peff

^ permalink raw reply

* [PATCH] tag: implement --no-strip option
From: Kirill A. Shutemov @ 2011-11-14 11:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill@shutemov.name>

--no-strip turns off strip any comments or empty lines.

It's useful if you want to take a tag message as-is, without any
stripping.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 Documentation/git-tag.txt |    4 ++++
 builtin/tag.c             |   13 ++++++++-----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index c83cb13..947d4e5 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -99,6 +99,10 @@ OPTIONS
 	Implies `-a` if none of `-a`, `-s`, or `-u <key-id>`
 	is given.
 
+-S::
+--no-strip::
+	Take tag message as-is. Do not strip any comments or empty lines.
+
 <tagname>::
 	The name of the tag to create, delete, or describe.
 	The new tag name must pass all checks defined by
diff --git a/builtin/tag.c b/builtin/tag.c
index 9b6fd95..427d646 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -320,7 +320,7 @@ static int build_tag_object(struct strbuf *buf, int sign, unsigned char *result)
 }
 
 static void create_tag(const unsigned char *object, const char *tag,
-		       struct strbuf *buf, int message, int sign,
+		       struct strbuf *buf, int message, int sign, int nostrip,
 		       unsigned char *prev, unsigned char *result)
 {
 	enum object_type type;
@@ -356,7 +356,7 @@ static void create_tag(const unsigned char *object, const char *tag,
 
 		if (!is_null_sha1(prev))
 			write_tag_body(fd, prev);
-		else
+		else if (!nostrip)
 			write_or_die(fd, _(tag_template), strlen(_(tag_template)));
 		close(fd);
 
@@ -367,7 +367,8 @@ static void create_tag(const unsigned char *object, const char *tag,
 		}
 	}
 
-	stripspace(buf, 1);
+	if (!nostrip)
+		stripspace(buf, 1);
 
 	if (!message && !buf->len)
 		die(_("no tag message?"));
@@ -423,7 +424,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	const char *object_ref, *tag;
 	struct ref_lock *lock;
 
-	int annotate = 0, sign = 0, force = 0, lines = -1,
+	int annotate = 0, sign = 0, nostrip = 0, force = 0, lines = -1,
 		list = 0, delete = 0, verify = 0;
 	const char *msgfile = NULL, *keyid = NULL;
 	struct msg_arg msg = { 0, STRBUF_INIT };
@@ -443,6 +444,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 			     "tag message", parse_msg_arg),
 		OPT_FILENAME('F', "file", &msgfile, "read message from file"),
 		OPT_BOOLEAN('s', "sign", &sign, "annotated and GPG-signed tag"),
+		OPT_BOOLEAN('S', "no-strip", &nostrip,
+					"turn off tag message stripping"),
 		OPT_STRING('u', "local-user", &keyid, "key-id",
 					"use another key to sign the tag"),
 		OPT__FORCE(&force, "replace the tag if exists"),
@@ -525,7 +528,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 	if (annotate)
 		create_tag(object, tag, &buf, msg.given || msgfile,
-			   sign, prev, object);
+			   sign, nostrip, prev, object);
 
 	lock = lock_any_ref_for_update(ref.buf, prev, 0);
 	if (!lock)
-- 
1.7.7.2

^ permalink raw reply related

* Re: Git shouldn't allow to push a new branch called HEAD
From: Michael Haggerty @ 2011-11-14 10:45 UTC (permalink / raw)
  To: Daniele Segato; +Cc: Git Mailing List
In-Reply-To: <1321261662.2941.13.camel@mastroc3.mobc3.local>

On 11/14/2011 10:07 AM, Daniele Segato wrote:
> On Fri, 2011-10-14 at 13:35 +0200, Daniele Segato wrote:
>> On Fri, 2011-10-14 at 13:31 +0200, Daniele Segato wrote:
>>> following from a discussion in IRC freenode #git between me, sitaram an
>>> shruggar
>>>
>>>
>>> step to reproduce:
>>>
>>> $ mkdir /tmp/gitbug
>>> $ cd /tmp/gitbug/
>>>
>>> $ # create a fake remote repo
>>> $ git init --bare remote.git
>>>
>>> $ # clone it with the user that will generate the bug
>>> $ git clone remote.git buggenerator
>>> $ cd buggenerator/
>>> $ touch whatever
>>> $ git add .
>>> $ git commit -m "first commit"
>>> $ git push origin master 
>>>
>>> $ # now clone the same repo the other guy is the "victim" of this issue
>>> $ cd ..
>>> $ git clone remote.git victim
>>>
>>> $ # time to create the remote HEAD branch
>>> $ cd buggenerator/
>>> $ git push origin HEAD:HEAD
>>>
>>> $ # the remote refs has been created!
>>> $ git ls-remote
>>>
>>> $ # another commit
>>> $ echo 'any change' >> whatever 
>>> $ git commit -a -m "some change"
>>> $ git push origin master 
>>>
>>> $ # the refs/heads/HEAD is still where it was
>>> $ git ls-remote
>>>
>>> $ # now from the victim perspective
>>> $ cd ../victim/
>>>
>>> $ # every time executing a fetch he will get a force update
>>> $ # or maybe even an error, seen it my real repo, don't know how
>>> $ # to reproduce
>>> $ git fetch 
>>> $ git fetch 
>>> $ git ls-remote
>>> $ git fetch 
>>> $ git ls-remote
>>> $ git branch -a
>>
>> This should also help understanding what happen in the "victim" local
>> repo at every fetch:
>>
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
>> * master                11d0a12 [behind 1] first commit
>>   remotes/origin/HEAD   -> origin/master
>>   remotes/origin/master 77852ef some change
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git fetch 
>> From /tmp/gitbug/remote
>>  + 77852ef...11d0a12 HEAD       -> origin/HEAD  (forced update)
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
>> * master                11d0a12 first commit
>>   remotes/origin/HEAD   -> origin/master
>>   remotes/origin/master 11d0a12 first commit
> 
> I'm aware my request has been ignored for a good reason but I would
> appreciate someone stepping in and explaining to me why this is not a
> bug or why it has been ignored.

This is a nice little bug.

I'm sure that you noticed that running "git fetch" repeatedly from the
"victim" repository alternates between two behaviors (I'm using 1.7.7.2):

> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
>  + 6bf3df1...4c9ebba HEAD       -> origin/HEAD  (forced update)
> $ git for-each-ref
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit	refs/heads/master
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit	refs/remotes/origin/HEAD
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit	refs/remotes/origin/master
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
>    4c9ebba..6bf3df1  master     -> origin/master
> $ git for-each-ref
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit	refs/heads/master
> 6bf3df178cd92ca72625ae5bda9206c4333fd807 commit	refs/remotes/origin/HEAD
> 6bf3df178cd92ca72625ae5bda9206c4333fd807 commit	refs/remotes/origin/master
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
>  + 6bf3df1...4c9ebba HEAD       -> origin/HEAD  (forced update)
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
>    4c9ebba..6bf3df1  master     -> origin/master

The whole time, victim's .git/HEAD contains "ref: refs/heads/master",
.git/refs/remotes/origin/HEAD contains "ref:
refs/remotes/origin/master", and its packed-refs file contains

# pack-refs with: peeled
4c9ebba3c0618bd6238a810013da4a8cd4f2213b refs/remotes/origin/master

In "remote.git", refs/heads/HEAD contains not a symbolic reference but
the explicit SHA1 "4c9ebba...".  This is of course not affected by
running "git fetch" in the "victim" tree.  Deleting this file makes the
problem go away.


Given that this problem seems to be in the remote protocol rather than
in the refs API, I think I'll stop working on this.  I hope that my
observations are helpful to somebody.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Jeff King @ 2011-11-14 10:34 UTC (permalink / raw)
  To: Simon Brenner; +Cc: git
In-Reply-To: <CAD=rjTXgH+AivmK+zLurQVC+=p1UYqFy_p=wBF-1-TOQ=Cqjtw@mail.gmail.com>

On Mon, Nov 14, 2011 at 09:48:07AM +0100, Simon Brenner wrote:

> On Mon, Nov 14, 2011 at 7:06 AM, Miles Bader <miles@gnu.org> wrote:
> > It might be nice to have a mechanism where new objects would update
> > the _alternate_ rather than the object-store in the tree where the
> > command was run... then you could easily have a bunch of trees using a
> > central object store without needing to update the central store
> > occasionally by hand (and do gc in its "clients")...
> 
> This sounds like a nice way forward: replace/extend the current
> alternates system with support for a shared object store that is
> "intelligently" shared so that it can be gc:d based on all refs from
> all referring repositories. I imagine it would be something very much
> like a bare repository - except it wouldn't have any refs of its own,
> just a list of other repositories it should search for refs when
> GC:ing.

Yes, I think that is sensible. I'm not sure there is even any core git
code to be written. I think a wrapper that does the following would
probably work:

  1. Make new repo groups. E.g.:

       $ git share init foo

     which would be implemented something like:

       ROOT=$HOME/.git-share
       git init --bare $ROOT/$1

  2. Add a repo to a group.

       $ git share add foo

     implemented as:

       echo $ROOT/$1/objects >>.git/objects/info/alternates
       git --git-dir=$ROOT/$1 config --add share.child $PWD

  3. Compact a group.

       $ git share compact foo

     implemented as:

       # delete any existing refs
       git for-each-ref --format='%(refname)' | xargs git update-ref -d

       # now make new refs for each child
       n=1
       for dir in `git config --all share.child`; do
              if ! test -d $dir; then
                      echo >&2 "warning: $dir went away"
                      continue
              fi
              git fetch $dir refs/*:refs/$1/*
              n=$(($n + 1))
       done

       # and then repack/prune
       git repack -ad

       # and then gc each child, dropping anything in the share
       for dir in `git config --all share.child`; do
              git --git-dir=$dir gc
       done

I'm sure I'm missing a corner case or two, and of course there are
quoting issues and error handling missing. But the point is, I don't
think there's a real reason that the UI can't wrap the existing
mechanism, creating a momentary list of refs and pruning based on that.

One issue with this scheme (or most similar schemes) is that child repos
are uniquely identified by their directory name. In the absence of
alternates, it's perfectly reasonable to do:

  git init; hack hack hack; commit commit commit
  cd .. ; mv project new-project-name

but here it would break the shared repo's link to the child (which is
not just inconvenient, but dangerous, as we will not respect its refs
when pruning). Probably the "warning" above should actually error out
and force the user to say "yes, I deleted this child" or "no, I moved it
here".

You could try to be clever with assigning each child a UUID, but then
you have to resort to grepping the filesystem for the UUID to detect a
move. Which is complex and still not foolproof (i.e., if you don't find
it, is it because the repo was deleted, or because it got moved
somewhere that we didn't look?).

-Peff

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14 10:24 UTC (permalink / raw)
  To: Miles Bader; +Cc: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> It might be nice to have a mechanism where new objects would update
> the _alternate_ rather than the object-store in the tree where the
> command was run.

With the alternate mechanism, your borrowing is read-only and that is
exactly why you can borrow from other peoples' repositories to which you
have no write permission to.

What you are suggesting is fundamentally different from the alternates
mechanism. I am not saying it is better or worse, though. Not yet at this
point in this message.

> .. then you could easily have a bunch of trees using a
> central object store without needing to update the central store
> occasionally by hand (and do gc in its "clients")...

If you write objects to the central store, "gc" in the "clients" will be a
no-op because they do not have their own objects. But instead, crufts your
"clients" accumulate will be in the central store. There is still need for
"gc" at the central store to remove things that are no longer used by any
client, isn't it? Unless you declare that you do not care because perhaps
the central store is large enough, that is.

At least with the alternates, running "gc" in the "clients" is a safe
operation and the only change necessary is to make fsck/repack aware of
the repositories that borrow from the repository these commands are run,
and the logic to do so is exactly the same as the case to run "gc" in your
central store, I would think.

^ permalink raw reply

* Re: Git shouldn't allow to push a new branch called HEAD
From: Daniele Segato @ 2011-11-14  9:07 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <1318592153.2938.21.camel@mastroc3.mobc3.local>

On Fri, 2011-10-14 at 13:35 +0200, Daniele Segato wrote:
> On Fri, 2011-10-14 at 13:31 +0200, Daniele Segato wrote:
> > Hi all,
> > 
> > 
> > following from a discussion in IRC freenode #git between me, sitaram an
> > shruggar
> > 
> > 
> > step to reproduce:
> > 
> > $ mkdir /tmp/gitbug
> > $ cd /tmp/gitbug/
> > 
> > $ # create a fake remote repo
> > $ git init --bare remote.git
> > 
> > $ # clone it with the user that will generate the bug
> > $ git clone remote.git buggenerator
> > $ cd buggenerator/
> > $ touch whatever
> > $ git add .
> > $ git commit -m "first commit"
> > $ git push origin master 
> > 
> > $ # now clone the same repo the other guy is the "victim" of this issue
> > $ cd ..
> > $ git clone remote.git victim
> > 
> > $ # time to create the remote HEAD branch
> > $ cd buggenerator/
> > $ git push origin HEAD:HEAD
> > 
> > $ # the remote refs has been created!
> > $ git ls-remote
> > 
> > $ # another commit
> > $ echo 'any change' >> whatever 
> > $ git commit -a -m "some change"
> > $ git push origin master 
> > 
> > $ # the refs/heads/HEAD is still where it was
> > $ git ls-remote
> > 
> > $ # now from the victim perspective
> > $ cd ../victim/
> > 
> > $ # every time executing a fetch he will get a force update
> > $ # or maybe even an error, seen it my real repo, don't know how
> > $ # to reproduce
> > $ git fetch 
> > $ git fetch 
> > $ git ls-remote
> > $ git fetch 
> > $ git ls-remote
> > $ git branch -a
> 
> This should also help understanding what happen in the "victim" local
> repo at every fetch:
> 
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
> * master                11d0a12 [behind 1] first commit
>   remotes/origin/HEAD   -> origin/master
>   remotes/origin/master 77852ef some change
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git fetch 
> From /tmp/gitbug/remote
>  + 77852ef...11d0a12 HEAD       -> origin/HEAD  (forced update)
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
> * master                11d0a12 first commit
>   remotes/origin/HEAD   -> origin/master
>   remotes/origin/master 11d0a12 first commit


Hi again,

I'm aware my request has been ignored for a good reason but I would
appreciate someone stepping in and explaining to me why this is not a
bug or why it has been ignored.

Thanks.

Regards,
Daniele Segato

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Chris Packham @ 2011-11-14  9:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, git
In-Reply-To: <7v62inkymg.fsf@alter.siamese.dyndns.org>

On 14/11/11 19:27, Junio C Hamano wrote:
> Miles Bader <miles@gnu.org> writes:
> 
>> Do you mean a more elaborate UI that does this nicely...?
> 
> Yes, that is what I meant. I also have a feeling that people would prefer
> to have an option that treats these two repositories equally; your
> illustration makes one a subordinate to the other.

Not sure if it's what you're after but there was this patch [1] that I
was kicking around a while back. I've still got the code in an old
branch if there is interest in resurrecting it. It looks like I started
addressing Junio's comments and never posted v3.

[1] http://article.gmane.org/gmane.comp.version-control.git/143164

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Simon Brenner @ 2011-11-14  8:48 UTC (permalink / raw)
  To: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>

I think one of the most annoying aspects of alternates (beyond the
hassle of adding/removing them except using clone --reference) is the
danger of losing data if you aren't absolutely sure that your
alternate is stable and won't ever lose references to objects.

If the alternate just had links to the referring repositories, I think
this hole could be neatly closed.

On Mon, Nov 14, 2011 at 7:06 AM, Miles Bader <miles@gnu.org> wrote:
> It might be nice to have a mechanism where new objects would update
> the _alternate_ rather than the object-store in the tree where the
> command was run... then you could easily have a bunch of trees using a
> central object store without needing to update the central store
> occasionally by hand (and do gc in its "clients")...

This sounds like a nice way forward: replace/extend the current
alternates system with support for a shared object store that is
"intelligently" shared so that it can be gc:d based on all refs from
all referring repositories. I imagine it would be something very much
like a bare repository - except it wouldn't have any refs of its own,
just a list of other repositories it should search for refs when
GC:ing.

The object store currently built into each git repository could even
become a special case of that: a shared object store (that happens to
reside under .git) with a single referring repository (the parent .git
dir). If the location of the object store is configurable, clone
--reference could simply point the new repository directly to the
shared store instead of ever setting up a local object store.

// Simon

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Johannes Sixt @ 2011-11-14  7:39 UTC (permalink / raw)
  To: vinassa vinassa; +Cc: git
In-Reply-To: <CAJuRt+r9BjYcead6hgzdUT0Bisz1D48cegqkoJ0S537VMYBy_g@mail.gmail.com>

Am 11/13/2011 18:04, schrieb vinassa vinassa:
> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.
> 
> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?

I *think* the following would happen:

1. Git detects that the (commit) object that it is about to generate
already exists, and does not write a new one.

2. Then the branch's ref is updated to the SHA-1. Since the original
commit is somewhere back in history, this is effectively like 'git reset
--soft that-commit'.

3. At your next 'git diff --cached', you notice unexpected differences
between the index and the branch head. You will wonder what happened.
("Who typed 'git reset --soft that-commit' while I was looking the other
way??")

4. To recover, you just 'git reset --soft @{1}' to revert to the state
before the commit attempt, and commit again. Your commit message from the
first attempt will be lost unless you have used -C or -F for your commit.
At any rate, you can reuse the exact same commit message for this second
commit attempt, because by now time will have advanced by at least one
second, which gives you a different commit timestamp and, hence, a
different commit object.

-- Hannes

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Johannes Sixt @ 2011-11-14  7:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Vincent van Ravesteijn, Ramsay Jones, msysGit
In-Reply-To: <7vmxbzl5ch.fsf@alter.siamese.dyndns.org>

IMO, these two topics can move forward:

> * vr/msvc (2011-10-31) 3 commits
>  - MSVC: Remove unneeded header stubs
>  - Compile fix for MSVC: Include <io.h>
>  - Compile fix for MSVC: Do not include sys/resources.h
> 
> It seems this needs to be rehashed with msysgit folks.

With these patches, git can be built with MSVC. The result is usable,
although a few tests still fail.

> * na/strtoimax (2011-11-05) 3 commits
>  - Support sizes >=2G in various config options accepting 'g' sizes.
>  - Compatibility: declare strtoimax() under NO_STRTOUMAX
>  - Add strtoimax() compatibility function.
> 
> It seems this needs to be rehashed with msysgit folks.

There were a few curiosities around strtoimax being present in MinGW or
not, but these have been resolved. Also, whether or not we should define
NO_STRTOUMAX for the MinGW build is an independent matter.

-- Hannes

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14  6:27 UTC (permalink / raw)
  To: Miles Bader; +Cc: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Do you mean a more elaborate UI that does this nicely...?

Yes, that is what I meant. I also have a feeling that people would prefer
to have an option that treats these two repositories equally; your
illustration makes one a subordinate to the other.

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Miles Bader @ 2011-11-14  6:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4ny7mtbx.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
>  (2) allowing two repositories that started independently to share objects
>      using the alternates mechanism after the fact.

Can they not already?

I mean, it works great right now to do:

  cd $REP2
  echo $REP1/.git/objects > .git/objects/info/alternates
  git gc

Do you mean a more elaborate UI that does this nicely...? or something
else?

It might be nice to have a mechanism where new objects would update
the _alternate_ rather than the object-store in the tree where the
command was run... then you could easily have a bunch of trees using a
central object store without needing to update the central store
occasionally by hand (and do gc in its "clients")...

-Miles

-- 
"Most attacks seem to take place at night, during a rainstorm, uphill,
 where four map sheets join."   -- Anon. British Officer in WW I

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-11-14  5:40 UTC (permalink / raw)
  To: Dan McGee; +Cc: GIT Mailing-list
In-Reply-To: <CAEik5nPJ3r6gp9Lttzh5aQmiPRFxpZvhTBXZoreY98QV6Cocdg@mail.gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

>>> unable to figure out how you generated those numbers so I wasn't able
>>> to do so (and had planned to get back to you to find out how you made
>>> those tables). Were you able to verify the ordering did not regress?
>>
>> No; I was hoping you would redo the benchmark using 5f44324 (core: log
>> offset pack data accesses happened, 2011-07-06).
>
> I'm still not sure what you used to parse these results,...

Ah, in the kernel repository, after running "repack -a -d -f" with
versions of git and copying the resulting packfiles in PACK-OLD/ and
PACK-NEW/, I used these scripts to examine the access pattern.

-- >8 -- DOIT.sh -- >8 --
#!/bin/sh

tmp=/var/tmp/ll$
trap 'rm -f "$tmp.*"' 0

ln -f PACK-OLD/* .git/objects/pack/. || exit
log="$tmp.old"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"

ln -f PACK-NEW/* .git/objects/pack/. || exit
log="$tmp.new"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"

perl OFS.perl "$tmp.old" "$tmp.new"
-- 8< -- DOIT.sh -- 8< --

-- >8 -- OFS.perl -- >8 --
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my $verbose;

exit(1) if (!GetOptions("verbose" => \$verbose));

sub take_one {
	my ($filename) = @_;
	my (%lofs, $num);
	my @diff;
	open my $in, '<', $filename;

	$num = 0;
	while (<$in>) {
		my ($file, $ofs) = split(' ');
		if (!exists $lofs{$file}) {
			$lofs{$file} = [$num++, 0];
		}
		my $diff = $ofs - $lofs{$file}[1];
		$lofs{$file}[1] = $ofs;
		push @diff, abs($diff);
		print "$lofs{$file}[0] $diff $ofs\n" if $verbose;
	}
	return \@diff;
}

sub bsearch {
	my ($list, $target) = @_;
	my ($hi, $lo) = ((scalar @$list), 0);

	while ($lo < $hi) {
		my $mi = int(($lo + $hi) / 2);
		if ($list->[$mi] == $target) {
			return $mi;
		} elsif ($list->[$mi] < $target) {
			$lo = $mi + 1;
		} else {
			$hi = $mi;
		}
	}
	return $hi;
}

my @percentile = ();
for (my $i = 0; $i < 100; $i += 10) {
	push @percentile, $i;
}
push @percentile, 95, 99, 99.9, 99.99;

sub thcomma {
	my ($intval) = @_;
	my $result = "";
	while ($intval > 1000) {
		my $rem = $intval % 1000;
		if ($result ne "") {
			$result = sprintf "%03d,%s", $rem, $result;
		} else {
			$result = sprintf "%03d", $rem;
		}
		$intval -= $rem;
		$intval /= 1000;
	}
	if ($intval) {
		if ($result ne "") {
			$result = sprintf "%d,%s", $intval, $result;
		} else {
			$result = sprintf "%d", $intval;
		}
	}
	$result =~ s/^[0,]*//;
	$result = "0" if ($result eq "");
	return $result;
}

sub show_stat {
	my ($diff1, $diff2) = @_;
	my ($i, $ix);
	if ($diff2) {
		@$diff2 = sort { $a <=> $b } @$diff2;
	}
	@$diff1 = sort { $a <=> $b } @$diff1;
	printf "\nTotal number of access : %12s", thcomma(scalar(@$diff1));
	printf "%12s", thcomma(scalar(@$diff2)) if ($diff2);
	for $i (@percentile) {
		$ix = scalar(@$diff1) * $i / 100;
		printf "\n     %5.2f%% percentile : %12s", $i, thcomma($diff1->[$ix]);
		if ($diff2) {
			$ix = scalar(@$diff2) * $i / 100;
			printf "%12s", thcomma($diff2->[$ix]);
		}
	}

	$ix = bsearch($diff1, 2 * 1024 * 1024);
	printf "\n   Less than 2MiB seek :       %5.2f%%", ($ix * 100.0 / @$diff1);
	if ($diff2) {
		$ix = bsearch($diff2, 2 * 1024 * 1024);
		printf "      %5.2f%%", ($ix * 100.0 / @$diff2);
	}

	print "\n";
}

my ($diff1, $diff2);
$diff1 = take_one($ARGV[0]);
$diff2 = take_one($ARGV[1]) if ($ARGV[1]);

show_stat($diff1, $diff2);
-- 8< -- OFS.perl -- 8< --


	

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Junio C Hamano @ 2011-11-14  4:03 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <CACsJy8BnqoPVJiM6mbq7p3gKtLh-KGUuTshcukGokC3istTxMQ@mail.gmail.com>

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>> Now reference_name stores the result of xstrdup(), it does not have reason
>> to be of type "const char *". It is preferable to lose the cast here, I
>> think. The same comment applies to the remainder of the patch.
>
> But resolve_ref() returns "const char *", we need to type cast at
> least once, either at resolve_ref() assignment or at free(), until we
> change resolve_ref().

In any case, I do not think it matters either way, so I queued this patch
to 'pu' unmodified.

This patch uses xstrdup() on return value of resolve_ref() only at
hand-picked places; while the choice of the places the patch decided not
to call free() looked reasonable from a quick review, both of us may be
blind and it may introduce huge leaks in a repeatedly called function.

A more extensive patch that would turn resolve_ref() to return an
allocated piece of memory share the same risk of adding new leaks at the
callsites, and this late in the cycle, neither will be in 1.7.8 anyway.

Given that if we build the more extensive patch on top of this one after
1.7.8, it will need to undo xstrdup() in this patch, add free()s that
become necessary, in addition to having to add free()s that this patch
might have potentially forgot, I have a feeling that we should just drop
this [2/2] and do a more thorough fix after 1.7.8 release is done
immediately on top of [1/2].

Thanks.

^ permalink raw reply

* What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Junio C Hamano @ 2011-11-14  4:01 UTC (permalink / raw)
  To: git

What's cooking in git.git (Nov 2011, #03; Sun, 13)
--------------------------------------------------

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.

Here are the repositories that have my integration branches:

With maint, master, next, pu, todo:

        git://git.kernel.org/pub/scm/git/git.git
        git://repo.or.cz/alt-git.git
        https://code.google.com/p/git-core/
        https://github.com/git/git

With only maint and master:

        git://git.sourceforge.jp/gitroot/git-core/git.git
        git://git-core.git.sourceforge.net/gitroot/git-core/git-core

With all the topics and integration branches:

        https://github.com/gitster/git

The preformatted documentation in HTML and man format are found in:

        git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
        git://repo.or.cz/git-{htmldocs,manpages}.git/
        https://code.google.com/p/git-{htmldocs,manpages}.git/
        https://github.com/gitster/git-{htmldocs,manpages}.git/

--------------------------------------------------
[New Topics]

* jc/commit-tree-extra (2011-11-12) 2 commits
 - commit-tree: teach -C <extra-commit>
 - commit-tree: teach -x <extra>
 (this branch uses jc/pull-signed-tag; is tangled with jc/signed-commit.)

* nd/resolve-ref (2011-11-13) 2 commits
 - Copy resolve_ref() return value for longer use
 - Convert many resolve_ref() calls to read_ref*() and ref_exists()

--------------------------------------------------
[Graduated to "master"]

* ab/i18n-test-fix (2011-11-05) 2 commits
  (merged to 'next' on 2011-11-06 at f1de9a6)
 + t/t7508-status.sh: use test_i18ncmp
 + t/t6030-bisect-porcelain.sh: use test_i18ngrep

* fc/remote-seturl-usage-fix (2011-11-06) 1 commit
  (merged to 'next' on 2011-11-06 at 6c8328c)
 + remote: fix remote set-url usage

* jc/remote-setbranches-usage-fix (2011-11-06) 1 commit
  (merged to 'next' on 2011-11-06 at 017606d)
 + remote: fix set-branches usage

* pw/p4-appledouble-fix (2011-11-05) 1 commit
  (merged to 'next' on 2011-11-06 at 2ec0af3)
 + git-p4: ignore apple filetype

Regression fix for the upcoming release.

* sn/http-auth-with-netrc-fix (2011-11-04) 1 commit
  (merged to 'next' on 2011-11-06 at 60b7f96)
 + http: don't always prompt for password

Regression fix for the upcoming release.

--------------------------------------------------
[Stalled]

* hv/submodule-merge-search (2011-10-13) 4 commits
 - submodule.c: make two functions static
 - allow multiple calls to submodule merge search for the same path
 - push: Don't push a repository with unpushed submodules
 - push: teach --recurse-submodules the on-demand option

What the topic aims to achieve may make sense, but the implementation
looked somewhat suboptimal.

* sr/transport-helper-fix-rfc (2011-07-19) 2 commits
 - t5800: point out that deleting branches does not work
 - t5800: document inability to push new branch with old content

See comments on sr/fix-fast-export-tips topic.

* sr/fix-fast-export-tips (2011-11-05) 3 commits
 - fast-export: output reset command for commandline revs
 - fast-export: do not refer to non-existing marks
 - t9350: point out that refs are not updated correctly

The bottom commit from the stalled sr/transport-helper-fix-rfc topic is
fixed with this. It may make sense to drop the other topic and include
that commit in this series.

The command line parser is still too lax and accepts malformed input, but
this is a step in the right direction and tightening the command line now
should be doable without a low level surgery that touches codepaths that
are unrelated to the command line processing like the previous attempt
used to do.

* jc/lookup-object-hash (2011-08-11) 6 commits
 - object hash: replace linear probing with 4-way cuckoo hashing
 - object hash: we know the table size is a power of two
 - object hash: next_size() helper for readability
 - pack-objects --count-only
 - object.c: remove duplicated code for object hashing
 - object.c: code movement for readability

I do not think there is anything fundamentally wrong with this series, but
the risk of breakage far outweighs observed performance gain in one
particular workload.

* jc/verbose-checkout (2011-10-16) 2 commits
 - checkout -v: give full status output after switching branches
 - checkout: move the local changes report to the end

This is just to leave a record that the reason why we do not do this not
because we are incapable of coding this, but because it is not a good idea
to do this. I suspect people who are new to git that might think they need
it would soon realize the don't.

Will keep in 'pu' as a showcase for a while and then will drop.

* eh/grep-scale-to-cpunum (2011-11-05) 1 commit
 - grep: detect number of CPUs for thread spawning

Kills I/O parallelism and needs to be improved or discarded.

* vr/msvc (2011-10-31) 3 commits
 - MSVC: Remove unneeded header stubs
 - Compile fix for MSVC: Include <io.h>
 - Compile fix for MSVC: Do not include sys/resources.h

It seems this needs to be rehashed with msysgit folks.

* na/strtoimax (2011-11-05) 3 commits
 - Support sizes >=2G in various config options accepting 'g' sizes.
 - Compatibility: declare strtoimax() under NO_STRTOUMAX
 - Add strtoimax() compatibility function.

It seems this needs to be rehashed with msysgit folks.

--------------------------------------------------
[Cooking]

* jc/signed-commit (2011-11-12) 4 commits
 - pretty: %G[?GS] placeholders
 - test "commit -S" and "log --show-signature"
 - log: --show-signature
 - commit: teach --gpg-sign option
 (this branch uses jc/pull-signed-tag; is tangled with jc/commit-tree-extra.)

Rebased on top of jc/pull-signed-tag topic, after reverting the old one
out of 'next'.

* jc/pull-signed-tag (2011-11-12) 15 commits
 - commit-tree: teach -m/-F options to read logs from elsewhere
 - commit-tree: update the command line parsing
 - commit: teach --amend to carry forward extra headers
 - merge: force edit and no-ff mode when merging a tag object
 - commit: copy merged signed tags to headers of merge commit
 - merge: record tag objects without peeling in MERGE_HEAD
 - merge: make usage of commit->util more extensible
 - fmt-merge-msg: Add contents of merged tag in the merge message
 - fmt-merge-msg: package options into a structure
 - fmt-merge-msg: avoid early returns
 - refs DWIMmery: use the same rule for both "git fetch" and others
 - fetch: allow "git fetch $there v1.0" to fetch a tag
 - merge: notice local merging of tags and keep it unwrapped
 - fetch: do not store peeled tag object names in FETCH_HEAD
 - Split GPG interface into its own helper library
 (this branch is used by jc/commit-tree-extra and jc/signed-commit.)

Further updated to allow "commit --amend" to retain the mergetag
headers. I think this is ready for the cycle after upcoming 1.7.8.

* ab/clang-lints (2011-11-06) 2 commits
  (merged to 'next' on 2011-11-13 at a573aec)
 + cast variable in call to free() in builtin/diff.c and submodule.c
 + apply: get rid of useless x < 0 comparison on a size_t type

Will keep in 'next' during this cycle.

* ab/pull-rebase-config (2011-11-07) 1 commit
  (merged to 'next' on 2011-11-13 at 72bb2d5)
 + pull: introduce a pull.rebase option to enable --rebase

Will keep in 'next' during this cycle.

* nd/fsck-progress (2011-11-06) 4 commits
  (merged to 'next' on 2011-11-13 at 8831811)
 + fsck: print progress
 + fsck: avoid reading every object twice
 + verify_packfile(): check as many object as possible in a pack
 + fsck: return error code when verify_pack() goes wrong

Will keep in 'next' during this cycle.

* nd/prune-progress (2011-11-07) 3 commits
  (merged to 'next' on 2011-11-13 at c5722ac)
 + reachable: per-object progress
 + prune: handle --progress/no-progress
 + prune: show progress while marking reachable objects

Will keep in 'next' during this cycle.

* jc/stream-to-pack (2011-11-03) 4 commits
 - Bulk check-in
 - finish_tmp_packfile(): a helper function
 - create_tmp_packfile(): a helper function
 - write_pack_header(): a helper function

Teaches "git add" to send large-ish blob data straight to a packfile.
This is a continuation to the "large file support" topic. I think this
codepath to move data from worktree to repository needs to become aware of
streaming, just like the checkout codepath that goes the other way, which
was done in the previous "large file support" topic in the 1.7.7 cycle.

* jn/gitweb-side-by-side-diff (2011-10-31) 8 commits
 - gitweb: Add navigation to select side-by-side diff
 - gitweb: Use href(-replay=>1,...) for formats links in "commitdiff"
 - t9500: Add basic sanity tests for side-by-side diff in gitweb
 - t9500: Add test for handling incomplete lines in diff by gitweb
 - gitweb: Give side-by-side diff extra CSS styling
 - gitweb: Add a feature to show side-by-side diff
 - gitweb: Extract formatting of diff chunk header
 - gitweb: Refactor diff body line classification

Replaces a series from Kato Kazuyoshi on the same topic.

* mf/curl-select-fdset (2011-11-04) 4 commits
  (merged to 'next' on 2011-11-06 at a49516f)
 + http: drop "local" member from request struct
 + http.c: Rely on select instead of tracking whether data was received
 + http.c: Use timeout suggested by curl instead of fixed 50ms timeout
 + http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping

Reduces unnecessary waits.

* nd/misc-cleanups (2011-10-27) 6 commits
  (merged to 'next' on 2011-10-28 at 2527a49)
 + unpack_object_header_buffer(): clear the size field upon error
 + tree_entry_interesting: make use of local pointer "item"
 + tree_entry_interesting(): give meaningful names to return values
 + read_directory_recursive: reduce one indentation level
 + get_tree_entry(): do not call find_tree_entry() on an empty tree
 + tree-walk.c: do not leak internal structure in tree_entry_len()

These are unquestionably good parts taken out of a larger series, so that
we can focus more on the other changes in later rounds of review.

Will keep in 'next' during this cycle.

* rs/allocate-cache-entry-individually (2011-10-26) 2 commits
  (merged to 'next' on 2011-10-27 at 2e4acd6)
 + cache.h: put single NUL at end of struct cache_entry
 + read-cache.c: allocate index entries individually

Will keep in 'next' during this cycle.

* mh/ref-api-3 (2011-10-19) 11 commits
  (merged to 'next' on 2011-10-23 at 92e2d35)
 + is_refname_available(): reimplement using do_for_each_ref_in_array()
 + names_conflict(): simplify implementation
 + names_conflict(): new function, extracted from is_refname_available()
 + repack_without_ref(): reimplement using do_for_each_ref_in_array()
 + do_for_each_ref_in_array(): new function
 + do_for_each_ref(): correctly terminate while processesing extra_refs
 + add_ref(): take a (struct ref_entry *) parameter
 + create_ref_entry(): extract function from add_ref()
 + parse_ref_line(): add a check that the refname is properly formatted
 + repack_without_ref(): remove temporary
 + Rename another local variable name -> refname
 (this branch uses mh/ref-api-2.)

Will keep in 'next' during this cycle.

* rr/revert-cherry-pick (2011-10-23) 5 commits
  (merged to 'next' on 2011-10-26 at 27b7496)
 + revert: simplify communicating command-line arguments
 + revert: allow mixed pick and revert instructions
 + revert: make commit subjects in insn sheet optional
 + revert: simplify getting commit subject in format_todo()
 + revert: free msg in format_todo()

The internals of "git revert/cherry-pick" has been further refactored to
serve as the basis for the sequencer.

Will keep in 'next' during this cycle.

* cb/daemon-permission-errors (2011-10-17) 2 commits
 - daemon: report permission denied error to clients
 - daemon: add tests

The tip commit might be loosening things a bit too much.
Will keep in 'pu' until hearing a convincing argument for the patch.

* mh/ref-api-2 (2011-10-17) 14 commits
  (merged to 'next' on 2011-10-19 at cc89f0e)
 + resolve_gitlink_ref_recursive(): change to work with struct ref_cache
 + Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
 + resolve_gitlink_ref(): improve docstring
 + get_ref_dir(): change signature
 + refs: change signatures of get_packed_refs() and get_loose_refs()
 + is_dup_ref(): extract function from sort_ref_array()
 + add_ref(): add docstring
 + parse_ref_line(): add docstring
 + is_refname_available(): remove the "quiet" argument
 + clear_ref_array(): rename from free_ref_array()
 + refs: rename parameters result -> sha1
 + refs: rename "refname" variables
 + struct ref_entry: document name member
 + cache.h: add comments for git_path() and git_path_submodule()
 (this branch is used by mh/ref-api-3.)

Will keep in 'next' during this cycle.

* sg/complete-refs (2011-10-21) 9 commits
  (merged to 'next' on 2011-10-26 at d65e2b4)
 + completion: remove broken dead code from __git_heads() and __git_tags()
 + completion: fast initial completion for config 'remote.*.fetch' value
 + completion: improve ls-remote output filtering in __git_refs_remotes()
 + completion: query only refs/heads/ in __git_refs_remotes()
 + completion: support full refs from remote repositories
 + completion: improve ls-remote output filtering in __git_refs()
 + completion: make refs completion consistent for local and remote repos
 + completion: optimize refs completion
 + completion: document __gitcomp()

Will keep in 'next' until an Ack or two from completion folks.

* jc/request-pull-show-head-4 (2011-11-09) 12 commits
  (merged to 'next' on 2011-11-13 at e473fd2)
 + request-pull: use the annotated tag contents
  (merged to 'next' on 2011-10-15 at 7e340ff)
 + fmt-merge-msg.c: Fix an "dubious one-bit signed bitfield" sparse error
  (merged to 'next' on 2011-10-10 at 092175e)
 + environment.c: Fix an sparse "symbol not declared" warning
 + builtin/log.c: Fix an "Using plain integer as NULL pointer" warning
  (merged to 'next' on 2011-10-07 at fcaeca0)
 + fmt-merge-msg: use branch.$name.description
  (merged to 'next' on 2011-10-06 at fa5e0fe)
 + request-pull: use the branch description
 + request-pull: state what commit to expect
 + request-pull: modernize style
 + branch: teach --edit-description option
 + format-patch: use branch description in cover letter
 + branch: add read_branch_desc() helper function
 + Merge branch 'bk/ancestry-path' into jc/branch-desc

Allow setting "description" for branches and use it to help communications
between humans in various workflow elements. It also allows requesting for
a signed tag to be pulled and shows the tag message in the generated message.

Will keep in 'next' during this cycle.

--------------------------------------------------
[Discarded]

* kk/gitweb-side-by-side-diff (2011-10-17) 2 commits
 . gitweb: add a feature to show side-by-side diff
 . gitweb: change format_diff_line() to remove leading SP from $diff_class

* jc/check-ref-format-fixup (2011-10-19) 2 commits
  (merged to 'next' on 2011-10-19 at 98981be)
 + Revert "Restrict ref-like names immediately below $GIT_DIR"
  (merged to 'next' on 2011-10-15 at 8e89bc5)
 + Restrict ref-like names immediately below $GIT_DIR

This became a no-op except for the bottom one which is part of the other
topic now.

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Nguyen Thai Ngoc Duy @ 2011-11-14  3:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbosfoiuy.fsf@alter.siamese.dyndns.org>

2011/11/14 Junio C Hamano <gitster@pobox.com>:
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 0fe9c4d..5b6d839 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
>>                   branch->merge[0] &&
>>                   branch->merge[0]->dst &&
>>                   (reference_name =
>> -                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
>> +                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
>> +                     reference_name = xstrdup(reference_name);
>>                       reference_rev = lookup_commit_reference(sha1);
>> +             }
>>       }
>>       if (!reference_rev)
>>               reference_rev = head_rev;
>> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
>>                               "         '%s', even though it is merged to HEAD."),
>>                               name, reference_name);
>>       }
>> +     free((char*)reference_name);
>>       return merged;
>>  }
>
> Now reference_name stores the result of xstrdup(), it does not have reason
> to be of type "const char *". It is preferable to lose the cast here, I
> think. The same comment applies to the remainder of the patch.

But resolve_ref() returns "const char *", we need to type cast at
least once, either at resolve_ref() assignment or at free(), until we
change resolve_ref(). Or should we change resolve_ref() to return
"char *" now?
-- 
Duy

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Junio C Hamano @ 2011-11-14  3:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: vinassa vinassa, git
In-Reply-To: <CACBZZX7VTdc2wHYHb1BB-wCJbKLVEmbzQaBTV04S1KDrqeN73A@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> This is not something you have to worry about, just get on with using
> Git and stop worrying about phenomenally unlikely edge cases that are
> never going to happen.

People who repeated answers along this line, you can stop. The message has
been heard, but without answering the original question.

When we create a new object (i.e. "git add" to register a new blob
contents, "git commit" that internally generates new tree objects to
record updated "whole contents" and then records the commit object), we
first compute what the object name of the new object would be, and then
check if we already have an object with the same object name in the object
store. If we do, we do not write the new copy of the object out (see the
function write_sha1_file() in sha1_file.c and the call to has_sha1_file()
that bypasses write_loose_object()).

So the old contents will be kept without getting overwritten.

Which sounds nice, but it has interesting consequences, as we do not
bother running byte-for-byte comparison when we find what we tried to
write already existed in the object store in order to error out in fear of
the miniscule chance that we would hit a SHA-1 collision.

If the collision is between commit objects, for example, we would write
the (old) commit object name to the tip of the current branch. Most
likely, the tree object recorded in the (old) commit would not match the
tree object your "git commit" wanted to record (otherwise you have hit
SHA-1 collision twice in a row ;-), which would mean "git status" would
show that a whole bunch of paths have changed between the HEAD and the
index. Also "git log" would show the history leading to the (old) commit
that is likely to be very different from what you would expect immediately
after committing the collided commit. Of course, you could recover from it
with "git reset --soft" after finding out what the previous HEAD was from
the reflog, but it won't be a pleasant experience.

There can be other kinds of collisions (e.g. your latest commit might have
collided with an existing blob or tree, in which case it is likely that
almost nothing would work after finding a blob or tree in HEAD).

^ permalink raw reply

* Re: Git:  CVS to Git import
From: Matthew Ogilvie @ 2011-11-14  2:44 UTC (permalink / raw)
  To: Jvsrvcs; +Cc: git
In-Reply-To: <1321053453892-6987037.post@n2.nabble.com>

On Fri, Nov 11, 2011 at 03:17:33PM -0800, Jvsrvcs wrote:
> Git:  CVS to Git import
> 
> We are moving from CVS to Git and want to know if anyone has had any
> experience there doing this and could share do's  / dont's, best practices
> when doing the initial import.

Some ideas:

I wouldn't trust "git cvsimport".  In my testing, it was actaully fairly
common for the resulting git tags and branches to be inconsistent with the
original CVS tags and branches: checking out a tag from CVS and the same
tag from GIT, the trees were often different.  See the manpage
for a list of some of the known issues. 

Use cvs2git instead.

Write up your own script to do the conversion.  Iteratively inspect
the results, find ways to fix up anything you don't like,
and re-run the script.  Any "fixups" you want should be
scripted, so that you can try different things, examine
the result.  Then when the actual "real" conversion
happens, you have a minimal amount of downtime as you your
already-tested script runs.

The exact fixups your script should do depend on your
circumstances, but in my case, some of things my script did included:

  - First, copy the CVS repository, and work with the copy:
  - Delete some ",v" files we didn't interested in importing into git for
    various reasons.
  - Tweak some CVS commit timestamps in some files (such as a version
    file), to reduce import odditities.  (The most common oddities
    resulted from an old CVS workflow that would often sequence:
    (a) checkout, (b) modify version number file, (c) build, (d) commit
    the new version number file, and (e) tag the sandbox.  It was
    was moderately common for other changes (in other files) to
    be committed between (a) and (d), which will either cause
    strange import artifacts or actually break import tools, due to
    the out-of-order timestamps.  Tweaking back the timestamp in the
    CVS file typically allows the import tool to avoid the
    oddity.  Completely cleaning this up would have been a
    lot of work, so I focused just on just improving recent
    history.)  (sed -i ...)
  - Do the bulk of the import work using cvs2git.
  - Graft on appropriate merge history (multiple parents) for
    CVS merges.  To save time, I only worried about recent merges.
     - If you have a nice consistent tag naming
       convention, there are ways to do this as part of cvs2git.
       Unfortunately, we didn't.
     - Do not refer to a previous run's commit SHA-1's; they'll
       likely change as things change.  Use CVS tags instead.
     - git rev-parse is useful for looking up current references
       to construct graft lines.
  - Use git filter-branch to both make the above grafts permanent,
    and to fix commiter/author username/email.
  - Move imported tags and branches to refs/oldcvstags/*
    and refs/oldcvsbranches/*, to bury a lot of the noise
    (automatic build tags, tags applied as part of doing a
    merge, etc) to where a normal "git clone" will not grab
    them, but they can still be fetched manually if necessary.
  - Copy/rename a few recent release tags and branches to
    normal refs/tags/* and refs/heads/*, when they are actually
    useful. (git pack-refs and sed)
  - Something like: sleep 5 ; git gc --aggressive --prune='1 second ago'

--
Matthew Ogilvie   [mmogilvi_git@miniinfo.net]

^ permalink raw reply

* [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14  0:38 UTC (permalink / raw)
  To: git

What do people think about the subject?  As to the timeframe I am thinking
about deprecation at v1.7.9 (late January 2012) and removal at v1.7.12
(early August 2012) [*1*].

The script more-or-less outlived its usefulness in July 2005 when
packfiles were introduced.

You are better off repacking the repositories in the first place before
accumulating so many loose object that linking the same ones between the
repositories would give you major saving.  It is theoretically possible
that two repositories happen to have the same pack and it would give you
saving to hardlink them together, but even in that case, the saving would
not survive repacking unless they are marked with a ".keep" marker, which
the script does not do anyway.

A more useful feature to attack a similar issue would be to make it easier
to use "--reference" aka "objects/info/alternates". Namely:

 (1) devise a way to make it safer by allowing the repository whose
     objects are borrowed by other repositories a way to protect objects
     that it does not need but may be needed by others from repacking; and

 (2) allowing two repositories that started independently to share objects
     using the alternates mechanism after the fact.

but that is a separate issue.

[Footnote]

*1* Both dates were derived from a mechanical "9-week per release cycle".

^ permalink raw reply

* Revisiting metadata storage
From: Richard Hartmann @ 2011-11-14  0:07 UTC (permalink / raw)
  To: Git List

Hi all,

every few months, a thread seems to pop up regarding metadata storage,
be it owner, mtime, xattr or what have you.

As of today there is (ttbomk) still no sane way to carry & restore
metadata across different repositories.

metastore[1] is closest to a working solution but its storage format
is binary and merge-unfriendly and it does not store mtime which is a
deal-breaker, for me.
I tried extend & fix metastore, but failed. Others looked at it,
deemed it possible and lost interest and/or their development box.

I know from Joey Hess [2] that the GitTogether 2011 saw some
discussion about metadata, but I was unable to find any follow-up to
this issue.

To make a long story short: Does anyone have a working solution,
today? If not, is anyone working on one? If not, is anyone interested
in working on one? And is there any follow-up to the GitTogether
discussion?

The feature set of any solution should probably include save, display,
diff, and apply on a per-metadata and per-file basis.

Thanks,
Richard

[1] http://david.hardeman.nu/software.php
[2] http://kitenet.net/~joey/blog/entry/GitTogether2011/

^ permalink raw reply

* Re: [PATCH 0/7] New sequencer workflow!
From: Junio C Hamano @ 2011-11-14  0:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Jonathan Nieder, Christian Couder
In-Reply-To: <7v39droi63.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Almost. If these are replaced with "git cherry-pick --continue" and "git
> revert --continue" that internally triggers "git sequencer --continue"
> *and* errors out if a wrong command was used to "--continue", it would be
> perfect.
>
> The longer-term ultimate goal would be to make it "git continue" that
> covers more than the sequencer based workflow, i.e. allow "git merge" that
> conflicted to be concluded with "edit && git add && git continue".

To clarify a bit.

While I do not mind "sequencer --continue" as a step to get us closer to a
more pleasant user experience at the implementation level, exposing the
name "sequencer" or having the user to type "sequencer --continue" is going
in a very wrong direction at the UI level.

There are many commands in the Git suite that take an order from the user,
attempt to perform the necessary operation but stops in the middle to ask
for help from the user. "cherry-pick" and "revert" are two of them, and
there are many others: e.g. am, merge, pull, rebase, "rebase -i". They use
different mechanisms to keep track of their states before giving the
control back to the user when they ask for help.

The original workflow was "pull; edit && add && commit", and it is very
unlikely this will change while we are still in Git 1.X series. The
original single commit variants of "cherry-pick" and "revert" are in the
same category. We would need to keep supporting "cherry-pick/revert; edit
&& add && commit" as a workflow for a while.

Others that deal with more than one stop-point follow a different pattern
from necessity. The user tells the same command to continue after the
command asks for help in "am; edit && add && am --continue" and "rebase
[-i]; edit && add && rebase --continue" sequence. Multi-commit variants of
"cherry-pick" and "revert" take the same approach for consistency.

In the shorter term, a person new to Git, after learning "run command X, X
gives back control asking for help, help X by editing and adding, and
telling X to continue" pattern from these commands, will eventually find
that the commands in single stop-point category (i.e. "pull", "merge" and
single-commit "cherry-pick" and "revert") inconsistent from others that
take "--continue" to resume. For this reason, "git cherry-pick --continue"
that would work even when picking a single commit like this would be
beneficial:

    $ git cherry-pick X
    ... conflicts
    $ edit && add
    $ git cherry-pick --continue

That is, no "commit" by the user. The "helping" part is literally that;
the user helps Git because Git is too stupid to read the user's mind to
come up with a perfect conflict resolution. Git knows, given a correct
resolution, how to make a commit out of it perfectly fine and does not
need help from the user to commit the result.

In the medium term, extending the above "shorter term" goal, it would make
sense to support (but not necessarily require) the following flow for
consistency:

    $ git pull ;# or "git merge branch"
    ... conflicts
    $ edit && add ;# again, no "commit"
    $ git pull --continue ;# or "git merge --continue"

Now, if you rename "cherry-pick --continue" and "revert --continue" to
"sequencer --continue", what message are you sending to the users? They
now need to learn that only these two commands are based on something
called "sequencer", can be resumed with "sequencer --continue", but other
commands need to be continued with "X --continue".

That is totally backwards, no? You are _adding_ mental burden to your
users by introducing another thing or two they need to learn about.

In the longer term (now we are talking about Git 2.X version bump), it
would be ideal if all the "git X --continue" can be consolidated to a
single "git continue" command (and "git abort" to give up the sequence
of operations).

Given that bash-completion script can tell in what state the repository
is, I think this is doable. "git continue" may invoke your implementation
of "git sequencer --continue" internally when it detects that the state is
something the "sequencer" machinery knows how to continue, and if it is in
the middle of conflicted "am -3" or rejected "am", the command may invoke
"git am --continue" instead.

That way, the user does not have to learn which command can be resumed
with "sequencer --continue" and which other command needs to be called
with "--continue" to resume. The user does not even need to know there is
a mechanism called sequencer, some commands are already using it while
others are not yet using it, and these other commands are in the process
of being rewritten to use the sequencer machinery.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox