Git development


* [PATCH] Fix incorrect ref namespace check
From: Nguyễn Thái Ngọc Duy @ 2012-01-05 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

The trailing slash is needed so that the check matches only refs inside
the namespace (e.g. "refs/tags/") and not names that merely share the
prefix. refs/stash is not a namespace, but a single ref. Do a full
string compare on it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch.c  |    2 +-
 builtin/remote.c |    2 +-
 log-tree.c       |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..daa68d2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -573,7 +573,7 @@ static void find_non_local_tags(struct transport *transport,
 
 	for_each_ref(add_existing, &existing_refs);
 	for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
-		if (prefixcmp(ref->name, "refs/tags"))
+		if (prefixcmp(ref->name, "refs/tags/"))
 			continue;
 
 		/*
diff --git a/builtin/remote.c b/builtin/remote.c
index 583eec9..f54a89a 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -534,7 +534,7 @@ static int add_branch_for_removal(const char *refname,
 	}
 
 	/* don't delete non-remote-tracking refs */
-	if (prefixcmp(refname, "refs/remotes")) {
+	if (prefixcmp(refname, "refs/remotes/")) {
 		/* advise user how to delete local branches */
 		if (!prefixcmp(refname, "refs/heads/"))
 			string_list_append(branches->skipped,
diff --git a/log-tree.c b/log-tree.c
index 319bd31..9a88fcc 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -119,7 +119,7 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
 		type = DECORATION_REF_REMOTE;
 	else if (!prefixcmp(refname, "refs/tags/"))
 		type = DECORATION_REF_TAG;
-	else if (!prefixcmp(refname, "refs/stash"))
+	else if (!strcmp(refname, "refs/stash"))
 		type = DECORATION_REF_STASH;
 	else if (!prefixcmp(refname, "HEAD"))
 		type = DECORATION_REF_HEAD;
-- 
1.7.8.36.g69ee2


* RE: Re: checkout on an empty directory fails
From: René Doß @ 2012-01-05 12:38 UTC (permalink / raw)
  To: git
In-Reply-To: <CACsJy8A42n4t+WqGaTx7vDQ3jP_YkD1bB0WL9amrrg1B4eOx7w@mail.gmail.com>

Thank you for your help. I cannot understand what the mistake is.
qgit displays the tree correctly.
git status shows no special information.
Only "git checkout ." works.

What does the dot in checkout mean?

René

red@linux-nrd1:~/iso/a> git status
# On branch master
# Changed but not updated:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       deleted:    SP601_RevC_annotated_master_ucf_8-28-09.ucf
#       deleted:    rtl/ether_speed.vhd
#       deleted:    rtl/ether_top.vhd
#       deleted:    rtl/ether_tx.vhd
#       deleted:    rtl/takt.vhd
#       deleted:    sim/makefile
#       deleted:    sim/tb_ether_top.vhd
#
no changes added to commit (use "git add" and/or "git commit -a")

red@linux-nrd1:~/iso/a> ls
red@linux-nrd1:~/iso/a> git reset --hard
HEAD is now at efb7b86 Simulation ergaenzt
red@linux-nrd1:~/iso/a> git checkout master
Already on 'master'
red@linux-nrd1:~/iso/a> git checkout .
red@linux-nrd1:~/iso/a> ls
rtl  sim  SP601_RevC_annotated_master_ucf_8-28-09.ucf   <-- here the files are back!
red@linux-nrd1:~/iso/a>


* [PATCH] Fix incorrect ref namespace check
From: Nguyễn Thái Ngọc Duy @ 2012-01-05 12:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
In-Reply-To: <1325766924-14943-1-git-send-email-pclouds@gmail.com>

The trailing slash is needed so that the check matches only refs inside
the namespace (e.g. "refs/tags/") and not names that merely share the
prefix. refs/stash and HEAD are not namespaces, but complete refs. Do a
full string compare on them.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I missed prefixcmp(..., "HEAD") right below prefixcmp(..., "refs/stash")

 builtin/fetch.c  |    2 +-
 builtin/remote.c |    2 +-
 log-tree.c       |    4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 33ad3aa..daa68d2 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -573,7 +573,7 @@ static void find_non_local_tags(struct transport *transport,
 
 	for_each_ref(add_existing, &existing_refs);
 	for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
-		if (prefixcmp(ref->name, "refs/tags"))
+		if (prefixcmp(ref->name, "refs/tags/"))
 			continue;
 
 		/*
diff --git a/builtin/remote.c b/builtin/remote.c
index 583eec9..f54a89a 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -534,7 +534,7 @@ static int add_branch_for_removal(const char *refname,
 	}
 
 	/* don't delete non-remote-tracking refs */
-	if (prefixcmp(refname, "refs/remotes")) {
+	if (prefixcmp(refname, "refs/remotes/")) {
 		/* advise user how to delete local branches */
 		if (!prefixcmp(refname, "refs/heads/"))
 			string_list_append(branches->skipped,
diff --git a/log-tree.c b/log-tree.c
index 319bd31..535b905 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -119,9 +119,9 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
 		type = DECORATION_REF_REMOTE;
 	else if (!prefixcmp(refname, "refs/tags/"))
 		type = DECORATION_REF_TAG;
-	else if (!prefixcmp(refname, "refs/stash"))
+	else if (!strcmp(refname, "refs/stash"))
 		type = DECORATION_REF_STASH;
-	else if (!prefixcmp(refname, "HEAD"))
+	else if (!strcmp(refname, "HEAD"))
 		type = DECORATION_REF_HEAD;
 
 	if (!cb_data || *(int *)cb_data == DECORATE_SHORT_REFS)
-- 
1.7.8.36.g69ee2
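
The distinction the patch draws can be sketched in a few lines of C. This is only an illustration, not code from git.git: prefixcmp_sketch is a stand-in written for this note that follows the same "0 means match" convention as the prefixcmp() calls in the diff, and is_tag_ref and is_stash_ref are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in for the prefixcmp() used in the diff (an assumption, not
 * git's own code): returns 0 when `str` starts with `prefix`.
 */
int prefixcmp_sketch(const char *str, const char *prefix)
{
	assert(str && prefix);
	return strncmp(str, prefix, strlen(prefix));
}

/* Namespace test: the trailing slash keeps "refs/tagsfoo" out. */
int is_tag_ref(const char *refname)
{
	return !prefixcmp_sketch(refname, "refs/tags/");
}

/* Single-ref test: only the exact name "refs/stash" qualifies. */
int is_stash_ref(const char *refname)
{
	assert(refname);
	return !strcmp(refname, "refs/stash");
}
```

With the old "refs/tags" prefix, a ref named "refs/tagsfoo" would have passed the check; with the slash it does not. Likewise, a prefix compare on "refs/stash" would also accept a name such as "refs/stash-old", which the full string compare rejects.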


* [PATCH] clone: allow detached checkout when --branch takes a tag
From: Nguyễn Thái Ngọc Duy @ 2012-01-05 13:49 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This allows you to do "git clone --branch=v1.7.8 git.git" and work
right away from there. No big deal, just one more convenient step, I
think. --branch taking a tag may be confusing though.

We can still have master in this case instead of detached HEAD, which
may make more sense because we use --branch. I don't care much which
way should be used.

Like? Dislike?

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 8f29912..97af4bd 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -23,6 +23,7 @@
 #include "branch.h"
 #include "remote.h"
 #include "run-command.h"
+#include "tag.h"
 
 /*
  * Overall FIXMEs:
@@ -721,6 +722,14 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			strbuf_release(&head);
 
 			if (!our_head_points_at) {
+				strbuf_addstr(&head, "refs/tags/");
+				strbuf_addstr(&head, option_branch);
+				our_head_points_at =
+					find_ref_by_name(mapped_refs, head.buf);
+				strbuf_release(&head);
+			}
+
+			if (!our_head_points_at) {
 				warning(_("Remote branch %s not found in "
 					"upstream %s, using HEAD instead"),
 					option_branch, option_origin);
@@ -750,7 +759,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			      reflog_msg.buf);
 	}
 
-	if (our_head_points_at) {
+	if (our_head_points_at &&
+	    !prefixcmp(our_head_points_at->name, "refs/tags/")) {
+		const struct ref *ref = our_head_points_at;
+		struct object *o;
+
+		/* Detached HEAD */
+		o = deref_tag(parse_object(ref->old_sha1), NULL, 0);
+		update_ref(reflog_msg.buf, "HEAD", o->sha1, NULL,
+			   REF_NODEREF, DIE_ON_ERR);
+	} else if (our_head_points_at) {
 		/* Local default branch link */
 		create_symref("HEAD", our_head_points_at->name, NULL);
 		if (!option_bare) {
-- 
1.7.8.36.g69ee2


* Re: checkout on an empty directory fails
From: Holger Hellmuth @ 2012-01-05 13:59 UTC (permalink / raw)
  To: René Doß; +Cc: git
In-Reply-To: <4F0599E0.7090902@gmx.de>

On 05.01.2012 13:38, René Doß wrote:
> git status shows no special information.

  versus

> red@linux-nrd1:~/iso/a> git status
> # On branch master
> # Changed but not updated:
> # (use "git add/rm <file>..." to update what will be committed)
> # (use "git checkout -- <file>..." to discard changes in working directory)
> #
> # deleted: SP601_RevC_annotated_master_ucf_8-28-09.ucf
> # deleted: rtl/ether_speed.vhd
> # deleted: rtl/ether_top.vhd
> # deleted: rtl/ether_tx.vhd
> # deleted: rtl/takt.vhd
> # deleted: sim/makefile
> # deleted: sim/tb_ether_top.vhd
> #

This *is* special information: It tells you that master has those 7 
files but your working directory has none of them (i.e. it is as if you 
had deleted them from your working directory).

"git checkout <branch>" switches between branches, *but* leaves changes 
you made (files you edited, added or deleted) intact! This is so you can 
switch branches before committing if you suddenly realize you are in the 
wrong branch.

"git checkout -- <paths...>" or in your case "git checkout -- ." is 
different, it really overwrites the files in your working dir with the 
versions stored somewhere else, by default from the index.

 > What does the dot in checkout mean?

"." is simply your current directory


* Re: [PATCH] clone: allow detached checkout when --branch takes a tag
From: Jeff King @ 2012-01-05 14:18 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1325771380-18862-1-git-send-email-pclouds@gmail.com>

On Thu, Jan 05, 2012 at 08:49:40PM +0700, Nguyen Thai Ngoc Duy wrote:

> This allows you to do "git clone --branch=v1.7.8 git.git" and work
> right away from there. No big deal, just one more convenient step, I
> think. --branch taking a tag may be confusing though.
> 
> We can still have master in this case instead of detached HEAD, which
> may make more sense because we use --branch. I don't care much which
> way should be used.
> 
> Like? Dislike?

Seems like a reasonable goal to me. I agree that "--branch=v1.7.8" is a
little confusing, but not the end of the world. If we were designing it
from scratch, I might call it "--head" or "--checkout" or something to
indicate that it is what we are putting in HEAD. But I don't know that
it is worth renaming the option or adding a new option.

> @@ -721,6 +722,14 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  			strbuf_release(&head);
>  
>  			if (!our_head_points_at) {
> +				strbuf_addstr(&head, "refs/tags/");
> +				strbuf_addstr(&head, option_branch);
> +				our_head_points_at =
> +					find_ref_by_name(mapped_refs, head.buf);
> +				strbuf_release(&head);
> +			}
> +
> +			if (!our_head_points_at) {

Hmm. The context just above your patch that got snipped does this:

    strbuf_addstr(&head, src_ref_prefix);
    strbuf_addstr(&head, option_branch);
    our_head_points_at =
        find_ref_by_name(mapped_refs, head.buf);

where src_ref_prefix typically is "refs/heads/", and clearly you are
meaning to do the same thing for tags. But the use of "src_ref_prefix"
is interesting.

It is always "refs/heads/" unless we are cloning into a bare mirror, in
which case it is "refs/". So with your patch in the non-mirror case,
doing "--branch=foo" would try "refs/heads/foo" followed by
"refs/tags/foo". Which makes sense. But in the mirror case, it will try
"refs/foo" followed by "refs/tags/foo", which is kind of odd.

I wonder, though, if the original code makes any sense. By using
"refs/", I would have to say "--branch=heads/foo", which is kind of
weird and undocumented. I think it should probably always be
"refs/heads/", no matter if we are mirroring or not.
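
The lookup order described above can be made concrete with a small sketch. It is an illustration under assumptions, not the code in builtin/clone.c: src_ref_prefix_sketch and candidate_ref are hypothetical names, and the "refs/" versus "refs/heads/" choice simply restates what this message says about the mirror case.

```c
#include <assert.h>
#include <string.h>

/* Per the discussion: "refs/" when mirroring, "refs/heads/" otherwise. */
const char *src_ref_prefix_sketch(int mirror)
{
	return mirror ? "refs/" : "refs/heads/";
}

/*
 * Build the ref name that would be looked up for a given prefix and
 * --branch value. Returns 0 on success, -1 if `out` is too small.
 */
int candidate_ref(char *out, size_t outlen,
		  const char *prefix, const char *branch)
{
	size_t plen, blen;

	assert(out && prefix && branch);
	plen = strlen(prefix);
	blen = strlen(branch);
	if (plen + blen + 1 > outlen)
		return -1;
	memcpy(out, prefix, plen);
	memcpy(out + plen, branch, blen + 1); /* copies the NUL too */
	return 0;
}
```

For "--branch=foo" this gives "refs/heads/foo" and then "refs/tags/foo" in the normal case, but "refs/foo" and then "refs/tags/foo" in the mirror case, which is the oddity pointed out above.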

> @@ -750,7 +759,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  			      reflog_msg.buf);
>  	}
>  
> -	if (our_head_points_at) {
> +	if (our_head_points_at &&
> +	    !prefixcmp(our_head_points_at->name, "refs/tags/")) {

I think I would prefer this check to be:

  prefixcmp(our_head_points_at->name, "refs/heads/")

which more closely matches the rules for what is allowed to go in HEAD
as a symbolic ref. It's pretty hard to get something other than heads or
tags, but you can do it with "git clone --bare --mirror --branch=foo/bar".
I did argue above for doing away with that "feature", but I still think
it future-proofs this section of code to handle anything.
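
As a rough sketch of that suggestion (head_must_be_detached is a hypothetical helper name, and the code is not taken from git.git), the decision would key on "is this a branch?" rather than "is this a tag?":

```c
#include <assert.h>
#include <string.h>

/*
 * Nonzero when HEAD should be detached: anything outside refs/heads/
 * is treated as unsuitable for a symbolic HEAD.
 */
int head_must_be_detached(const char *refname)
{
	static const char heads[] = "refs/heads/";

	assert(refname);
	return strncmp(refname, heads, strlen(heads)) != 0;
}
```

Testing for "refs/tags/" would let a name like "refs/foo/bar" fall through to the symref branch; testing for the absence of "refs/heads/" sends it down the detached path along with the tags.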

> +		const struct ref *ref = our_head_points_at;
> +		struct object *o;
> +
> +		/* Detached HEAD */
> +		o = deref_tag(parse_object(ref->old_sha1), NULL, 0);
> +		update_ref(reflog_msg.buf, "HEAD", o->sha1, NULL,
> +			   REF_NODEREF, DIE_ON_ERR);

It's unlikely, but deref_tag can return NULL, in which case this will
segfault (ditto with parse_object, I think). I suspect that is a problem
in lots of places, though. I wonder if deref_tag should simply die if we
have a missing object (and we can add a _gently form for things like
fsck which want to handle the error condition).
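
The split between a dying form and a "_gently" form might look roughly like the sketch below. Everything here is an assumption made for illustration: struct object_sketch is a toy stand-in for git's object structures, both function names are hypothetical, and a NULL `tagged` pointer is used to model a missing object. The exit status 128 mirrors what git's die() uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy object: a tag points at another object, anything else is a leaf. */
struct object_sketch {
	int is_tag;
	struct object_sketch *tagged; /* NULL models a missing object */
};

/* "Gently" form: hands NULL back to callers that want to cope with it. */
struct object_sketch *deref_tag_gently_sketch(struct object_sketch *o)
{
	while (o && o->is_tag)
		o = o->tagged;
	return o;
}

/* Strict form: callers never see NULL because we exit instead. */
struct object_sketch *deref_tag_or_die_sketch(struct object_sketch *o)
{
	struct object_sketch *result = deref_tag_gently_sketch(o);

	if (!result) {
		fprintf(stderr, "fatal: missing object while peeling tag\n");
		exit(128);
	}
	assert(!result->is_tag); /* a peeled result is never a tag */
	return result;
}
```

With this shape, a caller like the clone code quoted above could use the strict form and dereference the result directly, while something like fsck would call the gentle form and report the NULL itself.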

Also, any reason the "warn" flag to deref_tag should not be 1?

Other than those minor complaints, the patch looks good to me.

-Peff


* Re: [PATCH] Do not fetch tags on new shallow clones
From: Shawn Pearce @ 2012-01-05 15:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc, git
In-Reply-To: <7vfwfuofnk.fsf@alter.siamese.dyndns.org>

2012/1/4 Junio C Hamano <gitster@pobox.com>:
> Shawn Pearce <spearce@spearce.org> writes:
>> ... It's useful because cloning a branch immediately after it
>> has been tagged for a release should have `git describe` provide back
>> the name of the release from the tag (assuming of course no new
>> commits were made since the tag).
...
> ... this thing, once you have a "single ref only" stuff working.  After
> Linus announces that he released 3.2, you would do the poor emulation of
> "git archive | tar xf -" with something like:
>
>    git clone --single=v3.2 --shallow $there linux-3.2
>
> and your "git describe" should fall out as a natural consequence out of
> everything else, without the usual "tag following" semantics, no?

I said "branch" not "tag". Of course a single ref clone might be able
to be used on a tag.

If my project maintainer tags a release from "maint" and announces
that, I should also be able to shallow clone maint and pick up the tag
automatically if it is within the depth I have asked for from the
server.

Consider this case, a client clones shallow with a depth of 1. Then
does normal `git fetch` to keep up-to-date with the project. When the
project places a new tag on a branch, our shallow follower will
automatically get that tag during her next `git fetch`, because auto
following tags is enabled in fetch and the tag's referent was included
in the pack. Why is this case permitted to get a tag, and shallow
clone is not?

Actually, I think you might find that a shallow client with depth of 1
will automatically pick up a missing tag at the branch head on its
next `git fetch`. It will see the tag's ^{} line advertise an object
it has, and ask for the tag.

We really should support auto-following tags within the history space
the client already has. It's mostly done for us with the include-tag
capability, the client just needs to make sure it asks for it from the
server, and check to see if any tag reference points to an object it
has.
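
The client-side check described here, "does any advertised tag point to an object I already have?", can be sketched as follows. This is a simplified illustration and not git's fetch code: object ids are plain strings, the struct and function names are hypothetical, and the sketch does not filter out tags the client already has locally.

```c
#include <assert.h>
#include <string.h>

struct advertised_tag {
	const char *name;      /* e.g. "refs/tags/v3.2" */
	const char *peeled_id; /* object id from the tag's ^{} line */
};

/* Linear scan: is `id` among the object ids we already have? */
int have_object(const char *const *have, size_t nr_have, const char *id)
{
	size_t i;

	for (i = 0; i < nr_have; i++)
		if (!strcmp(have[i], id))
			return 1;
	return 0;
}

/*
 * Store in `wanted` the names of tags whose peeled object is present
 * locally. Returns how many names were stored (at most max_wanted).
 */
size_t tags_to_follow(const struct advertised_tag *tags, size_t nr_tags,
		      const char *const *have, size_t nr_have,
		      const char **wanted, size_t max_wanted)
{
	size_t i, nr_wanted = 0;

	assert(tags && have && wanted);
	for (i = 0; i < nr_tags && nr_wanted < max_wanted; i++)
		if (have_object(have, nr_have, tags[i].peeled_id))
			wanted[nr_wanted++] = tags[i].name;
	return nr_wanted;
}
```

A depth-1 client holding only the branch tip would therefore ask for a tag placed on that tip, and skip tags whose referents lie outside the history it has.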

> you would do the poor emulation of
> "git archive | tar xf -" with something like:
>
>    git clone --single=v3.2 --shallow $there linux-3.2

Is it really that poor of an emulation? Like tar, we get only one copy
of each file (assuming depth 1). Assuming --format=tar.gz, both are
compressed. I wonder how much better or worse the Git cross-object
delta compression is than the libz rolling window. I could see how Git
might be able to compress something like C source code smaller than
tar | gzip by using delta compression on related files (e.g. Makefile
in every directory, or *.h and *.c files pairing by type). An added
advantage of the shallow clone is you can incrementally update that
stream, as it's easy to fetch a v3.2.1 patch release, or apply a patch
and record it on top.


* Re: git-subtree
From: David A. Greene @ 2012-01-05 15:03 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: David Greene, git
In-Reply-To: <CALkWK0nU9iO_6CCbWw8c_Fz=xodkaAW4300Jpc7M7D+kBP=QRg@mail.gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Hi David,
>
> David Greene wrote:
>> I have a patch ready.
>> How does the git community want the patch presented?
>
> Please read and follow the guidelines listed in
> Documentation/SubmittingPatches.  The TL;DR version is: break it up
> into logical reviewable commits based on the current `master` and use
> git format-patch/ git send-email to send those commits to this mailing
> list.

I've read that document.  The issue is that I didn't develop the code,
Avery did.  This is a completely new tool for git and I don't have the
first idea of what "logical" chunks would look like.  I assume, for
example, that we'd want the first "chunk" to actually work and do
something interesting.  I can go spend a bunch of time to see if I can
grok enough to create these chunks but I wanted to check first and make
sure that would be absolutely necessary.  It's a lot of time to learn a
completely new codebase.  I was hoping to submit something soon and then
learn the codebase gradually during maintenance/further development.

How have completely new tools been introduced into the git mainline in the
past?

Thanks!

                              -Dave


* Re: How to deal with historic tar-balls
From: Neal Kreitzinger @ 2012-01-05 15:25 UTC (permalink / raw)
  To: nn6eumtr; +Cc: git
In-Reply-To: <4EFF5CDA.5050809@gmail.com>

On 12/31/2011 1:04 PM, nn6eumtr wrote:
> I have a number of older projects that I want to bring into a git
> repository. They predate a lot of the popular scm systems, so they
> are primarily a collection of tarballs today.
>
> I'm fairly new to git so I have a couple questions related to this:
>
> - What is the best approach for bringing them in? Do I just create a
>  repository, then unpack the files, commit them, clean out the
> directory unpack the next tarball, and repeat until everything is
> loaded?
>
> - Do I need to pay special attention to files that are
> renamed/removed from version to version?
>
> - If the timestamps change on a file but the actual content does not,
>  will git treat it as a non-change once it realizes the content
> hasn't changed?
>
> - Last, if after loading the repository I find another version of the
>  files that predates those I've loaded, or are intermediate between
> two commits I've already loaded, is there a way to go say that commit
> B is actually the ancestor of commit C? (i.e. a->c becomes a->b->c if
> you were to visualize the commit timeline or do diffs) Or do I just
> reload the tarballs in order to achieve this?
>
The git-rm manpage contains instructions under the "vendor code drop"
section on how to do this.  I imagine you will want to do each one
manually instead of queueing them up in a script because you are likely 
going to want to do appropriate clean up of the working tree in each 
iteration before committing.  This is where you would review 
renames/removes with git-status before you git-add and git-commit. 
Also, if you are tracking permissions in git (the executable bit) then 
you will want to filter out any noise generated by frivolous permissions 
changes between the tarball contents.

Whether you can insert tarballs into the history depends on when you 
plan to do it.  You can only do that before the history is published 
(made "public" for other repos to pull down).  Otherwise you will be 
rewriting published history, which is a big no-no (see git-rebase 
manpage).  I suggest you do your homework and order them properly 
before you start, because that will be less work. 
  If you still find that you missed something then you can use 
interactive git-rebase to insert.  I'm assuming a single "master" branch 
with linear history is your desired end result.  If you want to create 
maintenance branches showing release history then you will definitely 
need to do your homework first (see gitworkflow manpage).

If you venture into rebase territory by rewriting history (inserting 
missed tarballs in between older commits) you will need to be sure to 
review your automatic merge resolutions.  Git only generates 
merge-conflicts on same-file-same-line conflicts.  It will auto-merge 
same-file-different-line changes.

You also need to ask yourself if you really need a history of all those 
versions.  To exaggerate, if all you really need is the current state 
then you need to ask yourself if it's worth the effort to record the 
previous states.  Maybe what you want is something in-between (a happy 
medium).

In regard to the 'start-over' method of inserting missed tarballs you 
would just git-reset --hard to the commit you want to insert on-top-of, 
add the tarball, and then re-apply the subsequent tarballs.  If you are 
doing cleanup between commits then the rebase or cherry-pick of the 
already cleaned-up subsequent commits from the "old-branch" (previous 
attempt) onto the 'do-over' branch will likely be easier.  (You can just 
do 'git branch old-branch' on your branch before the git-reset --hard 
(do-over) and that will give you a "backup copy" of the "previous 
attempt" called "old-branch" that you can salvage already-done-work from 
by using rebase or cherry-pick.)

Hope this helps.

v/r,
neal


* Re: git-subtree
From: Ramkumar Ramachandra @ 2012-01-05 15:32 UTC (permalink / raw)
  To: David A. Greene; +Cc: David Greene, git, Junio C Hamano
In-Reply-To: <87ipkq199w.fsf@smith.obbligato.org>

Hi again,

[+CC: Junio Hamano, our maintainer]

David A. Greene wrote:
> I've read that document.  The issue is that I didn't develop the code,
> Avery did.

Not an issue as long as you have Avery's signoff.

> It's a lot of time to learn a
> completely new codebase.  I was hoping to submit something soon and then
> learn the codebase gradually during maintenance/further development.

We certainly don't want badly reviewed code that nobody understands
floating around in the codebase- so, I'd suggest sending out whatever
you think is appropriate for the first round of reviews, and see how
things shape up from there.

> How have completely new tools been introduced into the git mainline in the
> past?

Yes.  For an example of something I was involved with but didn't
author, see vcs-svn/.

-- Ram


* 'fatal: Out of memory? mmap failed: No such device' using cifs
From: Bruno Bigras @ 2012-01-05 15:44 UTC (permalink / raw)
  To: git

Hi,

I got : 'fatal: Out of memory? mmap failed: No such device' when doing
'git init' in a directory on a mounted cifs share. Any ideas?

I'm using cifs with autofs, here's what I use :
win1 -fstype=smbfs,rw,credentials=/etc/smb.auth,gid=admin,file_mode=0777,dir_mode=0777,nocase,directio,sfu,iocharset=utf8 ://10.1.1.8/DATA/

git version 1.7.8.2
2.6.32-37-generic-pae

$ mount
//10.1.1.8/DATA/ on /net/smb/win1 type cifs (rw,mand)

Thanks,

Bruno


* Re: git-subtree
From: Jeff King @ 2012-01-05 15:47 UTC (permalink / raw)
  To: David A. Greene; +Cc: Ramkumar Ramachandra, David Greene, git
In-Reply-To: <87ipkq199w.fsf@smith.obbligato.org>

On Thu, Jan 05, 2012 at 09:03:38AM -0600, David A. Greene wrote:

> > Please read and follow the guidelines listed in
> > Documentation/SubmittingPatches.  The TL;DR version is: break it up
> > into logical reviewable commits based on the current `master` and use
> > git format-patch/ git send-email to send those commits to this mailing
> > list.
> 
> I've read that document.  The issue is that I didn't develop the code,
> Avery did.  This is a completely new tool for git and I don't have the
> first idea of what "logical" chunks would look like.  I assume, for
> example, that we'd want the first "chunk" to actually work and do
> something interesting.  I can go spend a bunch of time to see if I can
> grok enough to create these chunks but I wanted to check first and make
> sure that would be absolutely necessary.  It's a lot of time to learn a
> completely new codebase.  I was hoping to submit something soon and then
> learn the codebase gradually during maintenance/further development.

I think this is also somewhat different in that git-subtree has a
multi-year history in git that we may want to keep. So it is more
analogous to something like gitweb or git-gui, which we have brought in
(using subtree merges, no less).

The biggest decision is whether or not to import the existing history.
If we do, then we have to decide whether it becomes a sub-component like
gitweb (e.g., it gets pulled into a "subtree" directory, and we have
make recurse into it), or whether it gets overlaid into the main
directory (i.e., we clean and munge the subtree repo a bit, then just
"git merge" the history in).

If we want to throw away the existing history, then I think you end up
doing the same munging as the latter option above, and then just make a
single patch out of it instead of a merge.

I don't use git-subtree, but just glancing over the repo, it looks like
that munging is mostly:

  1. git-subtree.sh stays, and gets added to git.git's top-level Makefile

  2. the test.sh script gets adapted into t/tXXXX-subtree.sh

  3. git-subtree.txt goes into Documentation/

  4. The rest of the files are infrastructure that can go away, as they
     are a subset of what git.git already contains.

I'd favor keeping the history and doing the munge-overlay thing.
Although part of me wants to join the histories in a subtree so that we
can use "git subtree" to do it (which would just be cool), I think the
resulting code layout doesn't make much sense unless git-subtree is
going to be maintained separately.

-Peff


* Re: git-subtree
From: Junio C Hamano @ 2012-01-05 15:53 UTC (permalink / raw)
  To: David Greene; +Cc: git
In-Reply-To: <nngaa638nwf.fsf@transit.us.cray.com>

David Greene <dag@cray.com> writes:

> How does the git community want the patch presented?  Right now it's one
> monolithic thing.  I understand that isn't ideal but I don't think
> incorporating the entire GitHub master history is necessarily the best
> idea either.

It depends on the longer term vision of how the result of this submission
will evolve and more importantly, where you fit in the picture.

One possible answer you could give us might go like this:

    The longer term vision is for "git subtree" to become, and be
    developed further as, an integral part of the core git suite.

    I have been an active contributor to the "git subtree" project for
    quite some time, and am very familiar with the code. Avery has been
    too busy to properly take care of the maintenance of "git subtree",
    and is expected to be so for the foreseeable future. I will address any
    issue raised during the initial review and will be taking over its
    maintenance and further development.

    My plan is to put this first to contrib/ area, keep it there for a few
    release cycles while ironing out remaining kinks in the code, and
    eventually make it one of the "git" subcommands. Avery's external tree
    will cease to exist as future development will happen in-tree in the
    git repository.

Your answer might differ, of course, but the point is that we would need
to weigh pros and cons between inclusion of it in the git repository and
keeping it in Avery's repository and have him and his contributors
maintain, enhance and distribute it from there, and it largely depends on
the nature of the submission. Is it a "throw it over the wall" dump of a
large code of unknown quality that we need to clean up first without
knowing the vision of how "git subtree" should evolve by original author
and/or people who have been actively developing it?


* Re: [PATCH 1/2] daemon: add tests
From: Clemens Buchacher @ 2012-01-05 16:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Jonathan Nieder, Erik Faye-Lund,
	Ilari Liusvaara, Nguyễn Thái Ngọc Duy
In-Reply-To: <20120105025559.GB7326@sigill.intra.peff.net>

On Wed, Jan 04, 2012 at 09:55:59PM -0500, Jeff King wrote:
> 
> It so happens that I have just the patch you need. I've been meaning to
> go over it again and submit it:
> 
>   run-command: optionally kill children on exit
>   https://github.com/peff/git/commit/5523d7ebf2a0386c9c61d7bfbc21375041df4989

Thanks, looks great. But if I add this on top (to enable this for
"git daemon"), then t0001 kills my entire X session. Not sure yet
what's going on.

diff --git a/run-command.c b/run-command.c
index aeb9c6e..53218df 100644
--- a/run-command.c
+++ b/run-command.c
@@ -497,6 +497,7 @@ static void prepare_run_command_v_opt(struct child_process *cmd,
        cmd->stdout_to_stderr = opt & RUN_COMMAND_STDOUT_TO_STDERR ? 1 : 0;
        cmd->silent_exec_failure = opt & RUN_SILENT_EXEC_FAILURE ? 1 : 0;
        cmd->use_shell = opt & RUN_USING_SHELL ? 1 : 0;
+       cmd->clean_on_exit = 1;
 }
 
 int run_command_v_opt(const char **argv, int opt)


* Re: [PATCH] clone: allow detached checkout when --branch takes a tag
From: Junio C Hamano @ 2012-01-05 16:22 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1325771380-18862-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This allows you to do "git clone --branch=v1.7.8 git.git" and work
> right away from there. No big deal, just one more convenient step, I
> think. --branch taking a tag may be confusing though.
>
> We can still have master in this case instead of detached HEAD, which
> may make more sense because we use --branch. I don't care much which
> way should be used.

You clone a single lineage of the history, either shallowly or fully,
either starting at the tip of one single branch or a named tag.

What is the expected use scenario of a resulting repository of this new
feature? As this is creating a repository, not a tarball extract, you
certainly would want the user to build further history in the resulting
repository, and it would need a real branch at some point, preferably
before any new commit is made. Which makes me think that the only reason
we would use a detached HEAD would be because we cannot decide what name
to give to that single branch and make it the responsibility of the user
to run "git checkout -b $whatever" as the first thing.

I think the real cause of the above is because this patch and its previous
companion patch conflate the meaning of the "--branch" option with the
purpose of specifying which lineage of the history to copy. The option is
described to name the local branch that is checked out, instead of using
the same name as the remote's primary branch. But these patches abuse the
option to name something different at the same time---the endpoint of the
single lineage to be copied.

These two may often be the same, and use of "clone --branch=master" in
such a case would mean that you want to name the local branch of the final
checkout to be "master" _and_ the endpoint of the single lineage you are
copying is also their "master".

But the "tag" extension proposed with this change is different.

You are specifying an endpoint of the single lineage with the option that
is different from any of the branches at the origin, and because you used
the "--branch" option for that purpose, you lost the way to specify the
primary thing the option wanted to express: what the name of the resulting
checkout should be.

Perhaps something like "clone --branch=master --$endpoint=v1.7.8" that
says "I want a clone of the repository limited to a single lineage, whose
history ends at the commit pointed by the v1.7.8 tag, and name the local
checkout my master branch" would be more appropriate?

Also, the user is likely to want to fetch and integrate from the origin
with his own history. How should "git pull" and "git fetch" work in the
resulting repository? What should the remote.origin.* look like?

^ permalink raw reply

* Re: [PATCH] Fix incorrect ref namespace check
From: Junio C Hamano @ 2012-01-05 16:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Michael Haggerty
In-Reply-To: <1325767180-15083-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The reason why the trailing slash is needed is obvious. refs/stash and
> HEAD are not namespaces, but complete refs. Do full string compare on them.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I missed prefixcmp(..., "HEAD") right below prefixcmp(..., "refs/stash")

As Michael has been actively showing interest in cleaning up the area, he
should have been CC'ed, I would think.

>
>  builtin/fetch.c  |    2 +-
>  builtin/remote.c |    2 +-
>  log-tree.c       |    4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 33ad3aa..daa68d2 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -573,7 +573,7 @@ static void find_non_local_tags(struct transport *transport,
>  
>  	for_each_ref(add_existing, &existing_refs);
>  	for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
> -		if (prefixcmp(ref->name, "refs/tags"))
> +		if (prefixcmp(ref->name, "refs/tags/"))
>  			continue;
>  
>  		/*
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 583eec9..f54a89a 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -534,7 +534,7 @@ static int add_branch_for_removal(const char *refname,
>  	}
>  
>  	/* don't delete non-remote-tracking refs */
> -	if (prefixcmp(refname, "refs/remotes")) {
> +	if (prefixcmp(refname, "refs/remotes/")) {
>  		/* advise user how to delete local branches */
>  		if (!prefixcmp(refname, "refs/heads/"))
>  			string_list_append(branches->skipped,
> diff --git a/log-tree.c b/log-tree.c
> index 319bd31..535b905 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -119,9 +119,9 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
>  		type = DECORATION_REF_REMOTE;
>  	else if (!prefixcmp(refname, "refs/tags/"))
>  		type = DECORATION_REF_TAG;
> -	else if (!prefixcmp(refname, "refs/stash"))
> +	else if (!strcmp(refname, "refs/stash"))
>  		type = DECORATION_REF_STASH;
> -	else if (!prefixcmp(refname, "HEAD"))
> +	else if (!strcmp(refname, "HEAD"))
>  		type = DECORATION_REF_HEAD;
>  
>  	if (!cb_data || *(int *)cb_data == DECORATE_SHORT_REFS)

^ permalink raw reply
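
The difference between the namespace check and the single-ref check in the
patch above can be shown with a minimal sketch. It assumes a stand-in
prefixcmp_sketch() built on strncmp() (not git's own prefixcmp()), and the
helper names is_tag_ref() and is_stash_ref() as well as ref names such as
"refs/tagsfoo" and "refs/stash-old" are hypothetical, chosen only to show
the look-alike cases.

```c
#include <string.h>

/*
 * Stand-in for git's prefixcmp(): returns 0 when str starts with
 * prefix, non-zero otherwise. Sketch only, not git's implementation.
 */
int prefixcmp_sketch(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix));
}

/* A namespace check needs the trailing slash to exclude look-alikes. */
int is_tag_ref(const char *refname)
{
	return !prefixcmp_sketch(refname, "refs/tags/");
}

/* refs/stash is a single ref, so only an exact match should count. */
int is_stash_ref(const char *refname)
{
	return !strcmp(refname, "refs/stash");
}
```

With these helpers, "refs/tags/v1.0" is accepted as a tag while
"refs/tagsfoo" is not, although a prefix check against "refs/tags" without
the slash would accept both. Likewise "refs/stash-old" passes a prefix check
against "refs/stash" but fails the exact comparison.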

* Re: Warning from AV software about kill.exe
From: Erik Faye-Lund @ 2012-01-05 16:33 UTC (permalink / raw)
  To: Erik Blake; +Cc: Pat Thoyts, Thomas Rast, git
In-Reply-To: <4F0418B1.5050403@icefield.yk.ca>

On Wed, Jan 4, 2012 at 10:15 AM, Erik Blake <erik@icefield.yk.ca> wrote:
> On 2011-12-22 19:19, Pat Thoyts wrote:
>> Thomas Rast<trast@student.ethz.ch>  writes:
>>> Erik Blake<erik@icefield.yk.ca>  writes:
>>>
>>>> I'm running git under Win7 64. As I selected "Repository|Visualize all
>>>> branch history" in the git gui, my AV software (Trustport) trapped the
>>>> bin\kill.exe program for "trying to modify system global settings
>>>> (time, timezone, registry quota, etc.)"
>>>>
>>>> Does anyone know the details of this process and what its function
>>>> is? First time I've seen it, though I'm a relatively new user.
>>>
>>> 'kill' is a standard unix utility that sends signals to processes, in
>>> particular signals that cause the processes to exit or be killed
>>> forcibly by the kernel, hence the name.  (I don't know how the windows
>>> equivalent works under the hood, but presumably it's something similar.)
>>>
>>> git-gui and gitk use kill to terminate background worker processes that
>>> are no longer needed because you closed the window their output would
>>> have been displayed in, etc.
>>
>> You might try replacing the command in the tcl scripts with 'exec
>> taskkill /f /pid $pid' and see if that avoids the error. taskkill is
>> present on XP and above as part of the OS distribution so shouldn't
>> suffer any AV complaints.
>>
>
> Another way to implement this (on Windows) would be for the git programs to
> tag themselves with a mutex. Then the "kill" program can determine which git
> programs are running and send them user-defined windows messages to shut
> themselves down. Alternatively, you could send the programs the standard
> windows WM_CLOSE message, but the OS or an AV program might still be
> troubled by that behaviour.
>
> This is how we implement this type of behaviour in our windows programs. It
> does not raise the ire of the OS or AV since you do not have one process
> trying to shut down another. It also bypasses all issues with process
> privileges etc.
>
> Erik
>

No thanks. A process is allowed to terminate another process on
Windows (as long as they are running as the same user, and the access
token has not been messed with). If your AV detects this and prevents
it, then your AV is broken. Re-building a kind of cooperative process
termination for that reason is not the way forward.

But the problem might be that MSYS' kill does more than it's supposed
to (or misbehaves in some other way). This is, however, something you
should take up with the MSYS developers, not the git development
community.

I would take this up with Trustport support. Overly eager AV
heuristics are a fairly common problem, and usually get fixed quickly.

^ permalink raw reply

* Re: checkout on an empty directory fails
From: Dirk Süsserott @ 2012-01-05 19:33 UTC (permalink / raw)
  To: Holger Hellmuth; +Cc: René Doß, git
In-Reply-To: <4F05ACD6.6040603@ira.uka.de>

Am 05.01.2012 14:59 schrieb Holger Hellmuth:
> On 05.01.2012 13:38, René Doß wrote:
>> git status doesn't report anything special.
> 
>  versus
> 
>> red@linux-nrd1:~/iso/a> git status
>> # On branch master
>> # Changed but not updated:
>> # (use "git add/rm <file>..." to update what will be committed)
>> # (use "git checkout -- <file>..." to discard changes in working
>> directory)
>> #
>> # deleted: SP601_RevC_annotated_master_ucf_8-28-09.ucf
>> # deleted: rtl/ether_speed.vhd
>> # deleted: rtl/ether_top.vhd
>> # deleted: rtl/ether_tx.vhd
>> # deleted: rtl/takt.vhd
>> # deleted: sim/makefile
>> # deleted: sim/tb_ether_top.vhd
>> #
> 
> This *is* special information: It tells you that master has those 7
> files but your working directory has none of them (i.e. it is as if you
> had deleted them from your working directory).
> 
> "git checkout <branch>" switches between branches, *but* leaves changes
> you made (files you edited, added or deleted) intact! This is so you can
> switch branches before committing if you suddenly realize you are in the
> wrong branch.
> 
> "git checkout -- <paths...>" or in your case "git checkout -- ." is
> different, it really overwrites the files in your working dir with the
> versions stored somewhere else, by default from the index.
> 
>> What does the dot in checkout mean?
> 
> "." is simply your current directory

Another way of reviving the deleted files and restoring the master branch is

$ git checkout -f master # or git checkout --force master

This will unconditionally check out master and overwrite the local
changes, including the deletions Holger mentioned.

For me, "checkout --force" is more intuitive than "reset --hard" or
"checkout .".

    Dirk

^ permalink raw reply

* [PATCH 1/2] gitweb: Fix file links in "grep" search
From: Jakub Narebski @ 2012-01-05 20:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Perl, git
In-Reply-To: <CANQwDwfnp167Uth5TLbCD6OR-Xe6JD-2vENiJVnipi1YdjnMPQ@mail.gmail.com>

There were two bugs in generating file links (links to "blob" view),
one hidden by the other.  The correct way of generating a file link is

	href(action=>"blob", hash_base=>$co{'id'},
	     file_name=>$file);

The old code used $co{'hash'} (this key does not exist, so the value is
undef) and passed it as 'hash' instead of 'hash_base'.

To have this fix applied in a single place, this commit also reduces
code duplication by saving the file link (which is used for line links) in
$file_href.

Reported-by: Thomas Perl <th.perl@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
On Wed, 4 Jan 2012, Jakub Narębski wrote:
> On Wed, Jan 4, 2012 at 1:28 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Thomas Perl <th.perl@gmail.com> writes:
>>
>>> I think I found a bug in gitweb when grep'ing for text in a branch
>>> different from "master". Here's how to reproduce it:
>>
>> Thanks for a detailed report (and thanks for gpodder ;-).
>>
>> Jakub, care to take a look?
> 
> I see the bug: it should be 'hash_base' not 'hash' in href()
> creating link to "blob" view in git_search_files().
> 
> I'll try to send a fix soon...

Actually there were two errors, one hiding the other...


Thomas, could you check if this fixes your issue?

 gitweb/gitweb.perl |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fc41b07..fa58156 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5852,7 +5852,7 @@ sub git_search_files {
 	my $lastfile = '';
 	while (my $line = <$fd>) {
 		chomp $line;
-		my ($file, $lno, $ltext, $binary);
+		my ($file, $file_href, $lno, $ltext, $binary);
 		last if ($matches++ > 1000);
 		if ($line =~ /^Binary file (.+) matches$/) {
 			$file = $1;
@@ -5867,10 +5867,10 @@ sub git_search_files {
 			} else {
 				print "<tr class=\"light\">\n";
 			}
+			$file_href = href(action=>"blob", hash_base=>$co{'id'},
+			                  file_name=>$file);
 			print "<td class=\"list\">".
-				$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
-						       file_name=>"$file"),
-					-class => "list"}, esc_path($file));
+				$cgi->a({-href => $file_href, -class => "list"}, esc_path($file));
 			print "</td><td>\n";
 			$lastfile = $file;
 		}
@@ -5888,10 +5888,9 @@ sub git_search_files {
 				$ltext = esc_html($ltext, -nbsp=>1);
 			}
 			print "<div class=\"pre\">" .
-				$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
-						       file_name=>"$file").'#l'.$lno,
-					-class => "linenr"}, sprintf('%4i', $lno))
-				. ' ' .  $ltext . "</div>\n";
+				$cgi->a({-href => $file_href.'#l'.$lno,
+				        -class => "linenr"}, sprintf('%4i', $lno)) .
+				' ' .  $ltext . "</div>\n";
 		}
 	}
 	if ($lastfile) {
-- 
1.7.6

^ permalink raw reply related

* [PATCH 2/2] gitweb: Harden "grep" search against filenames with ':'
From: Jakub Narebski @ 2012-01-05 20:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Perl, git
In-Reply-To: <201201052126.49087.jnareb@gmail.com>

Run "git grep" in "grep" search with the '-z' option, so that the
response can also be parsed for files whose names contain the ':'
character.  Without '-z', ':' separates the filename from the line
number and from the matched line.

Note that this does not protect files whose names contain an
embedded newline.  This would be hard but doable for text files, and
harder or even currently impossible with binary files: git does not
quote the filename in the

  "Binary file <foo> matches"

message, but new `--break` and/or `--header` options to git-grep could
help here.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is what I did after fixing the previous issue and looking at the
current code.  Hopefully nobody sane uses filenames with embedded newlines...

  http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

 gitweb/gitweb.perl |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fa58156..f884dfe 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5836,7 +5836,7 @@ sub git_search_files {
 	my %co = @_;
 
 	local $/ = "\n";
-	open my $fd, "-|", git_cmd(), 'grep', '-n',
+	open my $fd, "-|", git_cmd(), 'grep', '-n', '-z',
 		$search_use_regexp ? ('-E', '-i') : '-F',
 		$searchtext, $co{'tree'}
 			or die_error(500, "Open git-grep failed");
@@ -5858,7 +5858,8 @@ sub git_search_files {
 			$file = $1;
 			$binary = 1;
 		} else {
-			(undef, $file, $lno, $ltext) = split(/:/, $line, 4);
+			($file, $lno, $ltext) = split(/\0/, $line, 3);
+			$file =~ s/^$co{'tree'}://;
 		}
 		if ($file ne $lastfile) {
 			$lastfile and print "</td></tr>\n";
-- 
1.7.6

^ permalink raw reply related
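
Why NUL separators help can be shown with a minimal sketch of the same
split done in C. It assumes the three-field "file NUL line-number NUL text"
record layout that the patch's split(/\0/, $line, 3) implies; whether every
version of git grep emits exactly this layout is not verified here. The
names struct grep_record and split_nul_record() are hypothetical, and the
sketch leaves out the stripping of the "$co{'tree'}:" prefix that the patch
performs.

```c
#include <stddef.h>
#include <string.h>

/* Three fields of one record; all point into the caller's buffer. */
struct grep_record {
	const char *file;
	const char *lno;
	const char *text;
};

/*
 * Split a "file\0lno\0text" record of len bytes into three fields.
 * buf[len] must be a NUL terminator so the last field is a C string.
 * Returns 0 on success, -1 when fewer than two separators are found.
 */
int split_nul_record(const char *buf, size_t len, struct grep_record *out)
{
	const char *end = buf + len;
	const char *sep1 = memchr(buf, '\0', len);
	const char *sep2;

	if (!sep1)
		return -1;
	sep2 = memchr(sep1 + 1, '\0', (size_t)(end - (sep1 + 1)));
	if (!sep2)
		return -1;

	out->file = buf;	/* may contain ':' without ambiguity */
	out->lno = sep1 + 1;
	out->text = sep2 + 1;	/* rest of the record, ':' included */
	return 0;
}
```

Because NUL cannot appear in a filename, a name such as "a:b.c" and a
matched line that itself contains ':' both come through intact, which a
split on ':' cannot guarantee.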

* [PATCH] parse_object: try internal cache before reading object db
From: Jeff King @ 2012-01-05 21:00 UTC (permalink / raw)
  To: git; +Cc: git-dev

When parse_object is called, we do the following:

  1. read the object data into a buffer via read_sha1_file

  2. call parse_object_buffer, which then:

     a. calls the appropriate lookup_{commit,tree,blob,tag}
	to either create a new "struct object", or to find
	an existing one. We know the appropriate type from
	the lookup in step 1.

     b. calls the appropriate parse_{commit,tree,blob,tag}
        to parse the buffer for the new (or existing) object

In step 2b, all of the called functions are no-ops for
object "X" if "X->object.parsed" is set. I.e., when we have
already parsed an object, we end up going to a lot of work
just to find out at a low level that there is nothing left
for us to do (and we throw away the data from read_sha1_file
unread).

We can optimize this by moving the check for "do we have an
in-memory object" from 2a before the expensive call to
read_sha1_file in step 1.

This might seem circular, since step 2a uses the type
information determined in step 1 to call the appropriate
lookup function. However, we can notice that all of the
lookup_* functions are backed by lookup_object. In other
words, all of the objects are kept in a master hash table,
and we don't actually need the type to do the "do we have
it" part of the lookup, only to do the "and create it if it
doesn't exist" part.

This can save time whenever we call parse_object on the same
sha1 twice in a single program. Some code paths already
perform this optimization manually, with either:

  if (!obj->parsed)
	  obj = parse_object(obj->sha1);

if you already have a "struct object", or:

  struct object *obj = lookup_unknown_object(sha1);
  if (!obj || !obj->parsed)
	  obj = parse_object(sha1);

if you don't.  This patch moves the optimization into
parse_object itself.

Most git operations won't notice any impact. Either they
don't parse a lot of duplicate sha1s, or the calling code
takes special care not to re-parse objects. I timed two
code paths that do benefit (there may be more, but these two
were immediately obvious and easy to time).

The first is fast-export, which calls parse_object on each
object it outputs, like this:

  object = parse_object(sha1);
  if (!object)
	  die(...);
  if (object->flags & SHOWN)
	  return;

which means that just to realize we have already shown an
object, we will read the whole object from disk!

With this patch, my best-of-five time for "fast-export --all" on
git.git dropped from 26.3s to 21.3s.

The second case is upload-pack, which will call parse_object
for each advertised ref (because it needs to peel tags to
show "^{}" entries). This doesn't matter for most
repositories, because they don't have a lot of refs pointing
to the same objects. However, if you have a big alternates
repository with a shared object db for a number of child
repositories, then the alternates repository will have
duplicated refs representing each of its children.

For example, GitHub's alternates repository for git.git has
~120,000 refs, of which only ~3200 are unique. The time for
upload-pack to print its list of advertised refs dropped
from 3.4s to 0.76s.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index d8d09f9..6b06297 100644
--- a/object.c
+++ b/object.c
@@ -191,10 +191,15 @@ struct object *parse_object(const unsigned char *sha1)
 	enum object_type type;
 	int eaten;
 	const unsigned char *repl = lookup_replace_object(sha1);
-	void *buffer = read_sha1_file(sha1, &type, &size);
+	void *buffer;
+	struct object *obj;
+
+	obj = lookup_object(sha1);
+	if (obj && obj->parsed)
+		return obj;

+	buffer = read_sha1_file(sha1, &type, &size);
 	if (buffer) {
-		struct object *obj;
 		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
 			free(buffer);
 			error("sha1 mismatch %s\n", sha1_to_hex(repl));
-- 
1.7.6.5.6.ge6248

^ permalink raw reply related
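
The reordering described in the message above (look in the in-memory table
first, and read from the object db only when that fails) can be sketched
with a toy model. Everything here is an assumption for illustration: the
toy_* names are hypothetical, a string id stands in for the sha1, a linear
array stands in for git's hash table, and a counter stands in for the
expensive read_sha1_file() call. It is not git's code.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64

/* Toy object: a string id stands in for the sha1. */
struct toy_object {
	const char *id;
	int parsed;
};

static struct toy_object table[TABLE_SIZE];
static size_t nr_objects;
static int nr_disk_reads;	/* counts simulated expensive reads */

/* Type-agnostic lookup: finds an existing entry, never creates one. */
struct toy_object *toy_lookup(const char *id)
{
	size_t i;

	for (i = 0; i < nr_objects; i++)
		if (!strcmp(table[i].id, id))
			return &table[i];
	return NULL;
}

/* Simulated expensive read and parse; creates the entry if missing. */
static struct toy_object *toy_read_and_parse(const char *id)
{
	struct toy_object *obj = toy_lookup(id);

	nr_disk_reads++;
	if (!obj) {
		if (nr_objects == TABLE_SIZE)
			return NULL;	/* toy table is full */
		obj = &table[nr_objects++];
		obj->id = id;
	}
	obj->parsed = 1;
	return obj;
}

/* Check the in-memory table first; read only when that does not help. */
struct toy_object *toy_parse_object(const char *id)
{
	struct toy_object *obj = toy_lookup(id);

	if (obj && obj->parsed)
		return obj;
	return toy_read_and_parse(id);
}

/* Number of simulated reads performed so far. */
int toy_disk_reads(void)
{
	return nr_disk_reads;
}
```

Calling toy_parse_object() twice with the same id performs one simulated
read, not two, which is the saving the message describes for callers that
parse the same sha1 repeatedly.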

* Re: [PATCH v2] Limit refs to fetch to minimum in shallow clones
From: Junio C Hamano @ 2012-01-05 21:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Shawn O. Pearce
In-Reply-To: <1325743516-14940-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The main purpose of shallow clones is to reduce download by only
> fetching objects up to a certain depth from the given refs. The number
> of objects depends on how many refs to follow. So:
>
>  - Only fetch HEAD or the ref specified by --branch
>  - Only fetch tags that point to downloaded objects
>
> More tags/branches can be fetched later using git-fetch as usual.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Only lightly tested, but seems to work.

Thanks.

Perhaps you would want to add tests so that you do not have to say
"lightly tested"?

> diff --git a/builtin/clone.c b/builtin/clone.c
> index efe8b6c..8de9248 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -48,6 +48,7 @@ static int option_verbosity;
>  static int option_progress;
>  static struct string_list option_config;
>  static struct string_list option_reference;
> +static char *src_ref_prefix = "refs/heads/";

Would this be const?

>  static int opt_parse_reference(const struct option *opt, const char *arg, int unset)
>  {
> @@ -427,9 +428,27 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
>  	struct ref *local_refs = head;
>  	struct ref **tail = head ? &head->next : &local_refs;
>  
> -	get_fetch_map(refs, refspec, &tail, 0);
> -	if (!option_mirror)
> -		get_fetch_map(refs, tag_refspec, &tail, 0);
> +	if (option_depth) {
> +		struct ref *remote_head = NULL;
> +
> +		if (!option_branch)
> +			remote_head = guess_remote_head(head, refs, 0);
> +		else {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_addstr(&sb, src_ref_prefix);
> +			strbuf_addstr(&sb, option_branch);
> +			remote_head = find_ref_by_name(refs, sb.buf);
> +			strbuf_release(&sb);
> +		}
> +
> +		if (remote_head)
> +			get_fetch_map(remote_head, refspec, &tail, 0);

What happens when we fail to find any remote_head and make no call to
get_fetch_map() here?  I am wondering if that should trigger an error
here.

Also this breaks 5500 for rather obvious reasons, as the point of this
patch is to reduce the number of objects transferred when a shallow clone
is made.

Perhaps there should be an option to give users the historical "all
branches equally shallow" behaviour?

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Junio C Hamano @ 2012-01-05 21:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, git-dev
In-Reply-To: <20120105210001.GA30549@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> This might seem circular, since step 2a uses the type
> information determined in step 1 to call the appropriate
> lookup function. However, we can notice that all of the
> lookup_* functions are backed by lookup_object. In other
> words, all of the objects are kept in a master hash table,
> and we don't actually need the type to do the "do we have
> it" part of the lookup,...

The only case that might matter is where you read one object, you have
written another object of a different type but that happens to hash to the
same SHA-1 value. The other existing optimizations do not take that into
account, so I do not think there is any new issue here.

> For example, GitHub's alternates repository for git.git has
> ~120,000 refs, of which only ~3200 are unique. The time for
> upload-pack to print its list of advertised refs dropped
> from 3.4s to 0.76s.

Nice. I am more impressed by 120k/3.4 than 3.2k/0.76, though ;-)

Thanks.

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Jeff King @ 2012-01-05 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git-dev
In-Reply-To: <7vipkpn87d.fsf@alter.siamese.dyndns.org>

On Thu, Jan 05, 2012 at 01:35:50PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This might seem circular, since step 2a uses the type
> > information determined in step 1 to call the appropriate
> > lookup function. However, we can notice that all of the
> > lookup_* functions are backed by lookup_object. In other
> > words, all of the objects are kept in a master hash table,
> > and we don't actually need the type to do the "do we have
> > it" part of the lookup,...
> 
> The only case that might matter is where you read one object, you have
> written another object of a different type but that happens to hash to the
> same SHA-1 value. The other existing optimizations do not take that into
> account, so I do not think there is any new issue here.

Yeah, I tried to think of issues like that. Even if you protected
against that, you'd still have the issue of reading one object, then
writing another of the _same_ type but with different content. We
wouldn't notice with the current code path (we'd just recreationally
read the data from disk and then throw it away).

The worst potential problem I could come up with is if you somehow had
an object whose "parsed" flag was set, but somehow didn't have its other
fields set (like type). But I think you'd have to be abusing the lookup
functions pretty hard to get into such a state (how would you be parsing
if you didn't know the type?). The parsed flag only gets set by the
type-specific lookup functions.

So I think it is safe short of somebody doing some horrible manual
munging of a "struct object".

> > For example, GitHub's alternates repository for git.git has
> > ~120,000 refs, of which only ~3200 are unique. The time for
> > upload-pack to print its list of advertised refs dropped
> > from 3.4s to 0.76s.
> 
> Nice. I am more impressed by 120k/3.4 than 3.2k/0.76, though ;-)

You can thank optimized zlib for that. We spent 60% of our time there.
:)

-Peff

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Junio C Hamano @ 2012-01-05 21:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, git-dev
In-Reply-To: <20120105214941.GA31836@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> The worst potential problem I could come up with is if you somehow had
> an object whose "parsed" flag was set, but somehow didn't have its other
> fields set (like type).
> ...
> So I think it is safe short of somebody doing some horrible manual
> munging of a "struct object".

Yeah, I was worried that codepaths like commit-pretty-printing might be
mucking with the contents of commit->buffer, perhaps reencoding the text
and then calling parse_object() to get the unmodified original back, or
something silly like that. But the lookup_object() call at the beginning
of parse_object() already prevents us from doing such a thing, so we
should be OK, I would think.

^ permalink raw reply
