Git development

* Re: How to deal with historic tar-balls
From: Neal Kreitzinger @ 2012-01-05 15:25 UTC (permalink / raw)
  To: nn6eumtr; +Cc: git
In-Reply-To: <4EFF5CDA.5050809@gmail.com>

On 12/31/2011 1:04 PM, nn6eumtr wrote:
> I have a number of older projects that I want to bring into a git
> repository. They predate a lot of the popular scm systems, so they
> are primarily a collection of tarballs today.
>
> I'm fairly new to git so I have a couple questions related to this:
>
> - What is the best approach for bringing them in? Do I just create a
>  repository, then unpack the files, commit them, clean out the
> directory, unpack the next tarball, and repeat until everything is
> loaded?
>
> - Do I need to pay special attention to files that are
> renamed/removed from version to version?
>
> - If the timestamps change on a file but the actual content does not,
>  will git treat it as a non-change once it realizes the content
> hasn't changed?
>
> - Last, if after loading the repository I find another version of the
>  files that predates those I've loaded, or is intermediate between
> two commits I've already loaded, is there a way to say that commit
> B is actually the ancestor of commit C? (i.e. a->c becomes a->b->c if
> you were to visualize the commit timeline or do diffs) Or do I just
> reload the tarballs in order to achieve this?
>
The git-rm manpage contains instructions under the "vendor code drop"
section on how to do this.  I imagine you will want to do each one
manually instead of queueing them up in a script because you are likely 
going to want to do appropriate clean up of the working tree in each 
iteration before committing.  This is where you would review 
renames/removes with git-status before you git-add and git-commit. 
Also, if you are tracking permissions in git (the executable bit) then 
you will want to filter out any noise generated by frivolous permissions 
changes between the tarball contents.
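
As a rough illustration of the unpack/review/commit loop described above, here is a minimal sketch. It assumes each tarball unpacks into a single top-level directory (hence --strip-components=1), and it builds two throwaway tarballs only so that it can run on its own. The names proj-1, proj-2 and import_tarball are made up for the example and do not come from the git-rm manpage.

```shell
#!/bin/sh
set -e

# Scratch area with two fake "historic" releases (hypothetical names).
work=$(mktemp -d)
cd "$work"
mkdir proj-1 proj-2
echo one    > proj-1/main.c
echo legacy > proj-1/old.c
echo two    > proj-2/main.c
echo fresh  > proj-2/new.c
tar -cf proj-1.tar proj-1
tar -cf proj-2.tar proj-2
rm -rf proj-1 proj-2

git init -q repo
cd repo
git config user.name  "Tarball Importer"
git config user.email "importer@example.invalid"

import_tarball() {
    # Remove all tracked files first, so anything missing from this
    # drop shows up as a deletion.
    git ls-files -z | xargs -0 rm -f
    tar --strip-components=1 -xf "$1"
    git add -A
    git status --short    # review adds/removes/renames before committing
    git commit -q -m "Import $2"
}

import_tarball ../proj-1.tar proj-1
import_tarball ../proj-2.tar proj-2

commit_count=$(git rev-list --count HEAD)
tracked=$(git ls-files | tr '\n' ' ')
echo "$commit_count commits; tracked files: $tracked"
```

When doing this by hand, the "git status" step is the place to stop and clean up the working tree before running "git commit".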

Whether you can insert tarballs into the history depends on when you
plan on doing that.  You are only going to be able to do 
that before the history is published (made "public" for other repos to 
pull down).  Otherwise you will be rewriting published history which is 
a big no-no (see git-rebase manpage).  I suggest you do your homework 
and order them properly before you start because that will be less work. 
  If you still find that you missed something then you can use 
interactive git-rebase to insert.  I'm assuming a single "master" branch 
with linear history is your desired end result.  If you want to create 
maintenance branches showing release history then you will definitely 
need to do your homework first (see gitworkflow manpage).

If you venture into rebase territory by rewriting history (inserting 
missed tarballs in between older commits) you will need to be sure to 
review your automatic merge resolutions.  Git only generates 
merge-conflicts on same-file-same-line conflicts.  It will auto-merge 
same-file-different-line changes.

You also need to ask yourself if you really need a history of all those 
versions.  To exaggerate, if all you really need is the current state 
then you need to ask yourself if it's worth the effort to record the 
previous states.  Maybe what you want is something in-between (a happy 
medium).

In regard to the 'start-over' method of inserting missed tarballs you 
would just git-reset --hard to the commit you want to insert on-top-of, 
add the tarball, and then re-apply the subsequent tarballs.  If you are 
doing cleanup between commits then the rebase or cherry-pick of the 
already cleaned-up subsequent commits from the "old-branch" (previous 
attempt) onto the 'do-over' branch will likely be easier.  (You can just 
do 'git branch old-branch' on your branch before the git-reset --hard 
(do-over) and that will give you a "backup copy" of the "previous 
attempt" called "old-branch" that you can salvage already-done-work from 
by using rebase or cherry-pick.)
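
A small sketch of that 'start-over' sequence, under the assumption that the missed release only adds files the later release already contains, so the cherry-pick applies cleanly. The release names and file names are invented for the example.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name  "Tarball Importer"
git config user.email "importer@example.invalid"

# First attempt: release B was missed, so history is A -> C.
echo a > a.txt
git add -A
git commit -q -m "Import release A"
first=$(git rev-parse HEAD)
echo b > b.txt
echo c > c.txt
git add -A
git commit -q -m "Import release C"

# Do-over: keep the previous attempt, rewind, insert B, re-apply C.
git branch old-branch              # "backup copy" of the previous attempt
git reset -q --hard "$first"       # back to the commit to insert on top of
echo b > b.txt
git add -A
git commit -q -m "Import release B"
git cherry-pick old-branch >/dev/null   # salvage the already-done commit for C

history=$(git log --reverse --format=%s | tr '\n' ',')
drift=$(git diff old-branch HEAD)  # empty if the final tree is unchanged
echo "$history"
```

If the inserted release touches the same lines as a later one, the cherry-pick will stop with conflicts, and re-importing the later tarballs may be the simpler route.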

Hope this helps.

v/r,
neal

* Re: git-subtree
From: Ramkumar Ramachandra @ 2012-01-05 15:32 UTC (permalink / raw)
  To: David A. Greene; +Cc: David Greene, git, Junio C Hamano
In-Reply-To: <87ipkq199w.fsf@smith.obbligato.org>

Hi again,

[+CC: Junio Hamano, our maintainer]

David A. Greene wrote:
> I've read that document.  The issue is that I didn't develop the code,
> Avery did.

Not an issue as long as you have Avery's signoff.

> It's a lot of time to learn a
> completely new codebase.  I was hoping to submit something soon and then
> learn the codebase gradually during maintenance/further development.

We certainly don't want badly reviewed code that nobody understands
floating around in the codebase, so I'd suggest sending out whatever
you think is appropriate for the first round of reviews, and see how
things shape up from there.

> How have completely new tools been introduced into the git mainline in the
> past?

Yes.  For an example of something I was involved with but didn't
author, see vcs-svn/.

-- Ram

* 'fatal: Out of memory? mmap failed: No such device' using cifs
From: Bruno Bigras @ 2012-01-05 15:44 UTC (permalink / raw)
  To: git

Hi,

I got : 'fatal: Out of memory? mmap failed: No such device' when doing
'git init' in a directory on a mounted cifs share. Any ideas?

I'm using cifs with autofs, here's what I use :
win1
-fstype=smbfs,rw,credentials=/etc/smb.auth,gid=admin,file_mode=0777,dir_mode=0777,nocase,directio,sfu,iocharset=utf8
        ://10.1.1.8/DATA/

git version 1.7.8.2
2.6.32-37-generic-pae

$ mount
//10.1.1.8/DATA/ on /net/smb/win1 type cifs (rw,mand)

Thanks,

Bruno

* Re: git-subtree
From: Jeff King @ 2012-01-05 15:47 UTC (permalink / raw)
  To: David A. Greene; +Cc: Ramkumar Ramachandra, David Greene, git
In-Reply-To: <87ipkq199w.fsf@smith.obbligato.org>

On Thu, Jan 05, 2012 at 09:03:38AM -0600, David A. Greene wrote:

> > Please read and follow the guidelines listed in
> > Documentation/SubmittingPatches.  The TL;DR version is: break it up
> > into logical reviewable commits based on the current `master` and use
> > git format-patch/ git send-email to send those commits to this mailing
> > list.
> 
> I've read that document.  The issue is that I didn't develop the code,
> Avery did.  This is a completely new tool for git and I don't have the
> first idea of what "logical" chunks would look like.  I assume, for
> example, that we'd want the first "chunk" to actually work and do
> something interesting.  I can go spend a bunch of time to see if I can
> grok enough to create these chunks but I wanted to check first and make
> sure that would be absolutely necessary.  It's a lot of time to learn a
> completely new codebase.  I was hoping to submit something soon and then
> learn the codebase gradually during maintenance/further development.

I think this is also somewhat different in that git-subtree has a
multi-year history in git that we may want to keep. So it is more
analogous to something like gitweb or git-gui, which we have brought in
(using subtree merges, no less).

The biggest decision is whether or not to import the existing history.
If we do, then we have to decide whether it becomes a sub-component like
gitweb (e.g., it gets pulled into a "subtree" directory, and we have
make recurse into it), or whether it gets overlaid into the main
directory (i.e., we clean and munge the subtree repo a bit, then just
"git merge" the history in).

If we want to throw away the existing history, then I think you end up
doing the same munging as the latter option above, and then just make a
single patch out of it instead of a merge.

I don't use git-subtree, but just glancing over the repo, it looks like
that munging is mostly:

  1. git-subtree.sh stays, and gets added to git.git's top-level Makefile

  2. the test.sh script gets adapted into t/tXXXX-subtree.sh

  3. git-subtree.txt goes into Documentation/

  4. The rest of the files are infrastructure that can go away, as they
     are a subset of what git.git already contains.

I'd favor keeping the history and doing the munge-overlay thing.
Although part of me wants to join the histories in a subtree so that we
can use "git subtree" to do it (which would just be cool), I think the
resulting code layout doesn't make much sense unless git-subtree is
going to be maintained separately.

-Peff

* Re: git-subtree
From: Junio C Hamano @ 2012-01-05 15:53 UTC (permalink / raw)
  To: David Greene; +Cc: git
In-Reply-To: <nngaa638nwf.fsf@transit.us.cray.com>

David Greene <dag@cray.com> writes:

> How does the git community want the patch presented?  Right now it's one
> monolithic thing.  I understand that isn't ideal but I don't think
> incorporating the entire GitHub master history is necessarily the best
> idea either.

It depends on the longer term vision of how the result of this submission
will evolve and more importantly, where you fit in the picture.

One possible answer you could give us might go like this:

    The longer term vision is for "git subtree" to become, and be
    developed further as, an integral part of the core git suite.

    I have been an active contributor to the "git subtree" project for
    quite some time, and am very familiar with the code. Avery has been
    too busy to properly take care of the maintenance of "git subtree",
    and is expected to be so for the foreseeable future. I will address any
    issue raised during the initial review and will be taking over its
    maintenance and further development.

    My plan is to put this first in the contrib/ area, keep it there for a few
    release cycles while ironing out remaining kinks in the code, and
    eventually make it one of the "git" subcommands. Avery's external tree
    will cease to exist as future development will happen in-tree in the
    git repository.

Your answer might differ, of course, but the point is that we would need
to weigh pros and cons between inclusion of it in the git repository and
keeping it in Avery's repository and have him and his contributors
maintain, enhance and distribute it from there, and it largely depends on
the nature of the submission. Is it a "throw it over the wall" dump of a
large body of code of unknown quality that we need to clean up first without
knowing the vision of how "git subtree" should evolve by original author
and/or people who have been actively developing it?

* Re: [PATCH 1/2] daemon: add tests
From: Clemens Buchacher @ 2012-01-05 16:06 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, git, Jonathan Nieder, Erik Faye-Lund,
	Ilari Liusvaara, Nguyễn Thái Ngọc Duy
In-Reply-To: <20120105025559.GB7326@sigill.intra.peff.net>

On Wed, Jan 04, 2012 at 09:55:59PM -0500, Jeff King wrote:
> 
> It so happens that I have just the patch you need. I've been meaning to
> go over it again and submit it:
> 
>   run-command: optionally kill children on exit
>   https://github.com/peff/git/commit/5523d7ebf2a0386c9c61d7bfbc21375041df4989

Thanks, looks great. But if I add this on top (to enable this for
"git daemon"), then t0001 kills my entire X session. Not sure yet
what's going on.

diff --git a/run-command.c b/run-command.c
index aeb9c6e..53218df 100644
--- a/run-command.c
+++ b/run-command.c
@@ -497,6 +497,7 @@ static void prepare_run_command_v_opt(struct child_process *cmd,
        cmd->stdout_to_stderr = opt & RUN_COMMAND_STDOUT_TO_STDERR ? 1 : 0;
        cmd->silent_exec_failure = opt & RUN_SILENT_EXEC_FAILURE ? 1 : 0;
        cmd->use_shell = opt & RUN_USING_SHELL ? 1 : 0;
+       cmd->clean_on_exit = 1;
 }
 
 int run_command_v_opt(const char **argv, int opt)

* Re: [PATCH] clone: allow detached checkout when --branch takes a tag
From: Junio C Hamano @ 2012-01-05 16:22 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1325771380-18862-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This allows you to do "git clone --branch=v1.7.8 git.git" and work
> right away from there. No big deal, just one more convenient step, I
> think. --branch taking a tag may be confusing though.
>
> We can still have master in this case instead of detached HEAD, which
> may make more sense because we use --branch. I don't care much which
> way should be used.

You clone a single lineage of the history, either shallowly or fully,
either starting at the tip of one single branch or a named tag.

What is the expected use scenario of a resulting repository of this new
feature? As this is creating a repository, not a tarball extract, you
certainly would want the user to build further history in the resulting
repository, and it would need a real branch at some point, preferably
before any new commit is made. Which makes me think that the only reason
we would use a detached HEAD would be because we cannot decide what name
to give to that single branch and make it the responsibility of the user
to run "git checkout -b $whatever" as the first thing.

I think the real cause of the above is because this patch and its previous
companion patch conflate the meaning of the "--branch" option with the
purpose of specifying which lineage of the history to copy. The option is
described to name the local branch that is checked out, instead of using
the same name as the remote's primary branch. But these patches abuse the
option to name something different at the same time---the endpoint of the
single lineage to be copied.

These two may often be the same, and use of "clone --branch=master" in
such a case would mean that you want to name the local branch of the final
checkout to be "master" _and_ the endpoint of the single lineage you are
copying is also their "master".

But the "tag" extension proposed with this change is different.

You are specifying an endpoint of the single lineage with the option that
is different from any of the branches at the origin, and because you used
the "--branch" option for that purpose, you lost the way to specify the
primary thing the option wanted to express: what the name of the resulting
checkout should be.

Perhaps something like "clone --branch=master --$endpoint=v1.7.8" that
says "I want a clone of the repository limited to a single lineage, whose
history ends at the commit pointed by the v1.7.8 tag, and name the local
checkout my master branch" would be more appropriate?

Also, the user is likely to want to fetch and integrate from the origin
with his own history. How should "git pull" and "git fetch" work in the
resulting repository? What should the remote.origin.* look like?

* Re: [PATCH] Fix incorrect ref namespace check
From: Junio C Hamano @ 2012-01-05 16:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Michael Haggerty
In-Reply-To: <1325767180-15083-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The reason why the trailing slash is needed is obvious. refs/stash and
> HEAD are not namespaces, but complete refs. Do a full string compare on them.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  I missed prefixcmp(..., "HEAD") right below prefixcmp(..., "refs/stash")

As Michael has been actively showing interest in cleaning up the area, he
should have been CC'ed, I would think.

>
>  builtin/fetch.c  |    2 +-
>  builtin/remote.c |    2 +-
>  log-tree.c       |    4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 33ad3aa..daa68d2 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -573,7 +573,7 @@ static void find_non_local_tags(struct transport *transport,
>  
>  	for_each_ref(add_existing, &existing_refs);
>  	for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
> -		if (prefixcmp(ref->name, "refs/tags"))
> +		if (prefixcmp(ref->name, "refs/tags/"))
>  			continue;
>  
>  		/*
> diff --git a/builtin/remote.c b/builtin/remote.c
> index 583eec9..f54a89a 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -534,7 +534,7 @@ static int add_branch_for_removal(const char *refname,
>  	}
>  
>  	/* don't delete non-remote-tracking refs */
> -	if (prefixcmp(refname, "refs/remotes")) {
> +	if (prefixcmp(refname, "refs/remotes/")) {
>  		/* advise user how to delete local branches */
>  		if (!prefixcmp(refname, "refs/heads/"))
>  			string_list_append(branches->skipped,
> diff --git a/log-tree.c b/log-tree.c
> index 319bd31..535b905 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -119,9 +119,9 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
>  		type = DECORATION_REF_REMOTE;
>  	else if (!prefixcmp(refname, "refs/tags/"))
>  		type = DECORATION_REF_TAG;
> -	else if (!prefixcmp(refname, "refs/stash"))
> +	else if (!strcmp(refname, "refs/stash"))
>  		type = DECORATION_REF_STASH;
> -	else if (!prefixcmp(refname, "HEAD"))
> +	else if (!strcmp(refname, "HEAD"))
>  		type = DECORATION_REF_HEAD;
>  
>  	if (!cb_data || *(int *)cb_data == DECORATE_SHORT_REFS)
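
To see why the trailing slash and the exact comparison matter, here is a small shell analogue of the checks in the patch. It is only a sketch: has_prefix and classify are hypothetical helpers, and unlike prefixcmp() (which returns 0 on a match, like strcmp()) has_prefix succeeds when the prefix matches.

```shell
#!/bin/sh
# has_prefix STRING PREFIX: succeed if STRING starts with PREFIX.
has_prefix() {
    case $1 in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Classify a ref name the way the fixed code does: namespaces are
# matched with a trailing slash, complete refs with a full compare.
classify() {
    if has_prefix "$1" "refs/tags/"; then
        echo tag
    elif [ "$1" = "refs/stash" ]; then
        echo stash
    elif [ "$1" = "HEAD" ]; then
        echo head
    else
        echo other
    fi
}

# The old, loose check: "refs/tags" without the slash also matches
# a ref that merely starts with the same characters.
loose=no
has_prefix "refs/tags-old/v1" "refs/tags" && loose=yes

echo "loose match on refs/tags-old/v1: $loose"
echo "strict result for refs/tags-old/v1: $(classify refs/tags-old/v1)"
```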

* Re: Warning from AV software about kill.exe
From: Erik Faye-Lund @ 2012-01-05 16:33 UTC (permalink / raw)
  To: Erik Blake; +Cc: Pat Thoyts, Thomas Rast, git
In-Reply-To: <4F0418B1.5050403@icefield.yk.ca>

On Wed, Jan 4, 2012 at 10:15 AM, Erik Blake <erik@icefield.yk.ca> wrote:
> On 2011-12-22 19:19, Pat Thoyts wrote:
>> Thomas Rast<trast@student.ethz.ch>  writes:
>>> Erik Blake<erik@icefield.yk.ca>  writes:
>>>
>>>> I'm running git under Win7 64. As I selected "Repository|Visualize all
>>>> branch history" in the git gui, my AV software (Trustport) trapped the
>>>> bin\kill.exe program for "trying to modify system global settings
>>>> (time, timezone, registry quota, etc.)"
>>>>
>>>> Does anyone know the details of this process and what its function
>>>> is? First time I've seen it, though I'm a relatively new user.
>>>
>>> 'kill' is a standard unix utility that sends signals to processes, in
>>> particular signals that cause the processes to exit or be killed
>>> forcibly by the kernel, hence the name.  (I don't know how the windows
>>> equivalent works under the hood, but presumably it's something similar.)
>>>
>>> git-gui and gitk use kill to terminate background worker processes that
>>> are no longer needed because you closed the window their output would
>>> have been displayed in, etc.
>>
>> You might try replacing the command in the tcl scripts with 'exec
>> taskkill /f /pid $pid' and see if that avoids the error. taskkill is
>> present on XP and above as part of the OS distribution so shouldn't
>> suffer any AV complaints.
>>
>
> Another way to implement this (on Windows) would be for the git programs to
> tag themselves with a mutex. Then the "kill" program can determine which git
> programs are running and send them user-defined windows messages to shut
> themselves down. Alternatively, you could send the programs the standard
> windows WM_CLOSE message, but the OS or an AV program might still be
> troubled by that behaviour.
>
> This is how we implement this type of behaviour in our windows programs. It
> does not raise the ire of the OS or AV since you do not have one process
> trying to shut down another. It also bypasses all issues with process
> privileges etc.
>
> Erik
>

No thanks. A process is allowed to terminate another process on
Windows (as long as they are running as the same user, and the access
token has not been messed with). If your AV detects this and prevents
it, then your AV is broken. Re-building a kind of cooperative process
termination for that reason is not the way forward.

But the problem might be that MSYS' kill does more than it's supposed
to (or misbehaves in some other way). This is, however, something you
should take up with the MSYS developers, not the git development
community.

I would take this up with Trustport support. Overly eager AV
heuristics are a fairly common problem, and usually get fixed quickly.

* Re: checkout on an empty directory fails
From: Dirk Süsserott @ 2012-01-05 19:33 UTC (permalink / raw)
  To: Holger Hellmuth; +Cc: René Doß, git
In-Reply-To: <4F05ACD6.6040603@ira.uka.de>

Am 05.01.2012 14:59 schrieb Holger Hellmuth:
> On 05.01.2012 13:38, René Doß wrote:
>> git status shows no special information.
> 
>  versus
> 
>> red@linux-nrd1:~/iso/a> git status
>> # On branch master
>> # Changed but not updated:
>> # (use "git add/rm <file>..." to update what will be committed)
>> # (use "git checkout -- <file>..." to discard changes in working
>> directory)
>> #
>> # deleted: SP601_RevC_annotated_master_ucf_8-28-09.ucf
>> # deleted: rtl/ether_speed.vhd
>> # deleted: rtl/ether_top.vhd
>> # deleted: rtl/ether_tx.vhd
>> # deleted: rtl/takt.vhd
>> # deleted: sim/makefile
>> # deleted: sim/tb_ether_top.vhd
>> #
> 
> This *is* special information: It tells you that master has those 7
> files but your working directory has none of them (i.e. it is as if you
> had deleted them from your working directory).
> 
> "git checkout <branch>" switches between branches, *but* leaves changes
> you made (files you edited, added or deleted) intact! This is so you can
> switch branches before committing if you suddenly realize you are in the
> wrong branch.
> 
> "git checkout -- <paths...>" or in your case "git checkout -- ." is
> different, it really overwrites the files in your working dir with the
> versions stored somewhere else, by default from the index.
> 
>> What does the dot in checkout mean?
> 
> "." is simply your current directory

Another way of reviving the deleted files and restoring the master branch is

$ git checkout -f master # or git checkout --force master

This will unconditionally checkout master and overwrite the local
changes, including the deletions Holger mentioned.
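
A throwaway reproduction of the situation, as a sketch only: the file names are borrowed from the status output quoted above, and the branch is forced to be called master so that the command matches the one shown.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
git config user.name  "Example User"
git config user.email "user@example.invalid"

mkdir rtl sim
echo "entity ether_top" > rtl/ether_top.vhd
echo "all:"             > sim/makefile
git add -A
git commit -q -m "Initial import"

# Simulate the emptied working directory.
rm -r rtl sim
deleted=$(git status --porcelain | grep -c '^ D' || true)

# Force the checkout: local changes, including deletions, are overwritten.
git checkout -q -f master
restored=$(git status --porcelain | grep -c '^ D' || true)

echo "deleted before: $deleted, still deleted after: $restored"
```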

For me, "checkout --force" is more intuitive than "reset --hard" or
"checkout .".

    Dirk

* [PATCH 1/2] gitweb: Fix file links in "grep" search
From: Jakub Narebski @ 2012-01-05 20:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Perl, git
In-Reply-To: <CANQwDwfnp167Uth5TLbCD6OR-Xe6JD-2vENiJVnipi1YdjnMPQ@mail.gmail.com>

There were two bugs in generating file links (links to "blob" view),
one hidden by the other.  The correct way of generating file link is

	href(action=>"blob", hash_base=>$co{'id'},
	     file_name=>$file);

It was $co{'hash'} (this key does not exist, and therefore this is
undef), and 'hash' instead of 'hash_base'.

To have this fix applied in a single place, this commit also reduces
code duplication by saving file link (which is used for line links) in
$file_href.

Reported-by: Thomas Perl <th.perl@gmail.com>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
On Wed, 4 Jan 2012, Jakub Narębski wrote:
> On Wed, Jan 4, 2012 at 1:28 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Thomas Perl <th.perl@gmail.com> writes:
>>
>>> I think I found a bug in gitweb when grep'ing for text in a branch
>>> different from "master". Here's how to reproduce it:
>>
>> Thanks for a detailed report (and thanks for gpodder ;-).
>>
>> Jakub, care to take a look?
> 
> I see the bug: it should be 'hash_base' not 'hash' in href()
> creating link to "blob" view in git_search_files().
> 
> I'll try to send a fix soon...

Actually there were two errors, one hiding the other...


Thomas, could you check if this fixes your issue?

 gitweb/gitweb.perl |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fc41b07..fa58156 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5852,7 +5852,7 @@ sub git_search_files {
 	my $lastfile = '';
 	while (my $line = <$fd>) {
 		chomp $line;
-		my ($file, $lno, $ltext, $binary);
+		my ($file, $file_href, $lno, $ltext, $binary);
 		last if ($matches++ > 1000);
 		if ($line =~ /^Binary file (.+) matches$/) {
 			$file = $1;
@@ -5867,10 +5867,10 @@ sub git_search_files {
 			} else {
 				print "<tr class=\"light\">\n";
 			}
+			$file_href = href(action=>"blob", hash_base=>$co{'id'},
+			                  file_name=>$file);
 			print "<td class=\"list\">".
-				$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
-						       file_name=>"$file"),
-					-class => "list"}, esc_path($file));
+				$cgi->a({-href => $file_href, -class => "list"}, esc_path($file));
 			print "</td><td>\n";
 			$lastfile = $file;
 		}
@@ -5888,10 +5888,9 @@ sub git_search_files {
 				$ltext = esc_html($ltext, -nbsp=>1);
 			}
 			print "<div class=\"pre\">" .
-				$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
-						       file_name=>"$file").'#l'.$lno,
-					-class => "linenr"}, sprintf('%4i', $lno))
-				. ' ' .  $ltext . "</div>\n";
+				$cgi->a({-href => $file_href.'#l'.$lno,
+				        -class => "linenr"}, sprintf('%4i', $lno)) .
+				' ' .  $ltext . "</div>\n";
 		}
 	}
 	if ($lastfile) {
-- 
1.7.6

* [PATCH 2/2] gitweb: Harden "grep" search against filenames with ':'
From: Jakub Narebski @ 2012-01-05 20:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Perl, git
In-Reply-To: <201201052126.49087.jnareb@gmail.com>

Run "git grep" in "grep" search with '-z' option, to be able to parse
response also for files with filename containing ':' character.  The
':' character is otherwise (without '-z') used to separate filename
from line number and from matched line.

Note that this does not protect files with filename containing
embedded newline.  This would be hard but doable for text files, and
harder or even currently impossible with binary files: git does not
quote filename in

  "Binary file <foo> matches"

message, but new `--break` and/or `--header` options to git-grep could
help here.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is what I did after fixing previous issue, after looking at current
code.  Hopefully nobody sane uses filenames with embedded newlines...

  http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

 gitweb/gitweb.perl |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index fa58156..f884dfe 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5836,7 +5836,7 @@ sub git_search_files {
 	my %co = @_;
 
 	local $/ = "\n";
-	open my $fd, "-|", git_cmd(), 'grep', '-n',
+	open my $fd, "-|", git_cmd(), 'grep', '-n', '-z',
 		$search_use_regexp ? ('-E', '-i') : '-F',
 		$searchtext, $co{'tree'}
 			or die_error(500, "Open git-grep failed");
@@ -5858,7 +5858,8 @@ sub git_search_files {
 			$file = $1;
 			$binary = 1;
 		} else {
-			(undef, $file, $lno, $ltext) = split(/:/, $line, 4);
+			($file, $lno, $ltext) = split(/\0/, $line, 3);
+			$file =~ s/^$co{'tree'}://;
 		}
 		if ($file ne $lastfile) {
 			$lastfile and print "</td></tr>\n";
-- 
1.7.6
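
The ambiguity this patch addresses can be reproduced outside gitweb with a small sketch. It greps the working tree rather than a tree-ish, so there is no "tree:" prefix to strip as in the patch, and the file name and search term are invented for the example.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name  "Example User"
git config user.email "user@example.invalid"

echo "needle here" > "odd:name.txt"
git add -A
git commit -q -m "Add a file with ':' in its name"

# Without -z, splitting on ':' cuts the file name short at its own colon.
naive=$(git grep -n needle | cut -d: -f1)

# With -z, the file name is terminated by NUL, so everything up to the
# first NUL is the name, colons included.
robust=$(git grep -n -z needle | tr '\0' '\n' | head -n 1)

echo "split on ':'  -> $naive"
echo "split on NUL  -> $robust"
```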

* [PATCH] parse_object: try internal cache before reading object db
From: Jeff King @ 2012-01-05 21:00 UTC (permalink / raw)
  To: git; +Cc: git-dev

When parse_object is called, we do the following:

  1. read the object data into a buffer via read_sha1_file

  2. call parse_object_buffer, which then:

     a. calls the appropriate lookup_{commit,tree,blob,tag}
	to either create a new "struct object", or to find
	an existing one. We know the appropriate type from
	the lookup in step 1.

     b. calls the appropriate parse_{commit,tree,blob,tag}
        to parse the buffer for the new (or existing) object

In step 2b, all of the called functions are no-ops for
object "X" if "X->object.parsed" is set. I.e., when we have
already parsed an object, we end up going to a lot of work
just to find out at a low level that there is nothing left
for us to do (and we throw away the data from read_sha1_file
unread).

We can optimize this by moving the check for "do we have an
in-memory object" from 2a before the expensive call to
read_sha1_file in step 1.

This might seem circular, since step 2a uses the type
information determined in step 1 to call the appropriate
lookup function. However, we can notice that all of the
lookup_* functions are backed by lookup_object. In other
words, all of the objects are kept in a master hash table,
and we don't actually need the type to do the "do we have
it" part of the lookup, only to do the "and create it if it
doesn't exist" part.

This can save time whenever we call parse_object on the same
sha1 twice in a single program. Some code paths already
perform this optimization manually, with either:

  if (!obj->parsed)
	  obj = parse_object(obj->sha1);

if you already have a "struct object", or:

  struct object *obj = lookup_unknown_object(sha1);
  if (!obj || !obj->parsed)
	  obj = parse_object(sha1);

if you don't.  This patch moves the optimization into
parse_object itself.

Most git operations won't notice any impact. Either they
don't parse a lot of duplicate sha1s, or the calling code
takes special care not to re-parse objects. I timed two
code paths that do benefit (there may be more, but these two
were immediately obvious and easy to time).

The first is fast-export, which calls parse_object on each
object it outputs, like this:

  object = parse_object(sha1);
  if (!object)
	  die(...);
  if (object->flags & SHOWN)
	  return;

which means that just to realize we have already shown an
object, we will read the whole object from disk!

With this patch, my best-of-five time for "fast-export --all" on
git.git dropped from 26.3s to 21.3s.

The second case is upload-pack, which will call parse_object
for each advertised ref (because it needs to peel tags to
show "^{}" entries). This doesn't matter for most
repositories, because they don't have a lot of refs pointing
to the same objects. However, if you have a big alternates
repository with a shared object db for a number of child
repositories, then the alternates repository will have
duplicated refs representing each of its children.

For example, GitHub's alternates repository for git.git has
~120,000 refs, of which only ~3200 are unique. The time for
upload-pack to print its list of advertised refs dropped
from 3.4s to 0.76s.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index d8d09f9..6b06297 100644
--- a/object.c
+++ b/object.c
@@ -191,10 +191,15 @@ struct object *parse_object(const unsigned char *sha1)
 	enum object_type type;
 	int eaten;
 	const unsigned char *repl = lookup_replace_object(sha1);
-	void *buffer = read_sha1_file(sha1, &type, &size);
+	void *buffer;
+	struct object *obj;
+
+	obj = lookup_object(sha1);
+	if (obj && obj->parsed)
+		return obj;

+	buffer = read_sha1_file(sha1, &type, &size);
 	if (buffer) {
-		struct object *obj;
 		if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
 			free(buffer);
 			error("sha1 mismatch %s\n", sha1_to_hex(repl));
-- 
1.7.6.5.6.ge6248
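
The idea of the patch, checking the in-memory table before doing the expensive read, can be modelled with a toy shell sketch. This is not git's code: the ids are made up, the "parsed" list stands in for the object hash table, and the counter stands in for calls to read_sha1_file().

```shell
#!/bin/sh
reads=0        # how many times the "expensive" read ran
parsed=" "     # space-delimited list of ids that are already parsed

parse_object() {
    case $parsed in
        *" $1 "*) return 0 ;;   # already parsed: return before reading
    esac
    reads=$((reads + 1))        # stand-in for read_sha1_file()
    parsed="$parsed$1 "         # remember the object as parsed
}

# Many refs pointing at few unique objects, as in the upload-pack case.
for id in aaa bbb aaa ccc bbb aaa; do
    parse_object "$id"
done

echo "6 calls, $reads reads"
```

The saving grows with the amount of duplication: repeated ids cost a table lookup instead of a read.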

* Re: [PATCH v2] Limit refs to fetch to minimum in shallow clones
From: Junio C Hamano @ 2012-01-05 21:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Shawn O. Pearce
In-Reply-To: <1325743516-14940-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The main purpose of shallow clones is to reduce download by only
> fetching objects up to a certain depth from the given refs. The number
> of objects depends on how many refs to follow. So:
>
>  - Only fetch HEAD or the ref specified by --branch
>  - Only fetch tags that point to downloaded objects
>
> More tags/branches can be fetched later using git-fetch as usual.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Only lightly tested, but seems to work.

Thanks.

Perhaps you would want to add tests so that you do not have to say
"lightly tested"?

> diff --git a/builtin/clone.c b/builtin/clone.c
> index efe8b6c..8de9248 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -48,6 +48,7 @@ static int option_verbosity;
>  static int option_progress;
>  static struct string_list option_config;
>  static struct string_list option_reference;
> +static char *src_ref_prefix = "refs/heads/";

Would this be const?

>  static int opt_parse_reference(const struct option *opt, const char *arg, int unset)
>  {
> @@ -427,9 +428,27 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
>  	struct ref *local_refs = head;
>  	struct ref **tail = head ? &head->next : &local_refs;
>  
> -	get_fetch_map(refs, refspec, &tail, 0);
> -	if (!option_mirror)
> -		get_fetch_map(refs, tag_refspec, &tail, 0);
> +	if (option_depth) {
> +		struct ref *remote_head = NULL;
> +
> +		if (!option_branch)
> +			remote_head = guess_remote_head(head, refs, 0);
> +		else {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_addstr(&sb, src_ref_prefix);
> +			strbuf_addstr(&sb, option_branch);
> +			remote_head = find_ref_by_name(refs, sb.buf);
> +			strbuf_release(&sb);
> +		}
> +
> +		if (remote_head)
> +			get_fetch_map(remote_head, refspec, &tail, 0);

What happens when we fail to find any remote_head and make no call to
get_fetch_map() here?  I am wondering if that should trigger an error
here.

Also this breaks 5500 for rather obvious reasons, as the point of this
patch is to reduce the object transferred when a shallow clone is made.

Perhaps there should be an option to give users the historical "all
branches equally shallow" behaviour?
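
To make the "no remote_head found" case concrete, here is a toy sketch. It
is not part of the patch and does not use git's real ref API; the name
toy_pick_shallow_ref and the plain string array are assumptions made for
illustration only. It builds "refs/heads/<branch>" the way the quoted hunk
does and reports failure explicitly instead of silently fetching nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a git function: return the index of the
 * ref named "refs/heads/<branch>" in refs[], or -1 if there is none.
 * A caller could turn -1 into an error instead of cloning nothing.
 */
int toy_pick_shallow_ref(const char **refs, int nr, const char *branch)
{
	char name[256];
	int len;

	assert(refs != NULL && branch != NULL);

	/* Same prefixing as src_ref_prefix + option_branch in the hunk. */
	len = snprintf(name, sizeof(name), "refs/heads/%s", branch);
	if (len < 0 || (size_t)len >= sizeof(name))
		return -1;	/* name too long to represent */

	for (int i = 0; i < nr; i++)
		if (!strcmp(refs[i], name))
			return i;

	return -1;	/* nothing matched; the caller decides whether to die */
}
```

With refs "refs/heads/master", "refs/heads/maint" and "refs/tags/v1.7.8",
asking for "maint" finds index 1, while asking for "v1.7.8" or an unknown
name yields -1, which is the situation the question above is about.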

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Junio C Hamano @ 2012-01-05 21:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, git-dev
In-Reply-To: <20120105210001.GA30549@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> This might seem circular, since step 2a uses the type
> information determined in step 1 to call the appropriate
> lookup function. However, we can notice that all of the
> lookup_* functions are backed by lookup_object. In other
> words, all of the objects are kept in a master hash table,
> and we don't actually need the type to do the "do we have
> it" part of the lookup,...

The only case that might matter is where you read one object, you have
written another object of a different type but that happens to hash to the
same SHA-1 value. The other existing optimizations do not take that into
account, so I do not think there is any new issue here.

> For example, GitHub's alternates repository for git.git has
> ~120,000 refs, of which only ~3200 are unique. The time for
> upload-pack to print its list of advertised refs dropped
> from 3.4s to 0.76s.

Nice. I am more impressed by 120k/3.4 than 3.2k/0.76, though ;-)

Thanks.

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Jeff King @ 2012-01-05 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git-dev
In-Reply-To: <7vipkpn87d.fsf@alter.siamese.dyndns.org>

On Thu, Jan 05, 2012 at 01:35:50PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This might seem circular, since step 2a uses the type
> > information determined in step 1 to call the appropriate
> > lookup function. However, we can notice that all of the
> > lookup_* functions are backed by lookup_object. In other
> > words, all of the objects are kept in a master hash table,
> > and we don't actually need the type to do the "do we have
> > it" part of the lookup,...
> 
> The only case that might matter is where you read one object, you have
> written another object of a different type but that happens to hash to the
> same SHA-1 value. The other existing optimizations do not take that into
> account, so I do not think there is any new issue here.

Yeah, I tried to think of issues like that. Even if you protected
against that, you'd still have the issue of reading one object, then
writing another of the _same_ type but with different content. We
wouldn't notice with the current code path (we'd just recreationally
read the data from disk and then throw it away).

The worst potential problem I could come up with is if you somehow had
an object whose "parsed" flag was set, but somehow didn't have its other
fields set (like type). But I think you'd have to be abusing the lookup
functions pretty hard to get into such a state (how would you be parsing
if you didn't know the type?). The parsed flag only gets set by the
type-specific lookup functions.

So I think it is safe short of somebody doing some horrible manual
munging of a "struct object".
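
For readers following along, the shape of the optimization can be modelled
with a toy table. This is only a sketch under assumptions: toy_object,
toy_lookup_object, toy_parse_object and the toy_disk_reads counter are
made-up stand-ins, not git's real structures, and the "disk read" is just a
counter increment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for "struct object"; not git's real definition. */
struct toy_object {
	char id[41];
	int parsed;
};

#define TOY_TABLE_SIZE 8
struct toy_object toy_table[TOY_TABLE_SIZE];
int toy_nr;
int toy_disk_reads;	/* counts simulated object-db reads */

/* One table for all types: no type is needed to ask "do we have it". */
struct toy_object *toy_lookup_object(const char *id)
{
	for (int i = 0; i < toy_nr; i++)
		if (!strcmp(toy_table[i].id, id))
			return &toy_table[i];
	return NULL;
}

struct toy_object *toy_parse_object(const char *id)
{
	struct toy_object *obj = toy_lookup_object(id);

	/* The early return: skip the read when already parsed. */
	if (obj && obj->parsed)
		return obj;

	toy_disk_reads++;	/* stands in for reading the object */
	if (!obj) {
		assert(toy_nr < TOY_TABLE_SIZE);
		obj = &toy_table[toy_nr++];
		snprintf(obj->id, sizeof(obj->id), "%s", id);
	}
	obj->parsed = 1;
	return obj;
}
```

Parsing the same id twice returns the same pointer and bumps the read
counter only once, which is the saving seen when many refs point at the
same object.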

> > For example, GitHub's alternates repository for git.git has
> > ~120,000 refs, of which only ~3200 are unique. The time for
> > upload-pack to print its list of advertised refs dropped
> > from 3.4s to 0.76s.
> 
> Nice. I am more impressed by 120k/3.4 than 3.2k/0.76, though ;-)

You can thank optimized zlib for that. We spent 60% of our time there.
:)

-Peff

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Junio C Hamano @ 2012-01-05 21:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, git-dev
In-Reply-To: <20120105214941.GA31836@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> The worst potential problem I could come up with is if you somehow had
> an object whose "parsed" flag was set, but somehow didn't have its other
> fields set (like type).
> ...
> So I think it is safe short of somebody doing some horrible manual
> munging of a "struct object".

Yeah, I was worried that codepaths like commit-pretty-printing might be
mucking with the contents of commit->buffer, perhaps reencoding the text
and then calling parse_object() to get the unmodified original back, or
something silly like that. But the lookup_object() call at the beginning
of the parse_object() already prevents us from doing such a thing, so we
should be OK, I would think.

^ permalink raw reply

* What's cooking in git.git (Jan 2012, #01; Thu, 5)
From: Junio C Hamano @ 2012-01-05 21:55 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in
'next'.

Tomorrow I'll be tagging 1.7.9-rc0, after merging a few topics to "master"
to make it feature complete for the upcoming release.

Here are the repositories that have my integration branches:

With maint, master, next, pu, todo:

        git://git.kernel.org/pub/scm/git/git.git
        git://repo.or.cz/alt-git.git
        https://code.google.com/p/git-core/
        https://github.com/git/git

With only maint and master:

        git://git.sourceforge.jp/gitroot/git-core/git.git
        git://git-core.git.sourceforge.net/gitroot/git-core/git-core

With all the topics and integration branches:

        https://github.com/gitster/git

The preformatted documentation in HTML and man format are found in:

        git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
        git://repo.or.cz/git-{htmldocs,manpages}.git/
        https://code.google.com/p/git-{htmldocs,manpages}.git/
        https://github.com/gitster/git-{htmldocs,manpages}.git/

--------------------------------------------------
[New Topics]

* ss/git-svn-prompt-sans-terminal (2012-01-04) 3 commits
 - fixup! 15eaaf4
 - git-svn, perl/Git.pm: extend Git::prompt helper for querying users
  (merged to 'next' on 2012-01-05 at 954f125)
 + perl/Git.pm: "prompt" helper to honor GIT_ASKPASS and SSH_ASKPASS

The bottom one has been replaced with a rewrite based on comments from
Ævar. The second one needs more work, both in perl/Git.pm and prompt.c, to
give precedence to tty over SSH_ASKPASS when terminal is available.

I think it is OK to include the first one in the upcoming release, but we
may want to wait and defer both to the next cycle.

* pw/p4-view-updates (2012-01-03) 6 commits
  (merged to 'next' on 2012-01-03 at c3b5872)
 + git-p4: view spec documentation
 + git-p4: rewrite view handling
 + git-p4: support single file p4 client view maps
 + git-p4: sort client views by reverse View number
 + git-p4: fix test for unsupported P4 Client Views
 + git-p4: test client view handling

Will merge to 'master' by 1.7.9 final.
Unless real git-p4 users object (I am not one of them, so I cannot really
judge), that is.

* cb/git-daemon-tests (2012-01-04) 1 commit
  (merged to 'next' on 2012-01-05 at 86f3e93)
 + daemon: add tests

It stirred a related discussion on how the process termination should be
handled in the daemon, but the test queued should be OK as-is on systems
that have "pkill" (which is outside POSIX).

* jc/show-sig (2012-01-05) 6 commits
  (merged to 'next' on 2012-01-05 at 5da3ae2)
 + log --show-signature: reword the common two-head merge case
 + log-tree: show mergetag in log --show-signature output
 + log-tree.c: small refactor in show_signature()
 + commit --amend -S: strip existing gpgsig headers
 + verify_signed_buffer: fix stale comment
 + Merge branch 'jc/signed-commit' and 'jc/pull-signed-tag'
 (this branch uses jc/signed-commit.)

Finishing touches to the already graduated "pull signed tags" topic.

Will merge to 'master' by 1.7.9 final.

* jm/stash-diff-disambiguate (2012-01-01) 1 commit
  (merged to 'next' on 2012-01-05 at 75a283b)
 + stash: Don't fail if work dir contains file named 'HEAD'

Will merge to 'master' by 1.7.9 final.

* mm/maint-gitweb-project-maxdepth (2012-01-04) 1 commit
 - gitweb: accept trailing "/" in $project_list

Looked quite sensible.
Will merge to 'master' by 1.7.9 final.

* nd/shallow-clone-without-tag-following (2012-01-05) 1 commit
 - Limit refs to fetch to minimum in shallow clones

Needs adjustment of t5500 at least, and possibly an option to ask for the
traditional "shallowly clone all branches" behaviour.

* jk/parse-object-cached (2012-01-05) 1 commit
 - parse_object: try internal cache before reading object db

This is a somewhat scary change, but I cannot think of a way it would break
anything that is currently working correctly.

* jn/maint-gitweb-grep-fix (2012-01-05) 2 commits
 - gitweb: Harden "grep" search against filenames with ':'
 - gitweb: Fix file links in "grep" search

Waiting for a confirmation from bug reporter.

--------------------------------------------------
[Graduated to "master"]

* jv/maint-config-set (2011-12-27) 1 commit
  (merged to 'next' on 2011-12-27 at 551ac8f)
 + Fix an incorrect reference to --set-all.

* pw/p4-docs-and-tests (2011-12-27) 11 commits
  (merged to 'next' on 2011-12-28 at 8acf26e)
 + git-p4: document and test submit options
 + git-p4: test and document --use-client-spec
 + git-p4: test --keep-path
 + git-p4: test --max-changes
 + git-p4: document and test --import-local
 + git-p4: honor --changesfile option and test
 + git-p4: document and test clone --branch
 + git-p4: test cloning with two dirs, clarify doc
 + git-p4: clone does not use --git-dir
 + git-p4: introduce asciidoc documentation
 + rename git-p4 tests

--------------------------------------------------
[Stalled]

* bw/maint-t8006-sed-incomplete-line (2012-01-03) 1 commit
 - Work around sed portability issue in t8006-blame-textconv

Waiting for a clarification of the reasoning in the log message.

* nd/index-pack-no-recurse (2011-12-27) 4 commits
 - fixup! 3413d4d
 - index-pack: eliminate unlimited recursion in get_delta_base()
 - index-pack: eliminate recursion in find_unresolved_deltas
 - Eliminate recursion in setting/clearing marks in commit list

Expecting a reroll.

* jc/advise-push-default (2011-12-18) 1 commit
 - push: hint to use push.default=upstream when appropriate

Peff had a good suggestion outlining an updated code structure so that
somebody new can try to dip his or her toes in the development. Any
takers?

Waiting for a reroll.

* mh/ref-api-rest (2011-12-12) 35 commits
 - repack_without_ref(): call clear_packed_ref_cache()
 - read_packed_refs(): keep track of the directory being worked in
 - is_refname_available(): query only possibly-conflicting references
 - refs: read loose references lazily
 - read_loose_refs(): take a (ref_entry *) as argument
 - struct ref_dir: store a reference to the enclosing ref_cache
 - sort_ref_dir(): take (ref_entry *) instead of (ref_dir *)
 - do_for_each_ref_in_dir*(): take (ref_entry *) instead of (ref_dir *)
 - add_entry(): take (ref_entry *) instead of (ref_dir *)
 - search_ref_dir(): take (ref_entry *) instead of (ref_dir *)
 - find_containing_direntry(): use (ref_entry *) instead of (ref_dir *)
 - add_ref(): take (ref_entry *) instead of (ref_dir *)
 - read_packed_refs(): take (ref_entry *) instead of (ref_dir *)
 - find_ref(): take (ref_entry *) instead of (ref_dir *)
 - is_refname_available(): take (ref_entry *) instead of (ref_dir *)
 - get_loose_refs(): return (ref_entry *) instead of (ref_dir *)
 - get_packed_refs(): return (ref_entry *) instead of (ref_dir *)
 - refs: wrap top-level ref_dirs in ref_entries
 - get_ref_dir(): keep track of the current ref_dir
 - do_for_each_ref(): only iterate over the subtree that was requested
 - refs: sort ref_dirs lazily
 - sort_ref_dir(): do not sort if already sorted
 - refs: store references hierarchically
 - refs.c: rename ref_array -> ref_dir
 - struct ref_entry: nest the value part in a union
 - check_refname_component(): return 0 for zero-length components
 - free_ref_entry(): new function
 - refs.c: reorder definitions more logically
 - is_refname_available(): reimplement using do_for_each_ref_in_array()
 - names_conflict(): simplify implementation
 - names_conflict(): new function, extracted from is_refname_available()
 - repack_without_ref(): reimplement using do_for_each_ref_in_array()
 - do_for_each_ref_in_arrays(): new function
 - do_for_each_ref_in_array(): new function
 - do_for_each_ref(): correctly terminate while processesing extra_refs

The API for extra anchoring points may need rethinking first; that would
hopefully make the "ref" part a lot simpler.

Waiting for a reroll.

* jc/split-blob (2011-12-01) 6 commits
 . WIP (streaming chunked)
 - chunked-object: fallback checkout codepaths
 - bulk-checkin: support chunked-object encoding
 - bulk-checkin: allow the same data to be multiply hashed
 - new representation types in the packstream
 - varint-in-pack: refactor varint encoding/decoding

Not ready.

At least pack-objects and fsck need to learn the new encoding for the
series to be usable locally, and then index-pack/unpack-objects needs to
learn it to be used remotely.

* jc/advise-i18n (2011-12-22) 1 commit
 - i18n of multi-line advice messages

Allow localization of advice messages that tend to be longer and
multi-line formatted. For now this is deliberately limited to advise()
interface and not vreportf() in general as touching the latter has
interactions with error() that has plumbing callers whose prefix "error: "
should never be translated.

--------------------------------------------------
[Cooking]

* jh/fetch-head-update (2012-01-03) 1 commit
  (merged to 'next' on 2012-01-04 at b5778e1)
 + write first for-merge ref to FETCH_HEAD first

Will merge to 'master' by 1.7.9 final.

* jc/signed-commit (2011-11-29) 5 commits
  (merged to 'next' on 2011-12-21 at 8fcbf00)
 + gpg-interface: allow use of a custom GPG binary
 + pretty: %G[?GS] placeholders
 + test "commit -S" and "log --show-signature"
 + log: --show-signature
 + commit: teach --gpg-sign option
 (this branch is used by jc/show-sig.)

The infrastructure this series adds is used by the finishing touches to
the earlier "pull signed tags" topic, so this will graduate to "master"
together with it when the latter matures, hopefully before 1.7.9 final.

--------------------------------------------------
[Discarded]

* ss/git-svn-askpass (2011-12-27) 5 commits
 - make askpass_prompt a global prompt method for asking users
 - ignore empty *_ASKPASS variables
 - honour *_ASKPASS for querying username and for querying further actions like unknown certificates
 - switch to central prompt method
 - add central method for prompting a user using GIT_ASKPASS or SSH_ASKPASS

This has become more about "prompt without terminal", and was rerolled
into a two-patch series, which is structured a lot nicer than this
original.

^ permalink raw reply

* Re: [PATCH] parse_object: try internal cache before reading object db
From: Jeff King @ 2012-01-05 22:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, git-dev
In-Reply-To: <7vehvdn7at.fsf@alter.siamese.dyndns.org>

On Thu, Jan 05, 2012 at 01:55:22PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The worst potential problem I could come up with is if you somehow had
> > an object whose "parsed" flag was set, but somehow didn't have its other
> > fields set (like type).
> > ...
> > So I think it is safe short of somebody doing some horrible manual
> > munging of a "struct object".
> 
> Yeah, I was worried that codepaths like commit-pretty-printing might be
> mucking with the contents of commit->buffer, perhaps reencoding the text
> and then calling parse_object() to get the unmodified original back, or
> something silly like that. But the lookup_object() call at the beginning
> of the parse_object() already prevents us from doing such a thing, so we
> should be OK, I would think.

Er, without my patch there is no such lookup_object, is there?

What saves you is that the parse_*_buffer functions all do nothing when
the object.parsed flag is set, and the code I added makes sure that
object.parsed is set in the object that lookup_object returns.

So yeah, anytime you tweak the contents of commit->buffer but don't
unset the "parsed" flag, you are asking for trouble.

Here's another possible code path where the behavior is changed:

  1. You set the global save_commit_buffer to 0.

  2. You call parse_commit(commit) on an unparsed commit object, which
     does not save the commit buffer, but does set
     commit->object.parsed.

  3. You call parse_object(commit->object.sha1).

     a. Without my patch, we read the file contents again, do _not_
        re-parse them (because we look up the existing object and notice
        that its "parsed" flag is set), but we _do_ assign the buffer to
        commit->buffer.

     b. With my patch, we see that there is an existing object that is
        already parsed, and return early. commit->buffer remains NULL.

I would argue that this doesn't matter, since "parse_commit" uses the
exact same optimization (it returns early without setting commit->buffer
if the parsed flag is set). So any program turning off
save_commit_buffer has to be ready to deal with a NULL commit->buffer in
the first place. The only exception would be a program that then tries
to fill in the commit->buffer field by manually running parse_object on
an already-parsed, buffer-less commit object. I don't think we do that.

You can verify that commit->buffer is the only place where these issues
can happen by following the logic in parse_object_buffer.
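
The difference between 3a and 3b can be sketched with a toy model. All
names here (toy_commit, toy_parse_commit, toy_parse_object_old,
toy_parse_object_new, toy_save_commit_buffer) are illustrative assumptions
that mirror the steps listed above, not git's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a commit; only the two fields discussed above. */
struct toy_commit {
	int parsed;
	const char *buffer;
};

int toy_save_commit_buffer = 1;	/* models the global of step 1 */
const char toy_disk_contents[] = "commit data";

/* Step 2: parse, keeping the buffer only if the global says so. */
void toy_parse_commit(struct toy_commit *c)
{
	assert(c != NULL);
	if (c->parsed)
		return;	/* early return, buffer left untouched */
	c->parsed = 1;
	c->buffer = toy_save_commit_buffer ? toy_disk_contents : NULL;
}

/* Step 3a: always read; skip the re-parse but still attach the buffer. */
void toy_parse_object_old(struct toy_commit *c)
{
	const char *buffer = toy_disk_contents;	/* unconditional read */

	assert(c != NULL);
	c->parsed = 1;	/* no-op when the commit was already parsed */
	if (!c->buffer)
		c->buffer = buffer;
}

/* Step 3b: return early for an already-parsed object. */
void toy_parse_object_new(struct toy_commit *c)
{
	assert(c != NULL);
	if (c->parsed)
		return;	/* buffer stays NULL if it was never saved */
	toy_parse_object_old(c);
}
```

With toy_save_commit_buffer set to 0, a commit run through
toy_parse_commit and then toy_parse_object_old ends up with a non-NULL
buffer, while the same sequence with toy_parse_object_new leaves it NULL.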

Sorry to belabor the discussion, but this is such a core piece of code,
I want to make sure the optimization isn't hurting anybody (I don't
think it is, and certainly the tests are all happy, but I think talking
through the cases is a good thing).

-Peff

^ permalink raw reply

* Re: git-subtree
From: David A. Greene @ 2012-01-05 22:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Greene, git
In-Reply-To: <7vboqino1r.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> David Greene <dag@cray.com> writes:
>
>> How does the git community want the patch presented?  Right now it's one
>> monolithic thing.  I understand that isn't ideal but I don't think
>> incorporating the entire GitHub master history is necessarily the best
>> idea either.
>
> It depends on the longer term vision of how the result of this submission
> will evolve and more importantly, where you fit in the picture.

This is a very fair question.  I'll try to answer it as best I can.  I
think it mostly jibes with your suggested possible answer.

I've been using git-subtree for about six months now and as an
enthusiastic user who wants to introduce this tool into my daily
corporate work environment, I'd like to see it incorporated as an
officially-supported git tool to make that introduction easier.

So my intention is to make git-subtree an integral part of the core git
suite and take on further maintenance and development along with Avery
and the other git-subtree developers.

I have not previously been a contributor to git-subtree and don't know
the code at all but I am a quick learner.  The actual git-subtree code
itself is not overwhelmingly large and strikes me as a tractable
learning project.

I approached Avery about submitting git-subtree to become part of the
core git suite.  He responded positively but indicated he does not have
the cycles to do it at this time.  He asked whether I could take on the
job and I agreed.

He mentioned that he'd talked to some developers at GitTogether and got
a positive response there.  I don't know whether you were part of those
discussions.  My impression is that the GitTogether discussions went
well and there was general agreement that git-subtree would be a
valuable addition to the core git suite.

I am perfectly happy to put this in contrib/ first if it eases the
introduction.  I would like to move it to the subcommand area after
getting everything in tip-top shape.  What I don't want is for it to
languish forever in contrib/.  That means I'll need some guideline of
the changes/standards necessary to qualify it for transition from
contrib/ to an official subcommand.  I expect we'll develop that as we
go along but I hope the git community has some institutional knowledge
gathered from previous experience.

I have asked Avery how he wants to do maintenance going forward.  I
haven't heard back from him yet so I can't speak to whether the existing
GitHub project will continue or not.  I'll pass along his thoughts when
I get them.

> Your answer might differ, of course, but the point is that we would need
> to weigh pros and cons between inclusion of it in the git repository and
> keeping it in Avery's repository and have him and his contributors
> maintain, enhance and distribute it from there, and it largely depends on
> the nature of the submission. Is it a "throw it over the wall" dump of a
> large code of unknown quality that we need to clean up first without
> knowing the vision of how "git subtree" should evolve by original author
> and/or people who have been actively developing it?

I certainly don't want this to be an "over the wall" operation.  I
intend to participate in maintenance of git-subtree in the official git
repository.

So I'll go ahead and work on adding this to contrib/.  Once I get a
response from Avery about his long-term vision I'll pass that along and
we can have further discussion.  I may start sending patches to the
mailing list for review before hearing back from him, however.

Sound good?

                             -Dave

^ permalink raw reply

* Re: git-subtree
From: David A. Greene @ 2012-01-05 22:18 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: David Greene, git, Junio C Hamano
In-Reply-To: <CALkWK0k+AwCsizZFwbKKxuz0B4xLoyC4hAy3WRD=sLCq-HvvCw@mail.gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Hi again,
>
> [+CC: Junio Hamano, our maintainer]
>
> David A. Greene wrote:
>> I've read that document. The issue is that I didn't develop the code,
>> Avery did.
>
> Not an issue as long as you have Avery's signoff.

As in a signed-off-by log entry on the commit?  I did a commit -s to add
my own signed-off-by tag and added a "From:" line in accordance with the
SubmittingPatches document:

  "If you are forwarding a patch from somebody else, optionally, at the
   beginning of the e-mail message just before the commit message starts,
   you can put a "From: " line to name that person."

I have not used signoffs before in my day-to-day git flow.  How do I go
about getting one from Avery and incorporating it into the history in an
authenticated way?  I'm assuming you don't want me to forge his sign-off.
:)

>> It's a lot of time to learn a completely new codebase. I was hoping
>> to submit something soon and then learn the codebase gradually during
>> maintenance/further development.
>
> We certainly don't want badly reviewed code that nobody understands
> floating around in the codebase

Certainly, I'm not trying to avoid review, just trying to figure out the
most efficient mechanics.

> so, I'd suggest sending out whatever you think is appropriate for the
> first round of reviews, and see how things shape up from there.

Fair enough.  I think I will take Jeff's suggested route and see where
that goes.

>> How have completely new tools been introduced into the git mainline in the
>> past?
>
> Yes.  For an example of something I was involved with but didn't
> author, see vcs-svn/.

Ok, I'll look into that.  Thanks for the pointer.

                             -Dave

^ permalink raw reply

* Re: git-subtree
From: David A. Greene @ 2012-01-05 16:33 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: David Greene, git, Junio C Hamano
In-Reply-To: <CALkWK0k+AwCsizZFwbKKxuz0B4xLoyC4hAy3WRD=sLCq-HvvCw@mail.gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Hi again,
>
> [+CC: Junio Hamano, our maintainer]
>
> David A. Greene wrote:
>> I've read that document.  The issue is that I didn't develop the code,
>> Avery did.
>
> Not an issue as long as you have Avery's signoff.

As in a signed-off-by log entry on the commit?  I did a commit -s to add
my own signed-off-by tag and added a "From:" line in accordance with the
SubmittingPatches document:

  "If you are forwarding a patch from somebody else, optionally, at the
   beginning of the e-mail message just before the commit message starts,
   you can put a "From: " line to name that person."

I have not used signoffs before in my day-to-day git flow.  How do I go
about getting one from Avery and incorporating it into the history in an
authenticated way?  I'm assuming you don't want me to forge his sign-off.
:)

>> It's a lot of time to learn a
>> completely new codebase.  I was hoping to submit something soon and then
>> learn the codebase gradually during maintenance/further development.
>
> We certainly don't want badly reviewed code that nobody understands
> floating around in the codebase

Certainly, I'm not trying to avoid review, just trying to figure out the
most efficient mechanics.

> so, I'd suggest sending out whatever you think is appropriate for the
> first round of reviews, and see how things shape up from there.

Fair enough.  I think I will take Jeff's suggested route and see where
that goes.

>> How have completely new tools been introduced into the git mainline in the
>> past?
>
> Yes.  For an example of something I was involved with but didn't
> author, see vcs-svn/.

Ok, I'll look into that.  Thanks for the pointer.

                             -Dave

^ permalink raw reply

* Re: git-subtree
From: David A. Greene @ 2012-01-05 16:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Greene, git
In-Reply-To: <7vboqino1r.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> David Greene <dag@cray.com> writes:
>
>> How does the git community want the patch presented?  Right now it's one
>> monolithic thing.  I understand that isn't ideal but I don't think
>> incorporating the entire GitHub master history is necessarily the best
>> idea either.
>
> It depends on the longer term vision of how the result of this submission
> will evolve and more importantly, where you fit in the picture.

This is a very fair question.  I'll try to answer it as best I can.  I
think it mostly jibes with your suggested possible answer.

I've been using git-subtree for about six months now and as an
enthusiastic user who wants to introduce this tool into my daily
corporate work environment, I'd like to see it incorporated as an
officially-supported git tool to make that introduction easier.

So my intention is to make git-subtree an integral part of the core git
suite and take on further maintenance and development along with Avery
and the other git-subtree developers.

I have not previously been a contributor to git-subtree and don't know
the code at all but I am a quick learner.  The actual git-subtree code
itself is not overwhelmingly large and strikes me as a tractable
learning project.

I approached Avery about submitting git-subtree to become part of the
core git suite.  He responded positively but indicated he does not have
the cycles to do it at this time.  He asked whether I could take on the
job and I agreed.

He mentioned that he'd talked to some developers at GitTogether and got
a positive response there.  I don't know whether you were part of those
discussions.  My impression is that the GitTogether discussions went
well and there was general agreement that git-subtree would be a
valuable addition to the core git suite.

I am perfectly happy to put this in contrib/ first if it eases the
introduction.  I would like to move it to the subcommand area after
getting everything in tip-top shape.  What I don't want is for it to
languish forever in contrib/.  That means I'll need some guideline of
the changes/standards necessary to qualify it for transition from
contrib/ to an official subcommand.  I expect we'll develop that as we
go along but I hope the git community has some institutional knowledge
gathered from previous experience.

I have asked Avery how he wants to do maintenance going forward.  I
haven't heard back from him yet so I can't speak to whether the existing
GitHub project will continue or not.  I'll pass along his thoughts when
I get them.

> Your answer might differ, of course, but the point is that we would need
> to weigh pros and cons between inclusion of it in the git repository and
> keeping it in Avery's repository and have him and his contributors
> maintain, enhance and distribute it from there, and it largely depends on
> the nature of the submission. Is it a "throw it over the wall" dump of a
> large code of unknown quality that we need to clean up first without
> knowing the vision of how "git subtree" should evolve by original author
> and/or people who have been actively developing it?

I certainly don't want this to be an "over the wall" operation.  I
intend to participate in maintenance of git-subtree in the official git
repository.

So I'll go ahead and work on adding this to contrib/.  Once I get a
response from Avery about his long-term vision I'll pass that along and
we can have further discussion.  I may start sending patches to the
mailing list for review before hearing back from him, however.

Sound good?

                             -Dave

^ permalink raw reply

* Re: git-subtree
From: David A. Greene @ 2012-01-05 16:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramkumar Ramachandra, David Greene, git
In-Reply-To: <20120105154740.GA11475@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I think this is also somewhat different in that git-subtree has a
> multi-year history in git that we may want to keep. So it is more

I agree there may be some value in preserving this history.

> The biggest decision is whether or not to import the existing history.

I agree.  I will leave that decision to the more experienced git
developers.  I'm happy to work either way.

> If we want to throw away the existing history, then I think you end up
> doing the same munging as the latter option above, and then just make a
> single patch out of it instead of a merge.

Right.  That's the approach I've taken for now but it's easy to switch.
There aren't that many changes.

> I don't use git-subtree, but just glancing over the repo, it looks like
> that munging is mostly:
>
>   1. git-subtree.sh stays, and gets added to git.git's top-level Makefile

Done.

>   2. the test.sh script gets adapted into t/tXXXX-subtree.sh

Done.

>   3. git-subtree.txt goes into Documentation/

Done.

>   4. The rest of the files are infrastructure that can go away, as they
>      are a subset of what git.git already contains.

Done.

I have a patch that does all of the above but it is one monolithic blob.
Like I said, the changes aren't extensive so it's easy for me to change
strategies.
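
For anyone following along, the four steps above boil down to a small
file mapping.  Here is a minimal Python sketch of it, not the actual
patch: plan_layout and the example file list (Makefile, COPYING) are
hypothetical names used for illustration, and tXXXX is left as the
placeholder from the list above since no test number has been picked.
The top-level Makefile edit from step 1 is not modeled.

```python
from typing import Dict, List, Optional

# Destinations for the three files named in the list above.
# "tXXXX" is the placeholder from the thread, not a real test number.
KNOWN_MOVES: Dict[str, str] = {
    "git-subtree.sh": "git-subtree.sh",                 # step 1: stays
    "test.sh": "t/tXXXX-subtree.sh",                    # step 2: adapted test
    "git-subtree.txt": "Documentation/git-subtree.txt", # step 3: docs
}


def plan_layout(upstream_files: List[str]) -> Dict[str, Optional[str]]:
    """Map each upstream file to its path in git.git, or None if dropped.

    Anything not listed in KNOWN_MOVES is treated as step 4
    infrastructure that goes away.
    """
    return {name: KNOWN_MOVES.get(name) for name in upstream_files}


if __name__ == "__main__":
    # Hypothetical listing of the upstream repository's top level.
    example = ["git-subtree.sh", "test.sh", "git-subtree.txt",
               "Makefile", "COPYING"]
    for src, dst in plan_layout(example).items():
        print(f"{src} -> {dst if dst else '(dropped)'}")
```

Either strategy (single patch or munge-overlay on top of the imported
history) ends up with this same layout; only the way it is recorded in
history differs.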

> I'd favor keeping the history and doing the munge-overlay thing.

Ok, that sounds fine to me.  I'll do that in a private branch.  What
should I send as patches to the mailing list?  I'm assuming we don't
want [PATCH 235/12342], etc. sent to the list chronicling the entire
history.  :)

> Although part of me wants to join the histories in a subtree so that we
> can use "git subtree" to do it (which would just be cool),

Heh.  I thought about that too.  :)

> I think the resulting code layout doesn't make much sense unless
> git-subtree is going to be maintained separately.

Yeah, I agree.

                                -Dave

^ permalink raw reply
