Git development

Git development
 help / color / mirror / Atom feed

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Chris Packham @ 2011-11-14  9:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, git
In-Reply-To: <7v62inkymg.fsf@alter.siamese.dyndns.org>

On 14/11/11 19:27, Junio C Hamano wrote:
> Miles Bader <miles@gnu.org> writes:
> 
>> Do you mean a more elaborate UI that does this nicely...?
> 
> Yes, that is what I meant. I also have a feeling that people would prefer
> to have an option that treats these two repositories equally; your
> illustration makes one a subordinate to the other.

Not sure if it's what you're after but there was this patch [1] that I
was kicking around a while back. I've still got the code in an old
branch if there is interest in resurrecting it. It looks like I started
addressing Junio's comments and never posted v3.

[1] http://article.gmane.org/gmane.comp.version-control.git/143164

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Simon Brenner @ 2011-11-14  8:48 UTC (permalink / raw)
  To: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>

I think one of the most annoying aspects of alternates (beyond the
hassle of adding/removing them except using clone --reference) is the
danger of losing data if you aren't absolutely sure that your
alternate is stable and won't ever lose references to objects.

If the alternate just had links to the referring repositories, I think
this hole could be neatly closed.

On Mon, Nov 14, 2011 at 7:06 AM, Miles Bader <miles@gnu.org> wrote:
> It might be nice to have a mechanism where new objects would update
> the _alternate_ rather than the object-store in the tree where the
> command was run... then you could easily have a bunch of trees using a
> central object store without needing to update the central store
> occasionally by hand (and do gc in its "clients")...

This sounds like a nice way forward: replace/extend the current
alternates system with support for a shared object store that is
"intelligently" shared so that it can be gc:d based on all refs from
all referring repositories. I imagine it would be something very much
like a bare repository - except it wouldn't have any refs of its own,
just a list of other repositories it should search for refs when
GC:ing.

The object store currently built into each git repository could even
become a special case of that: a shared object store (that happens to
reside under .git) with a single referring repository (the parent .git
dir). If the location of the object store is configurable, clone
--reference could simply point the new repository directly to the
shared store instead of ever setting up a local object store.

// Simon

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Johannes Sixt @ 2011-11-14  7:39 UTC (permalink / raw)
  To: vinassa vinassa; +Cc: git
In-Reply-To: <CAJuRt+r9BjYcead6hgzdUT0Bisz1D48cegqkoJ0S537VMYBy_g@mail.gmail.com>

Am 11/13/2011 18:04, schrieb vinassa vinassa:
> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.
> 
> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?

I *think* the following would happen:

1. Git detects that the (commit) object that it is about to generate
already exists, and does not write a new one.

2. Then the branch's ref is updated to the SHA-1. Since the original
commit is somewhere back in history, this is effectively like 'git reset
--soft that-commit'.

3. At your next 'git diff --cached', you notice unexpected differences
between the index and the branch head. You will wonder what happened.
("Who typed 'git reset --soft that-commit' while I was looking the other
way??")

4. To recover, you just 'git reset --soft @{1}' to revert to the state
before the commit attempt, and commit again. Your commit message from the
first attempt will be lost unless you have used -C or -F for your commit.
At any rate, you can reuse the exact same commit message for this second
commit attempt, because by now time will have advanced by at least one
second, which gives you a different commit timestamp and, hence, a
different commit object.

-- Hannes

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Johannes Sixt @ 2011-11-14  7:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Vincent van Ravesteijn, Ramsay Jones, msysGit
In-Reply-To: <7vmxbzl5ch.fsf@alter.siamese.dyndns.org>

IMO, these two topics can move forward:

> * vr/msvc (2011-10-31) 3 commits
>  - MSVC: Remove unneeded header stubs
>  - Compile fix for MSVC: Include <io.h>
>  - Compile fix for MSVC: Do not include sys/resources.h
> 
> It seems this needs to be rehashed with msysgit folks.

With these patches, git can be built with MSVC. The result is usable,
although a few tests still fail.

> * na/strtoimax (2011-11-05) 3 commits
>  - Support sizes >=2G in various config options accepting 'g' sizes.
>  - Compatibility: declare strtoimax() under NO_STRTOUMAX
>  - Add strtoimax() compatibility function.
> 
> It seems this needs to be rehashed with msysgit folks.

There were a few curiosities around strtoimax being present in MinGW or
not, but these have been resolved. Also, whether or not we should define
NO_STRTOUMAX for the MinGW build is an independent matter.

-- Hannes

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14  6:27 UTC (permalink / raw)
  To: Miles Bader; +Cc: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>

Miles Bader <miles@gnu.org> writes:

> Do you mean a more elaborate UI that does this nicely...?

Yes, that is what I meant. I also have a feeling that people would prefer
to have an option that treats these two repositories equally; your
illustration makes one a subordinate to the other.

^ permalink raw reply

* Re: [RFC] deprecating and eventually removing "git relink"?
From: Miles Bader @ 2011-11-14  6:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4ny7mtbx.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:
>  (2) allowing two repositories that started independently to share objects
>      using the alternates mechanism after the fact.

Can they not already?

I mean, it works great right now to do:

  cd $REP2
  echo $REP1/.git/objects > .git/objects/info/alternates
  git gc

Do you mean a more elaborate UI that does this nicely...? or something
else?

It might be nice to have a mechanism where new objects would update
the _alternate_ rather than the object-store in the tree where the
command was run... then you could easily have a bunch of trees using a
central object store without needing to update the central store
occasionally by hand (and do gc in its "clients")...

-Miles

-- 
"Most attacks seem to take place at night, during a rainstorm, uphill,
 where four map sheets join."   -- Anon. British Officer in WW I

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-11-14  5:40 UTC (permalink / raw)
  To: Dan McGee; +Cc: GIT Mailing-list
In-Reply-To: <CAEik5nPJ3r6gp9Lttzh5aQmiPRFxpZvhTBXZoreY98QV6Cocdg@mail.gmail.com>

Dan McGee <dpmcgee@gmail.com> writes:

>>> unable to figure out how you generated those numbers so I wasn't able
>>> to do so (and had planned to get back to you to find out how you made
>>> those tables). Were you able to verify the ordering did not regress?
>>
>> No; I was hoping you would redo the benchmark using 5f44324 (core: log
>> offset pack data accesses happened, 2011-07-06).
>
> I'm still not sure what you used to parse these results,...

Ah, in the kernel repository, after running "repack -a -d -f" with
versions of git and copying the resulting packfiles in PACK-OLD/ and
PACK-NEW/, I used these scripts to examine the access pattern.

-- >8 -- DOIT.sh -- >8 --
#!/bin/sh

tmp=/var/tmp/ll$
trap 'rm -f "$tmp.*"' 0

ln -f PACK-OLD/* .git/objects/pack/. || exit
log="$tmp.old"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"

ln -f PACK-NEW/* .git/objects/pack/. || exit
log="$tmp.new"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"

perl OFS.perl "$tmp.old" "$tmp.new"
-- 8< -- DOIT.sh -- 8< --

-- >8 -- OFS.perl -- >8 --
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my $verbose;

exit(1) if (!GetOptions("verbose" => \$verbose));

sub take_one {
	my ($filename) = @_;
	my (%lofs, $num);
	my @diff;
	open my $in, '<', $filename;

	$num = 0;
	while (<$in>) {
		my ($file, $ofs) = split(' ');
		if (!exists $lofs{$file}) {
			$lofs{$file} = [$num++, 0];
		}
		my $diff = $ofs - $lofs{$file}[1];
		$lofs{$file}[1] = $ofs;
		push @diff, abs($diff);
		print "$lofs{$file}[0] $diff $ofs\n" if $verbose;
	}
	return \@diff;
}

sub bsearch {
	my ($list, $target) = @_;
	my ($hi, $lo) = ((scalar @$list), 0);

	while ($lo < $hi) {
		my $mi = int(($lo + $hi) / 2);
		if ($list->[$mi] == $target) {
			return $mi;
		} elsif ($list->[$mi] < $target) {
			$lo = $mi + 1;
		} else {
			$hi = $mi;
		}
	}
	return $hi;
}

my @percentile = ();
for (my $i = 0; $i < 100; $i += 10) {
	push @percentile, $i;
}
push @percentile, 95, 99, 99.9, 99.99;

sub thcomma {
	my ($intval) = @_;
	my $result = "";
	while ($intval > 1000) {
		my $rem = $intval % 1000;
		if ($result ne "") {
			$result = sprintf "%03d,%s", $rem, $result;
		} else {
			$result = sprintf "%03d", $rem;
		}
		$intval -= $rem;
		$intval /= 1000;
	}
	if ($intval) {
		if ($result ne "") {
			$result = sprintf "%d,%s", $intval, $result;
		} else {
			$result = sprintf "%d", $intval;
		}
	}
	$result =~ s/^[0,]*//;
	$result = "0" if ($result eq "");
	return $result;
}

sub show_stat {
	my ($diff1, $diff2) = @_;
	my ($i, $ix);
	if ($diff2) {
		@$diff2 = sort { $a <=> $b } @$diff2;
	}
	@$diff1 = sort { $a <=> $b } @$diff1;
	printf "\nTotal number of access : %12s", thcomma(scalar(@$diff1));
	printf "%12s", thcomma(scalar(@$diff2)) if ($diff2);
	for $i (@percentile) {
		$ix = scalar(@$diff1) * $i / 100;
		printf "\n     %5.2f%% percentile : %12s", $i, thcomma($diff1->[$ix]);
		if ($diff2) {
			$ix = scalar(@$diff2) * $i / 100;
			printf "%12s", thcomma($diff2->[$ix]);
		}
	}

	$ix = bsearch($diff1, 2 * 1024 * 1024);
	printf "\n   Less than 2MiB seek :       %5.2f%%", ($ix * 100.0 / @$diff1);
	if ($diff2) {
		$ix = bsearch($diff2, 2 * 1024 * 1024);
		printf "      %5.2f%%", ($ix * 100.0 / @$diff2);
	}

	print "\n";
}

my ($diff1, $diff2);
$diff1 = take_one($ARGV[0]);
$diff2 = take_one($ARGV[1]) if ($ARGV[1]);

show_stat($diff1, $diff2);
-- 8< -- OFS.perl -- 8< --


	

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Junio C Hamano @ 2011-11-14  4:03 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <CACsJy8BnqoPVJiM6mbq7p3gKtLh-KGUuTshcukGokC3istTxMQ@mail.gmail.com>

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>> Now reference_name stores the result of xstrdup(), it does not have reason
>> to be of type "const char *". It is preferable to lose the cast here, I
>> think. The same comment applies to the remainder of the patch.
>
> But resolve_ref() returns "const char *", we need to type cast at
> least once, either at resolve_ref() assignment or at free(), until we
> change resolve_ref().

In any case, I do not think it matters either way, so I queued this patch
to 'pu' unmodified.

This patch uses xstrdup() on return value of resolve_ref() only at
hand-picked places; while the choice of the places the patch decided not
to call free() looked reasonable from a quick review, both of us may be
blind and it may introduce huge leaks in a repeatedly called function.

A more extensive patch that would turn resolve_ref() to return an
allocated piece of memory share the same risk of adding new leaks at the
callsites, and this late in the cycle, neither will be in 1.7.8 anyway.

Given that if we build the more extensive patch on top of this one after
1.7.8, it will need to undo xstrdup() in this patch, add free()s that
become necessary, in addition to having to add free()s that this patch
might have potentially forgot, I have a feeling that we should just drop
this [2/2] and do a more thorough fix after 1.7.8 release is done
immediately on top of [1/2].

Thanks.

^ permalink raw reply

* What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Junio C Hamano @ 2011-11-14  4:01 UTC (permalink / raw)
  To: git

What's cooking in git.git (Nov 2011, #03; Sun, 13)
--------------------------------------------------

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.

Here are the repositories that have my integration branches:

With maint, master, next, pu, todo:

        git://git.kernel.org/pub/scm/git/git.git
        git://repo.or.cz/alt-git.git
        https://code.google.com/p/git-core/
        https://github.com/git/git

With only maint and master:

        git://git.sourceforge.jp/gitroot/git-core/git.git
        git://git-core.git.sourceforge.net/gitroot/git-core/git-core

With all the topics and integration branches:

        https://github.com/gitster/git

The preformatted documentation in HTML and man format are found in:

        git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
        git://repo.or.cz/git-{htmldocs,manpages}.git/
        https://code.google.com/p/git-{htmldocs,manpages}.git/
        https://github.com/gitster/git-{htmldocs,manpages}.git/

--------------------------------------------------
[New Topics]

* jc/commit-tree-extra (2011-11-12) 2 commits
 - commit-tree: teach -C <extra-commit>
 - commit-tree: teach -x <extra>
 (this branch uses jc/pull-signed-tag; is tangled with jc/signed-commit.)

* nd/resolve-ref (2011-11-13) 2 commits
 - Copy resolve_ref() return value for longer use
 - Convert many resolve_ref() calls to read_ref*() and ref_exists()

--------------------------------------------------
[Graduated to "master"]

* ab/i18n-test-fix (2011-11-05) 2 commits
  (merged to 'next' on 2011-11-06 at f1de9a6)
 + t/t7508-status.sh: use test_i18ncmp
 + t/t6030-bisect-porcelain.sh: use test_i18ngrep

* fc/remote-seturl-usage-fix (2011-11-06) 1 commit
  (merged to 'next' on 2011-11-06 at 6c8328c)
 + remote: fix remote set-url usage

* jc/remote-setbranches-usage-fix (2011-11-06) 1 commit
  (merged to 'next' on 2011-11-06 at 017606d)
 + remote: fix set-branches usage

* pw/p4-appledouble-fix (2011-11-05) 1 commit
  (merged to 'next' on 2011-11-06 at 2ec0af3)
 + git-p4: ignore apple filetype

Regression fix for the upcoming release.

* sn/http-auth-with-netrc-fix (2011-11-04) 1 commit
  (merged to 'next' on 2011-11-06 at 60b7f96)
 + http: don't always prompt for password

Regression fix for the upcoming release.

--------------------------------------------------
[Stalled]

* hv/submodule-merge-search (2011-10-13) 4 commits
 - submodule.c: make two functions static
 - allow multiple calls to submodule merge search for the same path
 - push: Don't push a repository with unpushed submodules
 - push: teach --recurse-submodules the on-demand option

What the topic aims to achieve may make sense, but the implementation
looked somewhat suboptimal.

* sr/transport-helper-fix-rfc (2011-07-19) 2 commits
 - t5800: point out that deleting branches does not work
 - t5800: document inability to push new branch with old content

See comments on sr/fix-fast-export-tips topic.

* sr/fix-fast-export-tips (2011-11-05) 3 commits
 - fast-export: output reset command for commandline revs
 - fast-export: do not refer to non-existing marks
 - t9350: point out that refs are not updated correctly

The bottom commit from the stalled sr/transport-helper-fix-rfc topic is
fixed with this. It may make sense to drop the other topic and include
that commit in this series.

The command line parser is still too lax and accepts malformed input, but
this is a step in the right direction and tightening the command line now
should be doable without a low level surgery that touches codepaths that
are unrelated to the command line processing like the previous attempt
used to do.

* jc/lookup-object-hash (2011-08-11) 6 commits
 - object hash: replace linear probing with 4-way cuckoo hashing
 - object hash: we know the table size is a power of two
 - object hash: next_size() helper for readability
 - pack-objects --count-only
 - object.c: remove duplicated code for object hashing
 - object.c: code movement for readability

I do not think there is anything fundamentally wrong with this series, but
the risk of breakage far outweighs observed performance gain in one
particular workload.

* jc/verbose-checkout (2011-10-16) 2 commits
 - checkout -v: give full status output after switching branches
 - checkout: move the local changes report to the end

This is just to leave a record that the reason why we do not do this not
because we are incapable of coding this, but because it is not a good idea
to do this. I suspect people who are new to git that might think they need
it would soon realize the don't.

Will keep in 'pu' as a showcase for a while and then will drop.

* eh/grep-scale-to-cpunum (2011-11-05) 1 commit
 - grep: detect number of CPUs for thread spawning

Kills I/O parallelism and needs to be improved or discarded.

* vr/msvc (2011-10-31) 3 commits
 - MSVC: Remove unneeded header stubs
 - Compile fix for MSVC: Include <io.h>
 - Compile fix for MSVC: Do not include sys/resources.h

It seems this needs to be rehashed with msysgit folks.

* na/strtoimax (2011-11-05) 3 commits
 - Support sizes >=2G in various config options accepting 'g' sizes.
 - Compatibility: declare strtoimax() under NO_STRTOUMAX
 - Add strtoimax() compatibility function.

It seems this needs to be rehashed with msysgit folks.

--------------------------------------------------
[Cooking]

* jc/signed-commit (2011-11-12) 4 commits
 - pretty: %G[?GS] placeholders
 - test "commit -S" and "log --show-signature"
 - log: --show-signature
 - commit: teach --gpg-sign option
 (this branch uses jc/pull-signed-tag; is tangled with jc/commit-tree-extra.)

Rebased on top of jc/pull-signed-tag topic, after reverting the old one
out of 'next'.

* jc/pull-signed-tag (2011-11-12) 15 commits
 - commit-tree: teach -m/-F options to read logs from elsewhere
 - commit-tree: update the command line parsing
 - commit: teach --amend to carry forward extra headers
 - merge: force edit and no-ff mode when merging a tag object
 - commit: copy merged signed tags to headers of merge commit
 - merge: record tag objects without peeling in MERGE_HEAD
 - merge: make usage of commit->util more extensible
 - fmt-merge-msg: Add contents of merged tag in the merge message
 - fmt-merge-msg: package options into a structure
 - fmt-merge-msg: avoid early returns
 - refs DWIMmery: use the same rule for both "git fetch" and others
 - fetch: allow "git fetch $there v1.0" to fetch a tag
 - merge: notice local merging of tags and keep it unwrapped
 - fetch: do not store peeled tag object names in FETCH_HEAD
 - Split GPG interface into its own helper library
 (this branch is used by jc/commit-tree-extra and jc/signed-commit.)

Further updated to allow "commit --amend" to retain the mergetag
headers. I think this is ready for the cycle after upcoming 1.7.8.

* ab/clang-lints (2011-11-06) 2 commits
  (merged to 'next' on 2011-11-13 at a573aec)
 + cast variable in call to free() in builtin/diff.c and submodule.c
 + apply: get rid of useless x < 0 comparison on a size_t type

Will keep in 'next' during this cycle.

* ab/pull-rebase-config (2011-11-07) 1 commit
  (merged to 'next' on 2011-11-13 at 72bb2d5)
 + pull: introduce a pull.rebase option to enable --rebase

Will keep in 'next' during this cycle.

* nd/fsck-progress (2011-11-06) 4 commits
  (merged to 'next' on 2011-11-13 at 8831811)
 + fsck: print progress
 + fsck: avoid reading every object twice
 + verify_packfile(): check as many object as possible in a pack
 + fsck: return error code when verify_pack() goes wrong

Will keep in 'next' during this cycle.

* nd/prune-progress (2011-11-07) 3 commits
  (merged to 'next' on 2011-11-13 at c5722ac)
 + reachable: per-object progress
 + prune: handle --progress/no-progress
 + prune: show progress while marking reachable objects

Will keep in 'next' during this cycle.

* jc/stream-to-pack (2011-11-03) 4 commits
 - Bulk check-in
 - finish_tmp_packfile(): a helper function
 - create_tmp_packfile(): a helper function
 - write_pack_header(): a helper function

Teaches "git add" to send large-ish blob data straight to a packfile.
This is a continuation to the "large file support" topic. I think this
codepath to move data from worktree to repository needs to become aware of
streaming, just like the checkout codepath that goes the other way, which
was done in the previous "large file support" topic in the 1.7.7 cycle.

* jn/gitweb-side-by-side-diff (2011-10-31) 8 commits
 - gitweb: Add navigation to select side-by-side diff
 - gitweb: Use href(-replay=>1,...) for formats links in "commitdiff"
 - t9500: Add basic sanity tests for side-by-side diff in gitweb
 - t9500: Add test for handling incomplete lines in diff by gitweb
 - gitweb: Give side-by-side diff extra CSS styling
 - gitweb: Add a feature to show side-by-side diff
 - gitweb: Extract formatting of diff chunk header
 - gitweb: Refactor diff body line classification

Replaces a series from Kato Kazuyoshi on the same topic.

* mf/curl-select-fdset (2011-11-04) 4 commits
  (merged to 'next' on 2011-11-06 at a49516f)
 + http: drop "local" member from request struct
 + http.c: Rely on select instead of tracking whether data was received
 + http.c: Use timeout suggested by curl instead of fixed 50ms timeout
 + http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping

Reduces unnecessary waits.

* nd/misc-cleanups (2011-10-27) 6 commits
  (merged to 'next' on 2011-10-28 at 2527a49)
 + unpack_object_header_buffer(): clear the size field upon error
 + tree_entry_interesting: make use of local pointer "item"
 + tree_entry_interesting(): give meaningful names to return values
 + read_directory_recursive: reduce one indentation level
 + get_tree_entry(): do not call find_tree_entry() on an empty tree
 + tree-walk.c: do not leak internal structure in tree_entry_len()

These are unquestionably good parts taken out of a larger series, so that
we can focus more on the other changes in later rounds of review.

Will keep in 'next' during this cycle.

* rs/allocate-cache-entry-individually (2011-10-26) 2 commits
  (merged to 'next' on 2011-10-27 at 2e4acd6)
 + cache.h: put single NUL at end of struct cache_entry
 + read-cache.c: allocate index entries individually

Will keep in 'next' during this cycle.

* mh/ref-api-3 (2011-10-19) 11 commits
  (merged to 'next' on 2011-10-23 at 92e2d35)
 + is_refname_available(): reimplement using do_for_each_ref_in_array()
 + names_conflict(): simplify implementation
 + names_conflict(): new function, extracted from is_refname_available()
 + repack_without_ref(): reimplement using do_for_each_ref_in_array()
 + do_for_each_ref_in_array(): new function
 + do_for_each_ref(): correctly terminate while processesing extra_refs
 + add_ref(): take a (struct ref_entry *) parameter
 + create_ref_entry(): extract function from add_ref()
 + parse_ref_line(): add a check that the refname is properly formatted
 + repack_without_ref(): remove temporary
 + Rename another local variable name -> refname
 (this branch uses mh/ref-api-2.)

Will keep in 'next' during this cycle.

* rr/revert-cherry-pick (2011-10-23) 5 commits
  (merged to 'next' on 2011-10-26 at 27b7496)
 + revert: simplify communicating command-line arguments
 + revert: allow mixed pick and revert instructions
 + revert: make commit subjects in insn sheet optional
 + revert: simplify getting commit subject in format_todo()
 + revert: free msg in format_todo()

The internals of "git revert/cherry-pick" has been further refactored to
serve as the basis for the sequencer.

Will keep in 'next' during this cycle.

* cb/daemon-permission-errors (2011-10-17) 2 commits
 - daemon: report permission denied error to clients
 - daemon: add tests

The tip commit might be loosening things a bit too much.
Will keep in 'pu' until hearing a convincing argument for the patch.

* mh/ref-api-2 (2011-10-17) 14 commits
  (merged to 'next' on 2011-10-19 at cc89f0e)
 + resolve_gitlink_ref_recursive(): change to work with struct ref_cache
 + Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
 + resolve_gitlink_ref(): improve docstring
 + get_ref_dir(): change signature
 + refs: change signatures of get_packed_refs() and get_loose_refs()
 + is_dup_ref(): extract function from sort_ref_array()
 + add_ref(): add docstring
 + parse_ref_line(): add docstring
 + is_refname_available(): remove the "quiet" argument
 + clear_ref_array(): rename from free_ref_array()
 + refs: rename parameters result -> sha1
 + refs: rename "refname" variables
 + struct ref_entry: document name member
 + cache.h: add comments for git_path() and git_path_submodule()
 (this branch is used by mh/ref-api-3.)

Will keep in 'next' during this cycle.

* sg/complete-refs (2011-10-21) 9 commits
  (merged to 'next' on 2011-10-26 at d65e2b4)
 + completion: remove broken dead code from __git_heads() and __git_tags()
 + completion: fast initial completion for config 'remote.*.fetch' value
 + completion: improve ls-remote output filtering in __git_refs_remotes()
 + completion: query only refs/heads/ in __git_refs_remotes()
 + completion: support full refs from remote repositories
 + completion: improve ls-remote output filtering in __git_refs()
 + completion: make refs completion consistent for local and remote repos
 + completion: optimize refs completion
 + completion: document __gitcomp()

Will keep in 'next' until an Ack or two from completion folks.

* jc/request-pull-show-head-4 (2011-11-09) 12 commits
  (merged to 'next' on 2011-11-13 at e473fd2)
 + request-pull: use the annotated tag contents
  (merged to 'next' on 2011-10-15 at 7e340ff)
 + fmt-merge-msg.c: Fix an "dubious one-bit signed bitfield" sparse error
  (merged to 'next' on 2011-10-10 at 092175e)
 + environment.c: Fix an sparse "symbol not declared" warning
 + builtin/log.c: Fix an "Using plain integer as NULL pointer" warning
  (merged to 'next' on 2011-10-07 at fcaeca0)
 + fmt-merge-msg: use branch.$name.description
  (merged to 'next' on 2011-10-06 at fa5e0fe)
 + request-pull: use the branch description
 + request-pull: state what commit to expect
 + request-pull: modernize style
 + branch: teach --edit-description option
 + format-patch: use branch description in cover letter
 + branch: add read_branch_desc() helper function
 + Merge branch 'bk/ancestry-path' into jc/branch-desc

Allow setting "description" for branches and use it to help communications
between humans in various workflow elements. It also allows requesting for
a signed tag to be pulled and shows the tag message in the generated message.

Will keep in 'next' during this cycle.

--------------------------------------------------
[Discarded]

* kk/gitweb-side-by-side-diff (2011-10-17) 2 commits
 . gitweb: add a feature to show side-by-side diff
 . gitweb: change format_diff_line() to remove leading SP from $diff_class

* jc/check-ref-format-fixup (2011-10-19) 2 commits
  (merged to 'next' on 2011-10-19 at 98981be)
 + Revert "Restrict ref-like names immediately below $GIT_DIR"
  (merged to 'next' on 2011-10-15 at 8e89bc5)
 + Restrict ref-like names immediately below $GIT_DIR

This became a no-op except for the bottom one which is part of the other
topic now.

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Nguyen Thai Ngoc Duy @ 2011-11-14  3:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbosfoiuy.fsf@alter.siamese.dyndns.org>

2011/11/14 Junio C Hamano <gitster@pobox.com>:
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 0fe9c4d..5b6d839 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
>>                   branch->merge[0] &&
>>                   branch->merge[0]->dst &&
>>                   (reference_name =
>> -                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
>> +                  resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
>> +                     reference_name = xstrdup(reference_name);
>>                       reference_rev = lookup_commit_reference(sha1);
>> +             }
>>       }
>>       if (!reference_rev)
>>               reference_rev = head_rev;
>> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
>>                               "         '%s', even though it is merged to HEAD."),
>>                               name, reference_name);
>>       }
>> +     free((char*)reference_name);
>>       return merged;
>>  }
>
> Now reference_name stores the result of xstrdup(), it does not have reason
> to be of type "const char *". It is preferable to lose the cast here, I
> think. The same comment applies to the remainder of the patch.

But resolve_ref() returns "const char *", we need to type cast at
least once, either at resolve_ref() assignment or at free(), until we
change resolve_ref(). Or should we change resolve_ref() to return
"char *" now?
-- 
Duy

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Junio C Hamano @ 2011-11-14  3:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: vinassa vinassa, git
In-Reply-To: <CACBZZX7VTdc2wHYHb1BB-wCJbKLVEmbzQaBTV04S1KDrqeN73A@mail.gmail.com>

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> This is not something you have to worry about, just get on with using
> Git and stop worrying about phenomenally unlikely edge cases that are
> never going to happen.

People who repeated answers along this line, you can stop. The message has
been heard, but without answering the original question.

When we create a new object (i.e. "git add" to register a new blob
contents, "git commit" that internally generates new tree objects to
record updated "whole contents" and then records the commit object), we
first compute what the object name of the new object would be, and then
check if we already have an object with the same object name in the object
store. If we do, we do not write the new copy of the object out (see the
function write_sha1_file() in sha1_file.c and the call to has_sha1_file()
that bypasses write_loose_object()).

So the old contents will be kept without getting overwritten.

Which sounds nice, but it has interesting consequences, as we do not
bother running byte-for-byte comparison when we find what we tried to
write already existed in the object store in order to error out in fear of
the miniscule chance that we would hit a SHA-1 collision.

If the collision is between commit objects, for example, we would write
the (old) commit object name to the tip of the current branch. Most
likely, the tree object recorded in the (old) commit would not match the
tree object your "git commit" wanted to record (otherwise you have hit
SHA-1 collision twice in a row ;-), which would mean "git status" would
show that a whole bunch of paths have changed between the HEAD and the
index. Also "git log" would show the history leading to the (old) commit
that is likely to be very different from what you would expect immediately
after committing the collided commit. Of course, you could recover from it
with "git reset --soft" after finding out what the previous HEAD was from
the reflog, but it won't be a pleasant experience.

There can be other kinds of collisions (e.g. your latest commit might have
collided with an existing blob or tree, in which case it is likely that
almost nothing would work after finding a blob or tree in HEAD).

^ permalink raw reply

* Re: Git:  CVS to Git import
From: Matthew Ogilvie @ 2011-11-14  2:44 UTC (permalink / raw)
  To: Jvsrvcs; +Cc: git
In-Reply-To: <1321053453892-6987037.post@n2.nabble.com>

On Fri, Nov 11, 2011 at 03:17:33PM -0800, Jvsrvcs wrote:
> Git:  CVS to Git import
> 
> We are moving from CVS to Git and want to know if anyone has had any
> experience there doing this and could share do's  / dont's, best practices
> when doing the initial import.

Some ideas:

I wouldn't trust "git cvsimport".  In my testing, it was actaully fairly
common for the resulting git tags and branches to be inconsistent with the
original CVS tags and branches: checking out a tag from CVS and the same
tag from GIT, the trees were often different.  See the manpage
for a list of some of the known issues. 

Use cvs2git instead.

Write up your own script to do the conversion.  Iteratively inspect
the results, find ways to fix up anything you don't like,
and re-run the script.  Any "fixups" you want should be
scripted, so that you can try different things, examine
the result.  Then when the actual "real" conversion
happens, you have a minimal amount of downtime as you your
already-tested script runs.

The exact fixups your script should do depend on your
circumstances, but in my case, some of things my script did included:

  - First, copy the CVS repository, and work with the copy:
  - Delete some ",v" files we didn't interested in importing into git for
    various reasons.
  - Tweak some CVS commit timestamps in some files (such as a version
    file), to reduce import odditities.  (The most common oddities
    resulted from an old CVS workflow that would often sequence:
    (a) checkout, (b) modify version number file, (c) build, (d) commit
    the new version number file, and (e) tag the sandbox.  It was
    was moderately common for other changes (in other files) to
    be committed between (a) and (d), which will either cause
    strange import artifacts or actually break import tools, due to
    the out-of-order timestamps.  Tweaking back the timestamp in the
    CVS file typically allows the import tool to avoid the
    oddity.  Completely cleaning this up would have been a
    lot of work, so I focused just on just improving recent
    history.)  (sed -i ...)
  - Do the bulk of the import work using cvs2git.
  - Graft on appropriate merge history (multiple parents) for
    CVS merges.  To save time, I only worried about recent merges.
     - If you have a nice consistent tag naming
       convention, there are ways to do this as part of cvs2git.
       Unfortunately, we didn't.
     - Do not refer to a previous run's commit SHA-1's; they'll
       likely change as things change.  Use CVS tags instead.
     - git rev-parse is useful for looking up current references
       to construct graft lines.
  - Use git filter-branch to both make the above grafts permanent,
    and to fix commiter/author username/email.
  - Move imported tags and branches to refs/oldcvstags/*
    and refs/oldcvsbranches/*, to bury a lot of the noise
    (automatic build tags, tags applied as part of doing a
    merge, etc) to where a normal "git clone" will not grab
    them, but they can still be fetched manually if necessary.
  - Copy/rename a few recent release tags and branches to
    normal refs/tags/* and refs/heads/*, when they are actually
    useful. (git pack-refs and sed)
  - Something like: sleep 5 ; git gc --aggressive --prune='1 second ago'

--
Matthew Ogilvie   [mmogilvi_git@miniinfo.net]

^ permalink raw reply

* [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14  0:38 UTC (permalink / raw)
  To: git

What do people think about the subject?  As to the timeframe I am thinking
about deprecation at v1.7.9 (late January 2012) and removal at v1.7.12
(early August 2012) [*1*].

The script more-or-less outlived its usefulness in July 2005 when
packfiles were introduced.

You are better off repacking the repositories in the first place before
accumulating so many loose object that linking the same ones between the
repositories would give you major saving.  It is theoretically possible
that two repositories happen to have the same pack and it would give you
saving to hardlink them together, but even in that case, the saving would
not survive repacking unless they are marked with a ".keep" marker, which
the script does not do anyway.

A more useful feature to attack a similar issue would be to make it easier
to use "--reference" aka "objects/info/alternates". Namely:

 (1) devise a way to make it safer by allowing the repository whose
     objects are borrowed by other repositories a way to protect objects
     that it does not need but may be needed by others from repacking; and

 (2) allowing two repositories that started independently to share objects
     using the alternates mechanism after the fact.

but that is a separate issue.

[Footnote]

*1* Both dates were derived from a mechanical "9-week per release cycle".

^ permalink raw reply

* Revisiting metadata storage
From: Richard Hartmann @ 2011-11-14  0:07 UTC (permalink / raw)
  To: Git List

Hi all,

every few months, a thread seems to pop up regarding metadata storage,
be it owner, mtime, xattr or what have you.

As of today there is (ttbomk) still no sane way to carry & restore
metadata across different repositories.

metastore[1] is closest to a working solution but its storage format
is binary and merge-unfriendly and it does not store mtime which is a
deal-breaker, for me.
I tried extend & fix metastore, but failed. Others looked at it,
deemed it possible and lost interest and/or their development box.

I know from Joey Hess [2] that the GitTogether 2011 saw some
discussion about metadata, but I was unable to find any follow-up to
this issue.

To make a long story short: Does anyone have a working solution,
today? If not, is anyone working on one? If not, is anyone interested
in working on one? And is there any follow-up to the GitTogether
discussion?

The feature set of any solution should probably include save, display,
diff, and apply on a per-metadata and per-file basis.

Thanks,
Richard

[1] http://david.hardeman.nu/software.php
[2] http://kitenet.net/~joey/blog/entry/GitTogether2011/

^ permalink raw reply

* Re: [PATCH 0/7] New sequencer workflow!
From: Junio C Hamano @ 2011-11-14  0:04 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Jonathan Nieder, Christian Couder
In-Reply-To: <7v39droi63.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Almost. If these are replaced with "git cherry-pick --continue" and "git
> revert --continue" that internally triggers "git sequencer --continue"
> *and* errors out if a wrong command was used to "--continue", it would be
> perfect.
>
> The longer-term ultimate goal would be to make it "git continue" that
> covers more than the sequencer based workflow, i.e. allow "git merge" that
> conflicted to be concluded with "edit && git add && git continue".

To clarify a bit.

While I do not mind "sequencer --continue" as a step to get us closer to a
more pleasant user experience at the implementation level, exposing the
name "sequencer" or having the user to type "sequencer --continue" is going
in a very wrong direction at the UI level.

There are many commands in the Git suite that take an order from the user,
attempt to perform the necessary operation but stops in the middle to ask
for help from the user. "cherry-pick" and "revert" are two of them, and
there are many others: e.g. am, merge, pull, rebase, "rebase -i". They use
different mechanisms to keep track of their states before giving the
control back to the user when they ask for help.

The original workflow was "pull; edit && add && commit", and it is very
unlikely this will change while we are still in Git 1.X series. The
original single commit variants of "cherry-pick" and "revert" are in the
same category. We would need to keep supporting "cherry-pick/revert; edit
&& add && commit" as a workflow for a while.

Others that deal with more than one stop-point follow a different pattern
from necessity. The user tells the same command to continue after the
command asks for help in "am; edit && add && am --continue" and "rebase
[-i]; edit && add && rebase --continue" sequence. Multi-commit variants of
"cherry-pick" and "revert" take the same approach for consistency.

In the shorter term, a person new to Git, after learning "run command X, X
gives back control asking for help, help X by editing and adding, and
telling X to continue" pattern from these commands, will eventually find
that the commands in single stop-point category (i.e. "pull", "merge" and
single-commit "cherry-pick" and "revert") inconsistent from others that
take "--continue" to resume. For this reason, "git cherry-pick --continue"
that would work even when picking a single commit like this would be
beneficial:

    $ git cherry-pick X
    ... conflicts
    $ edit && add
    $ git cherry-pick --continue

That is, no "commit" by the user. The "helping" part is literally that;
the user helps Git because Git is too stupid to read the user's mind to
come up with a perfect conflict resolution. Git knows, given a correct
resolution, how to make a commit out of it perfectly fine and does not
need help from the user to commit the result.

In the medium term, extending the above "shorter term" goal, it would make
sense to support (but not necessarily require) the following flow for
consistency:

    $ git pull ;# or "git merge branch"
    ... conflicts
    $ edit && add ;# again, no "commit"
    $ git pull --continue ;# or "git merge --continue"

Now, if you rename "cherry-pick --continue" and "revert --continue" to
"sequencer --continue", what message are you sending to the users? They
now need to learn that only these two commands are based on something
called "sequencer", can be resumed with "sequencer --continue", but other
commands need to be continued with "X --continue".

That is totally backwards, no? You are _adding_ mental burden to your
users by introducing another thing or two they need to learn about.

In the longer term (now we are talking about Git 2.X version bump), it
would be ideal if all the "git X --continue" can be consolidated to a
single "git continue" command (and "git abort" to give up the sequence
of operations).

Given that bash-completion script can tell in what state the repository
is, I think this is doable. "git continue" may invoke your implementation
of "git sequencer --continue" internally when it detects that the state is
something the "sequencer" machinery knows how to continue, and if it is in
the middle of conflicted "am -3" or rejected "am", the command may invoke
"git am --continue" instead.

That way, the user does not have to learn which command can be resumed
with "sequencer --continue" and which other command needs to be called
with "--continue" to resume. The user does not even need to know there is
a mechanism called sequencer, some commands are already using it while
others are not yet using it, and these other commands are in the process
of being rewritten to use the sequencer machinery.

^ permalink raw reply

* Re: [PATCH 3/7] sequencer: handle single-commit pick as special case
From: Junio C Hamano @ 2011-11-13 23:23 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Jonathan Nieder, Christian Couder
In-Reply-To: <1321181181-23923-4-git-send-email-artagnon@gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Prior to v1.7.2-rc1~4^2~7 (revert: allow cherry-picking more than one
> commit, 2010-06-02), 'git cherry-pick' could only pick one commit at a
> time, and it used '.git/CHERRY_PICK_HEAD' to pass on information to a
> subsequent invocation in case of a conflict.  While
> '.git/CHERRY_PICK_HEAD' can only keep information about one commit,
> the sequencer uses '.git/sequencer' to persist information in the
> general case.
>
> A problem arises because a single-commit cherry-pick operation can be
> completed successfully using 'git commit'.  This removes
> '.git/CHERRY_PICK_HEAD' without informing the sequencer, leaving
> behind a stale sequencer state as a result.  We have worked around
> this problem already by prematurely removing the sequencer state in
> d3f4628e (revert: Remove sequencer state when no commits are pending,
> 2011-06-06).  However, this gets in the way of our future plan to
> eliminate a glaring workflow inconsistency:
>
>   $ git cherry-pick foo
>   ... .git/sequencer is created ....
>   ... .git/CHERRY_PICK_HEAD is created ...
>   ... conflict ...
>   .... .git/sequencer is prematurely removed ...

Isn't the real problem that .git/sequencer is created in the first place,
when this form of the command knows it will use CHERRY_PICK_HEAD?

>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git commit
>   ... .git/CHERRY_PICK_HEAD is removed ...
>   $ git cherry-pick --continue
>   error: No cherry-pick in progress

This is the right thing to happen, no? There is no in-progress cherry-pick
anymore after that commit. The user said "I want to replay this commit",
the command couldn't finish it by itself and asked the user to help
resolving the conflict, and the user resolved and recorded the result.

It is a different story if the sequence were like this:

    $ git cherry-pick foo
    ... conflict happens
    $ edit problematicfile
    $ git add problematicfile
    $ git cherry-pick --continue
    ... This should notice CHERRY_PICK_HEAD and record it.
    ... After that, there is nothing remaining to be done.

In other words, the user said "I want to replay this commit", the command
couldn't finish it by itself and asked the user to help resolving the
conflict, the user resolved and told the command to continue. The command
should continue recording the result.

And if "continue" does not work in this sequence, that is a bug worth
fixing.

>   $ git cherry-pick foo..bar
>   ... .git/sequencer is created ....
>   ... CHERRY_PICK_HEAD is created ...
>   ... conflict in bar~1 ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git commit
>   ... CHERRY_PICK_HEAD is removed ...
>   $ git cherry-pick --continue # Success!

This again is the right thing to happen, as the user had to help the
command while replaying bar~1 and then told it to continue, which
successfully replayed bar.

I do not see an inconsistency here, let alone any "glaring" one.

^ permalink raw reply

* Re: [PATCH 1/5] sequencer: factor code out of revert builtin
From: Junio C Hamano @ 2011-11-13 23:10 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jonathan Nieder, Git List, Christian Couder
In-Reply-To: <CALkWK0n7v15n_s3CNq1Qu3LHjYkV-ENAkv2b+oB+VBkyV+Sphw@mail.gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

>> Why did sequencer.h move to after dir.h?
>
> 1. I like the convention of including the "foo.h" as the last header
> in "foo.c".

I do not think it is a good convention. The implementation of "foo.c" may
need to include many other headers for its own use of other APIs, and the
declarations in "foo.h" may depend on some types declared in some of them,
but by definition the latter is a subset of the former. Having "foo.h" at
the end of "foo.c" makes it difficult for others to tell between the two.

A user of foo.h API should need to include only git-compat-util.h and
foo.h to be able to use foo.h API in the ideal world, even though it may
need to include other headers to use other APIs defined in them.

A workable alternative in a world that is not so perfect for a user of
"foo.h" API is to include git-compat-util.h and what "foo.h" needs before
including "foo.h" and then other headers it needs. I think the current
source code takes this approach.

With that observation, it would probably make more sense if "foo.c"
included the headers in the following order:

 - git-compat-util.h (or the prominent ones like "cache.h" that is known
   to include it at the beginning);
 - Anything the declarations in "foo.h" depends on;
 - "foo.h" itself; and finally
 - Other headers that "foo.c" implementation needs.

That way, people who want to use "foo.h" can guess what needs to be
included before using "foo.h" a lot more easily.

^ permalink raw reply

* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Dan McGee @ 2011-11-13 22:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: GIT Mailing-list
In-Reply-To: <7v1utdrfsf.fsf@alter.siamese.dyndns.org>

On Sat, Nov 12, 2011 at 12:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Dan McGee <dpmcgee@gmail.com> writes:
>
>> On Thu, Oct 27, 2011 at 5:26 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> I am not sure if this produces the identical result that was benchmarked
>>> in the original series.
>> I was not either when I wrote the patch, and I had hoped to confirm
>> the results you showed in the message of 1b4bb16b9ec.
>
> I actually am reasonably sure the result will not be identical, but I also
> do not think it matters. The differences would appear only for entries
> that have been filled earlier, which should be a minority.
>
>> unable to figure out how you generated those numbers so I wasn't able
>> to do so (and had planned to get back to you to find out how you made
>> those tables). Were you able to verify the ordering did not regress?
>
> No; I was hoping you would redo the benchmark using 5f44324 (core: log
> offset pack data accesses happened, 2011-07-06).

I'm still not sure what you used to parse these results, so I had to
spend a good amount of time writing up scripts to parse that output.
Anyway, I ran tests that nearly correspond to the ones you quoted on
four different pack-object versions as noted below. The first is one
revision before your original changes, the next two are
self-explanatory, and the final version is with the short diff
included below.

Observations:
* Perhaps I did something wrong, but the v1.7.7.2 numbers don't seem
to agree with the figures you came up with- not sure why that is,
however, and I've already spent a lot of time on this today and don't
really have more time to sink into this.
* The diff included below seems to make a significant difference in
some of the seek values.

dmcgee@galway ~/logs-git
$ ~/projects/git/parse_pack_access.py *git_log.txt
                 1b4bb16b9ec~1          v1.7.7.2          v1.7.7.3
add-whole-family
        0.0:                32                32                32
           32
       10.0:               256               256               256
          256
       20.0:               293               293               293
          293
       30.0:               327               327               327
          327
       40.0:               366               366               366
          366
       50.0:               421               421               421
          421
       60.0:               526               526               526
          526
       70.0:               894               894               894
          895
       80.0:             11612             11625             11622
        11625
       90.0:             97405             97487             97487
        97510
       95.0:            280123            280391            280396
       280391
       99.0:           1251812           1254871           1252919
      1253318
       99.5:           1850181           1853291           1853100
      1853106
       99.9:           4008778           4013897           3988759
      4013897
   accesses:            280450            280464            280457
       280461
 <2MiB seek:             99.61             99.61             99.61
        99.61

dmcgee@galway ~/logs-git
$ ~/projects/git/parse_pack_access.py *git_log_drivers_net.txt
                 1b4bb16b9ec~1          v1.7.7.2          v1.7.7.3
add-whole-family
        0.0:                 0                 0                 0
            0
       10.0:               144                46                46
           46
       20.0:               233                48                48
           48
       30.0:               317                98                97
           97
       40.0:               512              1396              1367
          990
       50.0:              2921            773060            786210
       399996
       60.0:            774452          11594348          10156053
      4532415
       70.0:         333258049         424530065         428113049
    101687854
       80.0:         411869214         438733385         438510929
    106316734
       90.0:         426972983         444510034         443993824
    112362757
       95.0:         432061866         447253337         446466814
    116078451
       99.0:         434915514         453076229         451430514
    118597896
       99.5:         435032830         454359692         452394830
    119008652
       99.9:         435123054         456005056         454017949
    119605604
   accesses:            601405            600780            601732
       601219
 <2MiB seek:             61.68             53.06             53.53
        56.21

dmcgee@galway ~/logs-git
$ ~/projects/git/parse_pack_access.py *blame*.txt
                 1b4bb16b9ec~1          v1.7.7.2          v1.7.7.3
add-whole-family
        0.0:                 0                 0                 0
            0
       10.0:               137                46                46
           46
       20.0:               192                48                48
           48
       30.0:               309                97                97
           97
       40.0:               774              5246              5244
         4034
       50.0:             32585           2643798           2585078
      1518706
       60.0:         376168864         434624162         434479253
    102257691
       70.0:         415045893         438464313         438213313
    104681256
       80.0:         425668210         441472744         441222839
    107541644
       90.0:         430603653         445070643         444450126
    112494824
       95.0:         433514511         447215535         446450183
    116456428
       99.0:         435037297         453431874         451707047
    118722564
       99.5:         435065325         454459123         452652312
    119165598
       99.9:         435149193         456205377         454197863
    119710538
   accesses:            199249            194708            194738
       194875
 <2MiB seek:             54.31             49.25             49.43
        50.94

dmcgee@galway ~/logs-git
$ ~/projects/git/parse_pack_access.py *index*.txt
                 1b4bb16b9ec~1          v1.7.7.2          v1.7.7.3
add-whole-family
        0.0:                 9                 9                 9
            9
       10.0:               137                45                45
           45
       20.0:               224                47                47
           47
       30.0:               315                71                71
           71
       40.0:               449                96                96
           96
       50.0:               808               164               165
          165
       60.0:              2693               289               290
          287
       70.0:             46913               456               458
          448
       80.0:           1456359               966               975
          905
       90.0:          12555961              3423              3486
         2836
       95.0:          44134452             10211             10616
         7075
       99.0:         269753078         304302120         314414250
        35401
       99.5:         353346206         386205936         388585783
        59897
       99.9:         408908212         414260557         415445310
       244643
   accesses:           3050155           3045012           3045025
      3045141
 <2MiB seek:             81.56             98.71             98.65
        99.98


The scripts used, obviously some hardcoded magic here should be able
to use them if you want:

$ cat test_pack.sh
#!/bin/bash -e

versions=('1b4bb16b9ec~1' 'v1.7.7.2' 'v1.7.7.3' 'add-whole-family')
commands=('git log' 'git log drivers/net' 'git blame
drivers/net/netconsole.c' 'git index-pack -v
.git/objects/pack/*.pack')

gitdir=/home/dmcgee/projects/git
linuxdir=/home/dmcgee/projects/linux

export GIT_EXEC_DIR=$gitdir

for version in "${versions[@]}"; do
	echo $version
	cd $gitdir
	git checkout $version
	make -j6 CFLAGS="-march=native -mtune=native -O2 -pipe -g"
PYTHON_PATH=/usr/bin/python2
	cd $linuxdir
	git config core.logpackaccess "/tmp/$version-repack.txt"
	$gitdir/git repack -a -d
	for command in "${commands[@]}"; do
		clean_cmd=${command//\//_}
		clean_cmd=${clean_cmd// /_}
		git config core.logpackaccess "/tmp/$version-$clean_cmd.txt"
		echo $command
		$gitdir/$command >/dev/null
	done
	git config --unset core.logpackaccess
done


$ cat parse_pack_access.py
#!/usr/bin/python2

from collections import defaultdict
import sys

def read_file(filename):
    packs = defaultdict(list)
    with open(filename, 'r') as data:
        for line in data.readlines():
            pack, position = line.strip().split(' ')
            packs[pack].append(int(position))
    return packs

def calculate_seeks(positions):
    seeks = []
    prev = positions[0]
    for position in positions[1:]:
        seeks.append(abs(position - prev))
        prev = position
    return sorted(seeks)

def bucket_seeks(seeks):
    percents = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0,
            95.0, 99.0, 99.5, 99.9]
    results = []
    for percent in percents:
        index = int(percent/100.0 * len(seeks))
        offset = seeks[index]
        results.append((percent, offset))

    return results

def print_result_line(label, results):
    print '%12s%s' % (label,
            ''.join('%18s' % result for result in results))

def main(filenames):
    known_versions = ['1b4bb16b9ec~1', 'v1.7.7.2', 'v1.7.7.3',
'add-whole-family']
    pack_accesses = {}

    for filename in filenames:
        pack_accesses[filename] = read_file(filename)

    bucket_table = defaultdict(list)
    accesses = []
    under_twomb = []
    for version in known_versions:
        filename = [fn for fn in pack_accesses.keys() if
fn.startswith(version)][0]
        access = pack_accesses[filename]

        for pack, positions in access.items():
            seeks = calculate_seeks(positions)
            results = bucket_seeks(seeks)
            for percent, offset in results:
                bucket_table[percent].append(offset)
            accesses.append(len(positions))
            under_twomb.append(sum(1 for s in seeks if s < 2 * 1024 *
1024) * 100.0 / len(seeks))

    print_result_line('', known_versions)
    for k in sorted(bucket_table.keys()):
        print_result_line('%.1f:' % k, bucket_table[k])
    print_result_line('accesses:', accesses)
    print_result_line('<2MiB seek:', ('%.2f' % under for under in under_twomb))

if __name__ == '__main__':
    main(sys.argv[1:])



>From dfee7999b95442a5de2a7a0232c75262d13a28d6 Mon Sep 17 00:00:00 2001
From: Dan McGee <dpmcgee@gmail.com>
Date: Sun, 13 Nov 2011 15:07:13 -0600
Subject: [PATCH] Add whole family when packing objects

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 builtin/pack-objects.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80ab6c3..0a9e761 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -566,7 +566,7 @@ static struct object_entry **compute_write_order(void)
 	 */
 	for (; i < nr_objects; i++) {
 		if (objects[i].tagged)
-			add_to_write_order(wo, &wo_end, &objects[i]);
+			add_family_to_write_order(wo, &wo_end, &objects[i]);
 	}

 	/*
@@ -576,7 +576,7 @@ static struct object_entry **compute_write_order(void)
 		if (objects[i].type != OBJ_COMMIT &&
 		    objects[i].type != OBJ_TAG)
 			continue;
-		add_to_write_order(wo, &wo_end, &objects[i]);
+		add_family_to_write_order(wo, &wo_end, &objects[i]);
 	}

 	/*
@@ -585,7 +585,7 @@ static struct object_entry **compute_write_order(void)
 	for (i = last_untagged; i < nr_objects; i++) {
 		if (objects[i].type != OBJ_TREE)
 			continue;
-		add_to_write_order(wo, &wo_end, &objects[i]);
+		add_family_to_write_order(wo, &wo_end, &objects[i]);
 	}

 	/*
-- 
1.7.7.3

^ permalink raw reply related

* Re: git behaviour question regarding SHA-1 and commits
From: Dmitry Potapov @ 2011-11-13 22:14 UTC (permalink / raw)
  To: vinassa vinassa; +Cc: git
In-Reply-To: <CAJuRt+r9BjYcead6hgzdUT0Bisz1D48cegqkoJ0S537VMYBy_g@mail.gmail.com>

On Sun, Nov 13, 2011 at 9:04 PM, vinassa vinassa
<vinassa.vinassa@gmail.com> wrote:
>
> I found some mention of this in the archive, more about SHA-1 security
> implications, that were dismissed, but here I am looking at just a
> random, very unfortunate case, and just wondering if in this case I
> would end up in a FUBAR situation.

I do not see how such an event would be very unfortunate considering
that it would make you instantaneously famous, so you could write a
lot of articles about what happened and make a fortunate of it... but
if we consider a _far_ much more likely event like some object from
the sky falling directly on your head at the moment when you are doing
a commit, that I would be really very unfortunate... So, maybe, you
should rent space in a bunker first just to work safely...

Seriously, it is so ridiculous to worry so much about so improbable
event, while in practice a lot of repository corruptions comes from
unreliable DRAM, disk storage, or some other reasons. The mean time
between failures for high quality components is only a few hundred
years while doing a commit every second will take dozen million
times more than the age of our universe to generate a collision. So,
those probabilities are so different that there is nothing in our
every day experiences that has the same scale difference. It is like
a hair width and the distance to the closest star.

Dmitry

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: vinassa vinassa @ 2011-11-13 22:14 UTC (permalink / raw)
  To: git
In-Reply-To: <20111113182757.GA15194@elie.hsd1.il.comcast.net>

Hi, thanks for the responses, I get the picture. Some comments below still.

On Sun, Nov 13, 2011 at 7:27 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi Vinassa,
>
> vinassa vinassa wrote:
>
> > I am wondering about how git behaves currently, if I kinda win the
> > lottery of the universe, and happen to create a commit with a SHA-1
> > that is already the SHA-1 of another commit in the previous history.
> > However improbable.
>
> That would be great!  You could definitely get an academic paper out
> of it.
>
> > Would that be detected, so that I could just add a newline, and then
> > commit with a different resulting SHA-1,
> > would I just lose one of those commits (hopefully the new one), would
> > I end up with a corrupted repository?
>
> I suspect that one of the two commits would "win" the right to be
> shown by commands like "git log".  A commit made after one of the
> commits participating in the hash collision might be stored as a delta
> against the wrong one in the pack, producing errors when you try to
> access it (which is good, since it helps you find the hash collision
> and you can get a paper and prizes).

After cashing in the prizes, I would be able then to git reset --soft,
add a newline, make another commit and go on with my work, right? No
screw up big enough to demand restoring from backups.

> Though I haven't tested.  It would be nice to have an md5git (or even
> truncated-sha1-git) program to test this kind of thing with.

Yes, would be nice. I'll try to see if I can wrap my mind around the
test infrastructure.

> Thanks and hope that helps,
> Jonathan

Thank you for your patience, I understand I should not worry about
this, but this has made me even more curious about what would happen..

Vinassa

^ permalink raw reply

* Re: [PATCH 0/7] New sequencer workflow!
From: Junio C Hamano @ 2011-11-13 20:56 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Git List, Jonathan Nieder, Christian Couder
In-Reply-To: <1321181181-23923-1-git-send-email-artagnon@gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> This series preserves the old workflow:
>
>   $ git cherry-pick foo
>   ... conflict ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git commit
>
>   $ git cherry-pick foo..bar
>   ... conflict ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git commit
>   $ git revert --continue # No, there are no typos

Hmm, doesn't "git add && git cherry-pick --continue" (i.e. no "git commit"
in between) trigger the "commit the resolution first and then go on"?

> And introduces a brand new workflow:
>
>   $ git cherry-pick foo
>   ... conflict ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git sequencer --continue
>
>   $ git cherry-pick foo..bar
>   ... conflict ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git sequencer --continue
>
>   $ git revert moo..baz
>   ... conflict ...
>   $ echo "resolved" >problematicfile
>   $ git add problematicfile
>   $ git sequencer --continue

Almost. If these are replaced with "git cherry-pick --continue" and "git
revert --continue" that internally triggers "git sequencer --continue"
*and* errors out if a wrong command was used to "--continue", it would be
perfect.

The longer-term ultimate goal would be to make it "git continue" that
covers more than the sequencer based workflow, i.e. allow "git merge" that
conflicted to be concluded with "edit && git add && git continue".

^ permalink raw reply

* Re: [PATCH 3/5] sequencer: sequencer state is useless without todo
From: Junio C Hamano @ 2011-11-13 20:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: Jonathan Nieder, Git List, Christian Couder
In-Reply-To: <CALkWK0nGhUshwJM1vmAUhBG9foH+=6+_KFhfTTF6+kNS0Hm2JA@mail.gmail.com>

Ramkumar Ramachandra <artagnon@gmail.com> writes:

>>>  static int create_seq_dir(void)
>>>  {
>>> +     const char *todo_file = git_path(SEQ_TODO_FILE);
>>>       const char *seq_dir = git_path(SEQ_DIR);
>>
>> Scary idiom.
>
> What's scary about it?

The next person who copies and pastes this code to other codepaths without
thinking that the return value of git_path() is ephemeral and may need to
be saved away depending on what goes between its assignment and its use.

^ permalink raw reply

* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Junio C Hamano @ 2011-11-13 20:41 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1321179735-21890-2-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> resolve_ref() may return a pointer to a static buffer. Callers that
> use this value longer than a couple of statements should copy the
> value to avoid some hidden resolve_ref() call that may change the
> static buffer's value.
>
> The bug found by Tony Wang <wwwjfy@gmail.com> in builtin/merge.c
> demonstrates this. The first call is in cmd_merge()
>
> branch = resolve_ref("HEAD", head_sha1, 0, &flag);
>
> Then deep in lookup_commit_or_die() a few lines after, resolve_ref()
> may be called again and destroy "branch".
>
> lookup_commit_or_die
>  lookup_commit_reference
>   lookup_commit_reference_gently
>    parse_object
>     lookup_replace_object
>      do_lookup_replace_object
>       prepare_replace_object
>        for_each_replace_ref
>         do_for_each_ref
>          get_loose_refs
>           get_ref_dir
>            get_ref_dir
>             resolve_ref
>
> All call sites are checked and made sure that xstrdup() is called if
> the value should be saved.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Unfortunately there are still 37 resolve_ref() calls. An API change
>  would touch all of them and potentially introduce more bugs along the
>  way, either leak more memory or free() the wrong way.
>
>  So I'd stick with this for v1.7.8.
>
>  builtin/branch.c        |    5 +++-
>  builtin/checkout.c      |    4 ++-
>  builtin/commit.c        |    3 +-
>  builtin/fmt-merge-msg.c |    6 ++++-
>  builtin/merge.c         |   62 ++++++++++++++++++++++++++++++----------------
>  builtin/notes.c         |    6 ++++-
>  builtin/receive-pack.c  |    3 ++
>  reflog-walk.c           |    5 +++-
>  8 files changed, 66 insertions(+), 28 deletions(-)
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 0fe9c4d..5b6d839 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
>  		    branch->merge[0] &&
>  		    branch->merge[0]->dst &&
>  		    (reference_name =
> -		     resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
> +		     resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
> +			reference_name = xstrdup(reference_name);
>  			reference_rev = lookup_commit_reference(sha1);
> +		}
>  	}
>  	if (!reference_rev)
>  		reference_rev = head_rev;
> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
>  				"         '%s', even though it is merged to HEAD."),
>  				name, reference_name);
>  	}
> +	free((char*)reference_name);
>  	return merged;
>  }

Now reference_name stores the result of xstrdup(), it does not have reason
to be of type "const char *". It is preferable to lose the cast here, I
think. The same comment applies to the remainder of the patch.

Otherwise the patch looks good. Thanks.

^ permalink raw reply

* Re: [PATCH 1/2] Convert many resolve_ref() calls to read_ref*() and ref_exists()
From: Junio C Hamano @ 2011-11-13 20:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git
In-Reply-To: <1321179735-21890-1-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> Convert all these call sites to new wrappers to reduce resolve_ref()
> calls from 57 to 34. If we change resolve_ref() prototype later on
> to avoid passing static buffer out, this helps reduce changes.

Indeed the result is very nice and it is almost trivial to see that the
result does not change any behaviour without looking outside the context.

Thanks.

^ permalink raw reply

* Re: git behaviour question regarding SHA-1 and commits
From: Jonathan Nieder @ 2011-11-13 18:27 UTC (permalink / raw)
  To: vinassa vinassa; +Cc: git, Ævar Arnfjörð Bjarmason
In-Reply-To: <CAJuRt+r9BjYcead6hgzdUT0Bisz1D48cegqkoJ0S537VMYBy_g@mail.gmail.com>

Hi Vinassa,

vinassa vinassa wrote:

> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.

That would be great!  You could definitely get an academic paper out
of it.

> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?

I suspect that one of the two commits would "win" the right to be
shown by commands like "git log".  A commit made after one of the
commits participating in the hash collision might be stored as a delta
against the wrong one in the pack, producing errors when you try to
access it (which is good, since it helps you find the hash collision
and you can get a paper and prizes).

Though I haven't tested.  It would be nice to have an md5git (or even
truncated-sha1-git) program to test this kind of thing with.

Thanks and hope that helps,
Jonathan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox