* Re: Git: CVS to Git import
From: Matthew Ogilvie @ 2011-11-14 2:44 UTC
To: Jvsrvcs; +Cc: git
In-Reply-To: <1321053453892-6987037.post@n2.nabble.com>
On Fri, Nov 11, 2011 at 03:17:33PM -0800, Jvsrvcs wrote:
> Git: CVS to Git import
>
> We are moving from CVS to Git and want to know if anyone has had any
> experience there doing this and could share do's / dont's, best practices
> when doing the initial import.
Some ideas:
I wouldn't trust "git cvsimport". In my testing, it was actually fairly
common for the resulting git tags and branches to be inconsistent with the
original CVS tags and branches: after checking out a tag from CVS and the
same tag from git, the trees were often different. See the manpage
for a list of some of the known issues.
Use cvs2git instead.
Write up your own script to do the conversion. Iteratively inspect
the results, find ways to fix up anything you don't like,
and re-run the script. Any "fixups" you want should be
scripted, so that you can try different things and examine
the result. Then when the actual "real" conversion
happens, you have a minimal amount of downtime as your
already-tested script runs.
The exact fixups your script should do depend on your
circumstances, but in my case, some of the things my script did included:
- First, copy the CVS repository, and work with the copy:
- Delete some ",v" files we weren't interested in importing into git for
various reasons.
- Tweak some CVS commit timestamps in some files (such as a version
file), to reduce import oddities. (The most common oddities
resulted from an old CVS workflow that would often sequence:
(a) checkout, (b) modify version number file, (c) build, (d) commit
the new version number file, and (e) tag the sandbox. It was
moderately common for other changes (in other files) to
be committed between (a) and (d), which will either cause
strange import artifacts or actually break import tools, due to
the out-of-order timestamps. Tweaking back the timestamp in the
CVS file typically allows the import tool to avoid the
oddity. Completely cleaning this up would have been a
lot of work, so I focused just on improving recent
history.) (sed -i ...)
- Do the bulk of the import work using cvs2git.
- Graft on appropriate merge history (multiple parents) for
CVS merges. To save time, I only worried about recent merges.
- If you have a nice consistent tag naming
convention, there are ways to do this as part of cvs2git.
Unfortunately, we didn't.
- Do not refer to a previous run's commit SHA-1's; they'll
likely change as things change. Use CVS tags instead.
- git rev-parse is useful for looking up current references
to construct graft lines.
- Use git filter-branch to both make the above grafts permanent,
and to fix committer/author username/email.
- Move imported tags and branches to refs/oldcvstags/*
and refs/oldcvsbranches/*, to bury a lot of the noise
(automatic build tags, tags applied as part of doing a
merge, etc) to where a normal "git clone" will not grab
them, but they can still be fetched manually if necessary.
- Copy/rename a few recent release tags and branches to
normal refs/tags/* and refs/heads/*, when they are actually
useful. (git pack-refs and sed)
- Something like: sleep 5 ; git gc --aggressive --prune='1 second ago'
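As a rough illustration of the ref-burying step in the list above, the
sketch below rewrites ref names in packed-refs format with sed. It is only
one possible reading of the "git pack-refs and sed" hint: the function name
bury_refs is made up for this example, and it assumes every ref has already
been packed with "git pack-refs --all" so that no loose ref files remain.

```sh
#!/bin/sh
# Hypothetical helper, not part of git or cvs2git.
# Reads packed-refs text ("<sha1> <refname>" per line) on stdin and moves
# tags and branches under refs/oldcvstags/ and refs/oldcvsbranches/.
# The "# pack-refs ..." header and "^<sha1>" peeled lines do not match
# either pattern and pass through unchanged.
bury_refs() {
    sed -e 's|^\([0-9a-f]\{40\}\) refs/tags/|\1 refs/oldcvstags/|' \
        -e 's|^\([0-9a-f]\{40\}\) refs/heads/|\1 refs/oldcvsbranches/|'
}
```

In a conversion script this might be run as "git pack-refs --all", then
"bury_refs <.git/packed-refs >packed-refs.new", then moving the new file
into place. Note that this buries every branch, including the one HEAD
points to, so the few useful tags and branches still have to be copied back
to refs/tags/* and refs/heads/* afterwards.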
--
Matthew Ogilvie [mmogilvi_git@miniinfo.net]
* Re: git behaviour question regarding SHA-1 and commits
From: Junio C Hamano @ 2011-11-14 3:29 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: vinassa vinassa, git
In-Reply-To: <CACBZZX7VTdc2wHYHb1BB-wCJbKLVEmbzQaBTV04S1KDrqeN73A@mail.gmail.com>
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> This is not something you have to worry about, just get on with using
> Git and stop worrying about phenomenally unlikely edge cases that are
> never going to happen.
People who repeated answers along this line, you can stop. The message has
been heard, but without answering the original question.
When we create a new object (i.e. "git add" to register a new blob
contents, "git commit" that internally generates new tree objects to
record updated "whole contents" and then records the commit object), we
first compute what the object name of the new object would be, and then
check if we already have an object with the same object name in the object
store. If we do, we do not write the new copy of the object out (see the
function write_sha1_file() in sha1_file.c and the call to has_sha1_file()
that bypasses write_loose_object()).
So the old contents will be kept without getting overwritten.
Which sounds nice, but it has interesting consequences, as we do not
bother running a byte-for-byte comparison when we find what we tried to
write already existed in the object store in order to error out in fear of
the minuscule chance that we would hit a SHA-1 collision.
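To make the write path described above concrete, here is a toy
content-addressed store in shell. It is only a sketch and not git's
implementation: the function names and the flat ./toy-objects directory are
made up for this example, objects are stored uncompressed, and it assumes
sha1sum from GNU coreutils is available. Only the "blob <size>\0" header
hashed in front of the content mirrors how git names blobs.

```sh
#!/bin/sh
# Toy object store; directory layout and function names are hypothetical.
store=${STORE:-./toy-objects}

# Compute the object name first: SHA-1 over "blob <size>\0" plus content.
object_name() {
    size=$(wc -c <"$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Write the object only if no object with that name exists yet.
write_object() {
    name=$(object_name "$1") || return 1
    if [ -e "$store/$name" ]; then
        # The old copy is kept as-is; note there is no byte-for-byte
        # comparison, so a colliding new object would be silently dropped.
        echo "exists $name"
        return 0
    fi
    mkdir -p "$store" && cp "$1" "$store/$name" && echo "wrote $name"
}
```

Calling write_object twice on the same file reports "wrote" the first time
and "exists" the second time, which is the "do not write the new copy"
behaviour; a hypothetical colliding file would take the "exists" branch too.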
If the collision is between commit objects, for example, we would write
the (old) commit object name to the tip of the current branch. Most
likely, the tree object recorded in the (old) commit would not match the
tree object your "git commit" wanted to record (otherwise you have hit
SHA-1 collision twice in a row ;-), which would mean "git status" would
show that a whole bunch of paths have changed between the HEAD and the
index. Also "git log" would show the history leading to the (old) commit
that is likely to be very different from what you would expect immediately
after committing the collided commit. Of course, you could recover from it
with "git reset --soft" after finding out what the previous HEAD was from
the reflog, but it won't be a pleasant experience.
There can be other kinds of collisions (e.g. your latest commit might have
collided with an existing blob or tree, in which case it is likely that
almost nothing would work after finding a blob or tree in HEAD).
* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Nguyen Thai Ngoc Duy @ 2011-11-14 3:32 UTC
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbosfoiuy.fsf@alter.siamese.dyndns.org>
2011/11/14 Junio C Hamano <gitster@pobox.com>:
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 0fe9c4d..5b6d839 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
>> branch->merge[0] &&
>> branch->merge[0]->dst &&
>> (reference_name =
>> - resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
>> + resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
>> + reference_name = xstrdup(reference_name);
>> reference_rev = lookup_commit_reference(sha1);
>> + }
>> }
>> if (!reference_rev)
>> reference_rev = head_rev;
>> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
>> " '%s', even though it is merged to HEAD."),
>> name, reference_name);
>> }
>> + free((char*)reference_name);
>> return merged;
>> }
>
> Now reference_name stores the result of xstrdup(), it does not have reason
> to be of type "const char *". It is preferable to lose the cast here, I
> think. The same comment applies to the remainder of the patch.
But resolve_ref() returns "const char *", we need to type cast at
least once, either at resolve_ref() assignment or at free(), until we
change resolve_ref(). Or should we change resolve_ref() to return
"char *" now?
--
Duy
* What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Junio C Hamano @ 2011-11-14 4:01 UTC
To: git
What's cooking in git.git (Nov 2011, #03; Sun, 13)
--------------------------------------------------
Here are the topics that have been cooking. Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.
Here are the repositories that have my integration branches:
With maint, master, next, pu, todo:
git://git.kernel.org/pub/scm/git/git.git
git://repo.or.cz/alt-git.git
https://code.google.com/p/git-core/
https://github.com/git/git
With only maint and master:
git://git.sourceforge.jp/gitroot/git-core/git.git
git://git-core.git.sourceforge.net/gitroot/git-core/git-core
With all the topics and integration branches:
https://github.com/gitster/git
The preformatted documentation in HTML and man format are found in:
git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
git://repo.or.cz/git-{htmldocs,manpages}.git/
https://code.google.com/p/git-{htmldocs,manpages}.git/
https://github.com/gitster/git-{htmldocs,manpages}.git/
--------------------------------------------------
[New Topics]
* jc/commit-tree-extra (2011-11-12) 2 commits
- commit-tree: teach -C <extra-commit>
- commit-tree: teach -x <extra>
(this branch uses jc/pull-signed-tag; is tangled with jc/signed-commit.)
* nd/resolve-ref (2011-11-13) 2 commits
- Copy resolve_ref() return value for longer use
- Convert many resolve_ref() calls to read_ref*() and ref_exists()
--------------------------------------------------
[Graduated to "master"]
* ab/i18n-test-fix (2011-11-05) 2 commits
(merged to 'next' on 2011-11-06 at f1de9a6)
+ t/t7508-status.sh: use test_i18ncmp
+ t/t6030-bisect-porcelain.sh: use test_i18ngrep
* fc/remote-seturl-usage-fix (2011-11-06) 1 commit
(merged to 'next' on 2011-11-06 at 6c8328c)
+ remote: fix remote set-url usage
* jc/remote-setbranches-usage-fix (2011-11-06) 1 commit
(merged to 'next' on 2011-11-06 at 017606d)
+ remote: fix set-branches usage
* pw/p4-appledouble-fix (2011-11-05) 1 commit
(merged to 'next' on 2011-11-06 at 2ec0af3)
+ git-p4: ignore apple filetype
Regression fix for the upcoming release.
* sn/http-auth-with-netrc-fix (2011-11-04) 1 commit
(merged to 'next' on 2011-11-06 at 60b7f96)
+ http: don't always prompt for password
Regression fix for the upcoming release.
--------------------------------------------------
[Stalled]
* hv/submodule-merge-search (2011-10-13) 4 commits
- submodule.c: make two functions static
- allow multiple calls to submodule merge search for the same path
- push: Don't push a repository with unpushed submodules
- push: teach --recurse-submodules the on-demand option
What the topic aims to achieve may make sense, but the implementation
looked somewhat suboptimal.
* sr/transport-helper-fix-rfc (2011-07-19) 2 commits
- t5800: point out that deleting branches does not work
- t5800: document inability to push new branch with old content
See comments on sr/fix-fast-export-tips topic.
* sr/fix-fast-export-tips (2011-11-05) 3 commits
- fast-export: output reset command for commandline revs
- fast-export: do not refer to non-existing marks
- t9350: point out that refs are not updated correctly
The bottom commit from the stalled sr/transport-helper-fix-rfc topic is
fixed with this. It may make sense to drop the other topic and include
that commit in this series.
The command line parser is still too lax and accepts malformed input, but
this is a step in the right direction and tightening the command line now
should be doable without a low level surgery that touches codepaths that
are unrelated to the command line processing like the previous attempt
used to do.
* jc/lookup-object-hash (2011-08-11) 6 commits
- object hash: replace linear probing with 4-way cuckoo hashing
- object hash: we know the table size is a power of two
- object hash: next_size() helper for readability
- pack-objects --count-only
- object.c: remove duplicated code for object hashing
- object.c: code movement for readability
I do not think there is anything fundamentally wrong with this series, but
the risk of breakage far outweighs observed performance gain in one
particular workload.
* jc/verbose-checkout (2011-10-16) 2 commits
- checkout -v: give full status output after switching branches
- checkout: move the local changes report to the end
This is just to leave a record that the reason why we do not do this is not
because we are incapable of coding this, but because it is not a good idea
to do this. I suspect people who are new to git that might think they need
it would soon realize they don't.
Will keep in 'pu' as a showcase for a while and then will drop.
* eh/grep-scale-to-cpunum (2011-11-05) 1 commit
- grep: detect number of CPUs for thread spawning
Kills I/O parallelism and needs to be improved or discarded.
* vr/msvc (2011-10-31) 3 commits
- MSVC: Remove unneeded header stubs
- Compile fix for MSVC: Include <io.h>
- Compile fix for MSVC: Do not include sys/resources.h
It seems this needs to be rehashed with msysgit folks.
* na/strtoimax (2011-11-05) 3 commits
- Support sizes >=2G in various config options accepting 'g' sizes.
- Compatibility: declare strtoimax() under NO_STRTOUMAX
- Add strtoimax() compatibility function.
It seems this needs to be rehashed with msysgit folks.
--------------------------------------------------
[Cooking]
* jc/signed-commit (2011-11-12) 4 commits
- pretty: %G[?GS] placeholders
- test "commit -S" and "log --show-signature"
- log: --show-signature
- commit: teach --gpg-sign option
(this branch uses jc/pull-signed-tag; is tangled with jc/commit-tree-extra.)
Rebased on top of jc/pull-signed-tag topic, after reverting the old one
out of 'next'.
* jc/pull-signed-tag (2011-11-12) 15 commits
- commit-tree: teach -m/-F options to read logs from elsewhere
- commit-tree: update the command line parsing
- commit: teach --amend to carry forward extra headers
- merge: force edit and no-ff mode when merging a tag object
- commit: copy merged signed tags to headers of merge commit
- merge: record tag objects without peeling in MERGE_HEAD
- merge: make usage of commit->util more extensible
- fmt-merge-msg: Add contents of merged tag in the merge message
- fmt-merge-msg: package options into a structure
- fmt-merge-msg: avoid early returns
- refs DWIMmery: use the same rule for both "git fetch" and others
- fetch: allow "git fetch $there v1.0" to fetch a tag
- merge: notice local merging of tags and keep it unwrapped
- fetch: do not store peeled tag object names in FETCH_HEAD
- Split GPG interface into its own helper library
(this branch is used by jc/commit-tree-extra and jc/signed-commit.)
Further updated to allow "commit --amend" to retain the mergetag
headers. I think this is ready for the cycle after upcoming 1.7.8.
* ab/clang-lints (2011-11-06) 2 commits
(merged to 'next' on 2011-11-13 at a573aec)
+ cast variable in call to free() in builtin/diff.c and submodule.c
+ apply: get rid of useless x < 0 comparison on a size_t type
Will keep in 'next' during this cycle.
* ab/pull-rebase-config (2011-11-07) 1 commit
(merged to 'next' on 2011-11-13 at 72bb2d5)
+ pull: introduce a pull.rebase option to enable --rebase
Will keep in 'next' during this cycle.
* nd/fsck-progress (2011-11-06) 4 commits
(merged to 'next' on 2011-11-13 at 8831811)
+ fsck: print progress
+ fsck: avoid reading every object twice
+ verify_packfile(): check as many object as possible in a pack
+ fsck: return error code when verify_pack() goes wrong
Will keep in 'next' during this cycle.
* nd/prune-progress (2011-11-07) 3 commits
(merged to 'next' on 2011-11-13 at c5722ac)
+ reachable: per-object progress
+ prune: handle --progress/no-progress
+ prune: show progress while marking reachable objects
Will keep in 'next' during this cycle.
* jc/stream-to-pack (2011-11-03) 4 commits
- Bulk check-in
- finish_tmp_packfile(): a helper function
- create_tmp_packfile(): a helper function
- write_pack_header(): a helper function
Teaches "git add" to send large-ish blob data straight to a packfile.
This is a continuation to the "large file support" topic. I think this
codepath to move data from worktree to repository needs to become aware of
streaming, just like the checkout codepath that goes the other way, which
was done in the previous "large file support" topic in the 1.7.7 cycle.
* jn/gitweb-side-by-side-diff (2011-10-31) 8 commits
- gitweb: Add navigation to select side-by-side diff
- gitweb: Use href(-replay=>1,...) for formats links in "commitdiff"
- t9500: Add basic sanity tests for side-by-side diff in gitweb
- t9500: Add test for handling incomplete lines in diff by gitweb
- gitweb: Give side-by-side diff extra CSS styling
- gitweb: Add a feature to show side-by-side diff
- gitweb: Extract formatting of diff chunk header
- gitweb: Refactor diff body line classification
Replaces a series from Kato Kazuyoshi on the same topic.
* mf/curl-select-fdset (2011-11-04) 4 commits
(merged to 'next' on 2011-11-06 at a49516f)
+ http: drop "local" member from request struct
+ http.c: Rely on select instead of tracking whether data was received
+ http.c: Use timeout suggested by curl instead of fixed 50ms timeout
+ http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
Reduces unnecessary waits.
* nd/misc-cleanups (2011-10-27) 6 commits
(merged to 'next' on 2011-10-28 at 2527a49)
+ unpack_object_header_buffer(): clear the size field upon error
+ tree_entry_interesting: make use of local pointer "item"
+ tree_entry_interesting(): give meaningful names to return values
+ read_directory_recursive: reduce one indentation level
+ get_tree_entry(): do not call find_tree_entry() on an empty tree
+ tree-walk.c: do not leak internal structure in tree_entry_len()
These are unquestionably good parts taken out of a larger series, so that
we can focus more on the other changes in later rounds of review.
Will keep in 'next' during this cycle.
* rs/allocate-cache-entry-individually (2011-10-26) 2 commits
(merged to 'next' on 2011-10-27 at 2e4acd6)
+ cache.h: put single NUL at end of struct cache_entry
+ read-cache.c: allocate index entries individually
Will keep in 'next' during this cycle.
* mh/ref-api-3 (2011-10-19) 11 commits
(merged to 'next' on 2011-10-23 at 92e2d35)
+ is_refname_available(): reimplement using do_for_each_ref_in_array()
+ names_conflict(): simplify implementation
+ names_conflict(): new function, extracted from is_refname_available()
+ repack_without_ref(): reimplement using do_for_each_ref_in_array()
+ do_for_each_ref_in_array(): new function
+ do_for_each_ref(): correctly terminate while processesing extra_refs
+ add_ref(): take a (struct ref_entry *) parameter
+ create_ref_entry(): extract function from add_ref()
+ parse_ref_line(): add a check that the refname is properly formatted
+ repack_without_ref(): remove temporary
+ Rename another local variable name -> refname
(this branch uses mh/ref-api-2.)
Will keep in 'next' during this cycle.
* rr/revert-cherry-pick (2011-10-23) 5 commits
(merged to 'next' on 2011-10-26 at 27b7496)
+ revert: simplify communicating command-line arguments
+ revert: allow mixed pick and revert instructions
+ revert: make commit subjects in insn sheet optional
+ revert: simplify getting commit subject in format_todo()
+ revert: free msg in format_todo()
The internals of "git revert/cherry-pick" have been further refactored to
serve as the basis for the sequencer.
Will keep in 'next' during this cycle.
* cb/daemon-permission-errors (2011-10-17) 2 commits
- daemon: report permission denied error to clients
- daemon: add tests
The tip commit might be loosening things a bit too much.
Will keep in 'pu' until hearing a convincing argument for the patch.
* mh/ref-api-2 (2011-10-17) 14 commits
(merged to 'next' on 2011-10-19 at cc89f0e)
+ resolve_gitlink_ref_recursive(): change to work with struct ref_cache
+ Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
+ resolve_gitlink_ref(): improve docstring
+ get_ref_dir(): change signature
+ refs: change signatures of get_packed_refs() and get_loose_refs()
+ is_dup_ref(): extract function from sort_ref_array()
+ add_ref(): add docstring
+ parse_ref_line(): add docstring
+ is_refname_available(): remove the "quiet" argument
+ clear_ref_array(): rename from free_ref_array()
+ refs: rename parameters result -> sha1
+ refs: rename "refname" variables
+ struct ref_entry: document name member
+ cache.h: add comments for git_path() and git_path_submodule()
(this branch is used by mh/ref-api-3.)
Will keep in 'next' during this cycle.
* sg/complete-refs (2011-10-21) 9 commits
(merged to 'next' on 2011-10-26 at d65e2b4)
+ completion: remove broken dead code from __git_heads() and __git_tags()
+ completion: fast initial completion for config 'remote.*.fetch' value
+ completion: improve ls-remote output filtering in __git_refs_remotes()
+ completion: query only refs/heads/ in __git_refs_remotes()
+ completion: support full refs from remote repositories
+ completion: improve ls-remote output filtering in __git_refs()
+ completion: make refs completion consistent for local and remote repos
+ completion: optimize refs completion
+ completion: document __gitcomp()
Will keep in 'next' until an Ack or two from completion folks.
* jc/request-pull-show-head-4 (2011-11-09) 12 commits
(merged to 'next' on 2011-11-13 at e473fd2)
+ request-pull: use the annotated tag contents
(merged to 'next' on 2011-10-15 at 7e340ff)
+ fmt-merge-msg.c: Fix an "dubious one-bit signed bitfield" sparse error
(merged to 'next' on 2011-10-10 at 092175e)
+ environment.c: Fix an sparse "symbol not declared" warning
+ builtin/log.c: Fix an "Using plain integer as NULL pointer" warning
(merged to 'next' on 2011-10-07 at fcaeca0)
+ fmt-merge-msg: use branch.$name.description
(merged to 'next' on 2011-10-06 at fa5e0fe)
+ request-pull: use the branch description
+ request-pull: state what commit to expect
+ request-pull: modernize style
+ branch: teach --edit-description option
+ format-patch: use branch description in cover letter
+ branch: add read_branch_desc() helper function
+ Merge branch 'bk/ancestry-path' into jc/branch-desc
Allow setting "description" for branches and use it to help communications
between humans in various workflow elements. It also allows requesting for
a signed tag to be pulled and shows the tag message in the generated message.
Will keep in 'next' during this cycle.
--------------------------------------------------
[Discarded]
* kk/gitweb-side-by-side-diff (2011-10-17) 2 commits
. gitweb: add a feature to show side-by-side diff
. gitweb: change format_diff_line() to remove leading SP from $diff_class
* jc/check-ref-format-fixup (2011-10-19) 2 commits
(merged to 'next' on 2011-10-19 at 98981be)
+ Revert "Restrict ref-like names immediately below $GIT_DIR"
(merged to 'next' on 2011-10-15 at 8e89bc5)
+ Restrict ref-like names immediately below $GIT_DIR
This became a no-op except for the bottom one which is part of the other
topic now.
* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Junio C Hamano @ 2011-11-14 4:03 UTC
To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <CACsJy8BnqoPVJiM6mbq7p3gKtLh-KGUuTshcukGokC3istTxMQ@mail.gmail.com>
Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>> Now reference_name stores the result of xstrdup(), it does not have reason
>> to be of type "const char *". It is preferable to lose the cast here, I
>> think. The same comment applies to the remainder of the patch.
>
> But resolve_ref() returns "const char *", we need to type cast at
> least once, either at resolve_ref() assignment or at free(), until we
> change resolve_ref().
In any case, I do not think it matters either way, so I queued this patch
to 'pu' unmodified.
This patch uses xstrdup() on return value of resolve_ref() only at
hand-picked places; while the choice of the places the patch decided not
to call free() looked reasonable from a quick review, both of us may be
blind and it may introduce huge leaks in a repeatedly called function.
A more extensive patch that would turn resolve_ref() to return an
allocated piece of memory shares the same risk of adding new leaks at the
callsites, and this late in the cycle, neither will be in 1.7.8 anyway.
Given that if we build the more extensive patch on top of this one after
1.7.8, it will need to undo xstrdup() in this patch, add free()s that
become necessary, in addition to having to add free()s that this patch
might have forgotten, I have a feeling that we should just drop
this [2/2] and do a more thorough fix immediately on top of [1/2] after
the 1.7.8 release is done.
Thanks.
* Re: [PATCH 3/4] pack-objects: don't traverse objects unnecessarily
From: Junio C Hamano @ 2011-11-14 5:40 UTC
To: Dan McGee; +Cc: GIT Mailing-list
In-Reply-To: <CAEik5nPJ3r6gp9Lttzh5aQmiPRFxpZvhTBXZoreY98QV6Cocdg@mail.gmail.com>
Dan McGee <dpmcgee@gmail.com> writes:
>>> unable to figure out how you generated those numbers so I wasn't able
>>> to do so (and had planned to get back to you to find out how you made
>>> those tables). Were you able to verify the ordering did not regress?
>>
>> No; I was hoping you would redo the benchmark using 5f44324 (core: log
>> offset pack data accesses happened, 2011-07-06).
>
> I'm still not sure what you used to parse these results,...
Ah, in the kernel repository, after running "repack -a -d -f" with the
two versions of git and copying the resulting packfiles into PACK-OLD/ and
PACK-NEW/, I used these scripts to examine the access pattern.
-- >8 -- DOIT.sh -- >8 --
#!/bin/sh
tmp=/var/tmp/ll$$
# The glob must sit outside the quotes, or the temporary logs are never removed.
trap 'rm -f "$tmp".*' 0
ln -f PACK-OLD/* .git/objects/pack/. || exit
log="$tmp.old"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"
ln -f PACK-NEW/* .git/objects/pack/. || exit
log="$tmp.new"
eval '/usr/bin/time rungit test -c core.logpackaccess="$log" '"$*"
perl OFS.perl "$tmp.old" "$tmp.new"
-- 8< -- DOIT.sh -- 8< --
-- >8 -- OFS.perl -- >8 --
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

my $verbose;
exit(1) if (!GetOptions("verbose" => \$verbose));

# Read one access log (a pack name and an offset per line) and return the
# absolute seek distances between successive accesses to the same pack.
sub take_one {
    my ($filename) = @_;
    my (%lofs, $num);
    my @diff;
    open my $in, '<', $filename or die "cannot open $filename: $!";
    $num = 0;
    while (<$in>) {
        my ($file, $ofs) = split(' ');
        if (!exists $lofs{$file}) {
            $lofs{$file} = [$num++, 0];
        }
        my $diff = $ofs - $lofs{$file}[1];
        $lofs{$file}[1] = $ofs;
        push @diff, abs($diff);
        print "$lofs{$file}[0] $diff $ofs\n" if $verbose;
    }
    return \@diff;
}

# Binary search in a sorted list: returns the index of a matching element,
# or of the first larger element if there is no match.
sub bsearch {
    my ($list, $target) = @_;
    my ($hi, $lo) = ((scalar @$list), 0);
    while ($lo < $hi) {
        my $mi = int(($lo + $hi) / 2);
        if ($list->[$mi] == $target) {
            return $mi;
        } elsif ($list->[$mi] < $target) {
            $lo = $mi + 1;
        } else {
            $hi = $mi;
        }
    }
    return $hi;
}

my @percentile = ();
for (my $i = 0; $i < 100; $i += 10) {
    push @percentile, $i;
}
push @percentile, 95, 99, 99.9, 99.99;

# Format a non-negative integer with thousands separators.
sub thcomma {
    my ($intval) = @_;
    my $result = "";
    # ">=" rather than ">", so that exactly 1000 is shown as "1,000".
    while ($intval >= 1000) {
        my $rem = $intval % 1000;
        if ($result ne "") {
            $result = sprintf "%03d,%s", $rem, $result;
        } else {
            $result = sprintf "%03d", $rem;
        }
        $intval -= $rem;
        $intval /= 1000;
    }
    if ($intval) {
        if ($result ne "") {
            $result = sprintf "%d,%s", $intval, $result;
        } else {
            $result = sprintf "%d", $intval;
        }
    }
    $result =~ s/^[0,]*//;
    $result = "0" if ($result eq "");
    return $result;
}

# Print access counts, seek-distance percentiles and the share of seeks
# shorter than 2MiB, for one log or for two logs side by side.
sub show_stat {
    my ($diff1, $diff2) = @_;
    my ($i, $ix);
    if ($diff2) {
        @$diff2 = sort { $a <=> $b } @$diff2;
    }
    @$diff1 = sort { $a <=> $b } @$diff1;
    printf "\nTotal number of access : %12s", thcomma(scalar(@$diff1));
    printf "%12s", thcomma(scalar(@$diff2)) if ($diff2);
    for $i (@percentile) {
        $ix = scalar(@$diff1) * $i / 100;
        printf "\n %5.2f%% percentile : %12s", $i, thcomma($diff1->[$ix]);
        if ($diff2) {
            $ix = scalar(@$diff2) * $i / 100;
            printf "%12s", thcomma($diff2->[$ix]);
        }
    }
    $ix = bsearch($diff1, 2 * 1024 * 1024);
    printf "\n Less than 2MiB seek : %5.2f%%", ($ix * 100.0 / @$diff1);
    if ($diff2) {
        $ix = bsearch($diff2, 2 * 1024 * 1024);
        printf " %5.2f%%", ($ix * 100.0 / @$diff2);
    }
    print "\n";
}

my ($diff1, $diff2);
$diff1 = take_one($ARGV[0]);
$diff2 = take_one($ARGV[1]) if ($ARGV[1]);
show_stat($diff1, $diff2);
-- 8< -- OFS.perl -- 8< --
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Miles Bader @ 2011-11-14 6:06 UTC
To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4ny7mtbx.fsf@alter.siamese.dyndns.org>
Junio C Hamano <gitster@pobox.com> writes:
> (2) allowing two repositories that started independently to share objects
> using the alternates mechanism after the fact.
Can they not already?
I mean, it works great right now to do:
cd $REP2
echo $REP1/.git/objects > .git/objects/info/alternates
git gc
Do you mean a more elaborate UI that does this nicely...? or something
else?
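For what it's worth, the three commands above could be wrapped in a small
helper along these lines. This is only a sketch: the function name
add_alternate is made up, it assumes both arguments are non-bare
repositories with a .git directory, and it leaves the final "git gc" to the
caller. Unlike the plain echo, it records an absolute path and appends
instead of overwriting an existing alternates file.

```sh
#!/bin/sh
# Hypothetical helper: make repository $2 borrow objects from repository $1.
add_alternate() {
    # Absolute path of the lending object store; fails if it does not exist.
    src=$(CDPATH= cd "$1/.git/objects" && pwd) || return 1
    [ -d "$2/.git/objects" ] || return 1
    info="$2/.git/objects/info"
    mkdir -p "$info" || return 1
    # Append rather than overwrite, and skip a path that is already listed.
    grep -qxF "$src" "$info/alternates" 2>/dev/null ||
        printf '%s\n' "$src" >>"$info/alternates"
}
```

With the variables from the example above this would be used as
add_alternate "$REP1" "$REP2", followed by "git gc" inside $REP2 to drop
the objects that are now available from $REP1.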
It might be nice to have a mechanism where new objects would update
the _alternate_ rather than the object-store in the tree where the
command was run... then you could easily have a bunch of trees using a
central object store without needing to update the central store
occasionally by hand (and do gc in its "clients")...
-Miles
--
"Most attacks seem to take place at night, during a rainstorm, uphill,
where four map sheets join." -- Anon. British Officer in WW I
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14 6:27 UTC
To: Miles Bader; +Cc: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>
Miles Bader <miles@gnu.org> writes:
> Do you mean a more elaborate UI that does this nicely...?
Yes, that is what I meant. I also have a feeling that people would prefer
to have an option that treats these two repositories equally; your
illustration makes one a subordinate to the other.
* Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Johannes Sixt @ 2011-11-14 7:19 UTC
To: Junio C Hamano; +Cc: git, Vincent van Ravesteijn, Ramsay Jones, msysGit
In-Reply-To: <7vmxbzl5ch.fsf@alter.siamese.dyndns.org>
IMO, these two topics can move forward:
> * vr/msvc (2011-10-31) 3 commits
> - MSVC: Remove unneeded header stubs
> - Compile fix for MSVC: Include <io.h>
> - Compile fix for MSVC: Do not include sys/resources.h
>
> It seems this needs to be rehashed with msysgit folks.
With these patches, git can be built with MSVC. The result is usable,
although a few tests still fail.
> * na/strtoimax (2011-11-05) 3 commits
> - Support sizes >=2G in various config options accepting 'g' sizes.
> - Compatibility: declare strtoimax() under NO_STRTOUMAX
> - Add strtoimax() compatibility function.
>
> It seems this needs to be rehashed with msysgit folks.
There were a few curiosities around strtoimax being present in MinGW or
not, but these have been resolved. Also, whether or not we should define
NO_STRTOUMAX for the MinGW build is an independent matter.
-- Hannes
* Re: git behaviour question regarding SHA-1 and commits
From: Johannes Sixt @ 2011-11-14 7:39 UTC
To: vinassa vinassa; +Cc: git
In-Reply-To: <CAJuRt+r9BjYcead6hgzdUT0Bisz1D48cegqkoJ0S537VMYBy_g@mail.gmail.com>
Am 11/13/2011 18:04, schrieb vinassa vinassa:
> I am wondering about how git behaves currently, if I kinda win the
> lottery of the universe, and happen to create a commit with a SHA-1
> that is already the SHA-1 of another commit in the previous history.
> However improbable.
>
> Would that be detected, so that I could just add a newline, and then
> commit with a different resulting SHA-1,
> would I just lose one of those commits (hopefully the new one), would
> I end up with a corrupted repository?
I *think* the following would happen:
1. Git detects that the (commit) object that it is about to generate
already exists, and does not write a new one.
2. Then the branch's ref is updated to the SHA-1. Since the original
commit is somewhere back in history, this is effectively like 'git reset
--soft that-commit'.
3. At your next 'git diff --cached', you notice unexpected differences
between the index and the branch head. You will wonder what happened.
("Who typed 'git reset --soft that-commit' while I was looking the other
way??")
4. To recover, you just 'git reset --soft @{1}' to revert to the state
before the commit attempt, and commit again. Your commit message from the
first attempt will be lost unless you have used -C or -F for your commit.
At any rate, you can reuse the exact same commit message for this second
commit attempt, because by now time will have advanced by at least one
second, which gives you a different commit timestamp and, hence, a
different commit object.
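For illustration, a rough sketch of why the retry in step 4 ends up with a
different object name: Git names an object by the SHA-1 of a
"<type> <size>\0" header followed by the object's content, and a commit's
content includes the committer timestamp. The helper names (object_id,
commit_body), the identity and the message below are made up for the
example, and it assumes a sha1sum utility is available. It does not touch
any repository.

```shell
# object_id TYPE: read an object's content on stdin and print the name Git
# would give it, i.e. the SHA-1 of "<type> <size>\0<content>".
object_id() {
    tmp=$(mktemp) || return 1
    cat >"$tmp"
    size=$(wc -c <"$tmp" | tr -d ' ')
    { printf '%s %s\000' "$1" "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

# commit_body TIMESTAMP: a minimal commit. The tree is the well-known empty
# tree; the identity and message are placeholders.
commit_body() {
    printf 'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'
    printf 'author A U Thor <author@example.com> %s +0000\n' "$1"
    printf 'committer A U Thor <author@example.com> %s +0000\n' "$1"
    printf '\nsame message\n'
}

# Same tree, same message, timestamps one second apart: two different names.
first=$(commit_body 1321256340 | object_id commit)
second=$(commit_body 1321256341 | object_id commit)
echo "$first"
echo "$second"
```

The two printed names differ even though only the timestamp changed, which
is why reusing the exact same message on the second attempt is harmless.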
-- Hannes
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Simon Brenner @ 2011-11-14 8:48 UTC (permalink / raw)
To: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>
I think one of the most annoying aspects of alternates (beyond the
hassle of adding/removing them except using clone --reference) is the
danger of losing data if you aren't absolutely sure that your
alternate is stable and won't ever lose references to objects.
If the alternate just had links to the referring repositories, I think
this hole could be neatly closed.
On Mon, Nov 14, 2011 at 7:06 AM, Miles Bader <miles@gnu.org> wrote:
> It might be nice to have a mechanism where new objects would update
> the _alternate_ rather than the object-store in the tree where the
> command was run... then you could easily have a bunch of trees using a
> central object store without needing to update the central store
> occasionally by hand (and do gc in its "clients")...
This sounds like a nice way forward: replace/extend the current
alternates system with support for a shared object store that is
"intelligently" shared so that it can be gc:d based on all refs from
all referring repositories. I imagine it would be something very much
like a bare repository - except it wouldn't have any refs of its own,
just a list of other repositories it should search for refs when
GC:ing.
The object store currently built into each git repository could even
become a special case of that: a shared object store (that happens to
reside under .git) with a single referring repository (the parent .git
dir). If the location of the object store is configurable, clone
--reference could simply point the new repository directly to the
shared store instead of ever setting up a local object store.
// Simon
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Chris Packham @ 2011-11-14 9:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Miles Bader, git
In-Reply-To: <7v62inkymg.fsf@alter.siamese.dyndns.org>
On 14/11/11 19:27, Junio C Hamano wrote:
> Miles Bader <miles@gnu.org> writes:
>
>> Do you mean a more elaborate UI that does this nicely...?
>
> Yes, that is what I meant. I also have a feeling that people would prefer
> to have an option that treats these two repositories equally; your
> illustration makes one a subordinate to the other.
Not sure if it's what you're after but there was this patch [1] that I
was kicking around a while back. I've still got the code in an old
branch if there is interest in resurrecting it. It looks like I started
addressing Junio's comments and never posted v3.
[1] http://article.gmane.org/gmane.comp.version-control.git/143164
* Re: Git shouldn't allow to push a new branch called HEAD
From: Daniele Segato @ 2011-11-14 9:07 UTC (permalink / raw)
To: Git Mailing List
In-Reply-To: <1318592153.2938.21.camel@mastroc3.mobc3.local>
On Fri, 2011-10-14 at 13:35 +0200, Daniele Segato wrote:
> On Fri, 2011-10-14 at 13:31 +0200, Daniele Segato wrote:
> > Hi all,
> >
> >
> > following from a discussion in IRC freenode #git between me, sitaram and
> > shruggar
> >
> >
> > step to reproduce:
> >
> > $ mkdir /tmp/gitbug
> > $ cd /tmp/gitbug/
> >
> > $ # create a fake remote repo
> > $ git init --bare remote.git
> >
> > $ # clone it with the user that will generate the bug
> > $ git clone remote.git buggenerator
> > $ cd buggenerator/
> > $ touch whatever
> > $ git add .
> > $ git commit -m "first commit"
> > $ git push origin master
> >
> > $ # now clone the same repo as the other guy, the "victim" of this issue
> > $ cd ..
> > $ git clone remote.git victim
> >
> > $ # time to create the remote HEAD branch
> > $ cd buggenerator/
> > $ git push origin HEAD:HEAD
> >
> > $ # the remote ref has been created!
> > $ git ls-remote
> >
> > $ # another commit
> > $ echo 'any change' >> whatever
> > $ git commit -a -m "some change"
> > $ git push origin master
> >
> > $ # the refs/heads/HEAD is still where it was
> > $ git ls-remote
> >
> > $ # now from the victim perspective
> > $ cd ../victim/
> >
> > $ # every time executing a fetch he will get a force update
> > $ # or maybe even an error, seen it in my real repo, don't know how
> > $ # to reproduce
> > $ git fetch
> > $ git fetch
> > $ git ls-remote
> > $ git fetch
> > $ git ls-remote
> > $ git branch -a
>
> This should also help in understanding what happens in the "victim" local
> repo at every fetch:
>
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
> * master 11d0a12 [behind 1] first commit
> remotes/origin/HEAD -> origin/master
> remotes/origin/master 77852ef some change
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git fetch
> From /tmp/gitbug/remote
> + 77852ef...11d0a12 HEAD -> origin/HEAD (forced update)
> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
> * master 11d0a12 first commit
> remotes/origin/HEAD -> origin/master
> remotes/origin/master 11d0a12 first commit
Hi again,
I'm aware my request has been ignored for a good reason but I would
appreciate someone stepping in and explaining to me why this is not a
bug or why it has been ignored.
Thanks.
Regards,
Daniele Segato
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Junio C Hamano @ 2011-11-14 10:24 UTC (permalink / raw)
To: Miles Bader; +Cc: git
In-Reply-To: <buomxbzutjm.fsf@dhlpc061.dev.necel.com>
Miles Bader <miles@gnu.org> writes:
> It might be nice to have a mechanism where new objects would update
> the _alternate_ rather than the object-store in the tree where the
> command was run.
With the alternate mechanism, your borrowing is read-only and that is
exactly why you can borrow from other people's repositories to which you
have no write permission.
What you are suggesting is fundamentally different from the alternates
mechanism. I am not saying it is better or worse, though. Not yet at this
point in this message.
> .. then you could easily have a bunch of trees using a
> central object store without needing to update the central store
> occasionally by hand (and do gc in its "clients")...
If you write objects to the central store, "gc" in the "clients" will be a
no-op because they do not have their own objects. But instead, the cruft your
"clients" accumulate will be in the central store. There is still a need for
"gc" at the central store to remove things that are no longer used by any
client, isn't there? Unless you declare that you do not care because perhaps
the central store is large enough, that is.
At least with the alternates, running "gc" in the "clients" is a safe
operation and the only change necessary is to make fsck/repack aware of
the repositories that borrow from the repository in which these commands
are run, and the logic to do so is exactly the same as in the case of running
"gc" in your central store, I would think.
* Re: [RFC] deprecating and eventually removing "git relink"?
From: Jeff King @ 2011-11-14 10:34 UTC (permalink / raw)
To: Simon Brenner; +Cc: git
In-Reply-To: <CAD=rjTXgH+AivmK+zLurQVC+=p1UYqFy_p=wBF-1-TOQ=Cqjtw@mail.gmail.com>
On Mon, Nov 14, 2011 at 09:48:07AM +0100, Simon Brenner wrote:
> On Mon, Nov 14, 2011 at 7:06 AM, Miles Bader <miles@gnu.org> wrote:
> > It might be nice to have a mechanism where new objects would update
> > the _alternate_ rather than the object-store in the tree where the
> > command was run... then you could easily have a bunch of trees using a
> > central object store without needing to update the central store
> > occasionally by hand (and do gc in its "clients")...
>
> This sounds like a nice way forward: replace/extend the current
> alternates system with support for a shared object store that is
> "intelligently" shared so that it can be gc:d based on all refs from
> all referring repositories. I imagine it would be something very much
> like a bare repository - except it wouldn't have any refs of its own,
> just a list of other repositories it should search for refs when
> GC:ing.
Yes, I think that is sensible. I'm not sure there is even any core git
code to be written. I think a wrapper that does the following would
probably work:
1. Make new repo groups. E.g.:
$ git share init foo
which would be implemented something like:
ROOT=$HOME/.git-share
git init --bare $ROOT/$1
2. Add a repo to a group.
$ git share add foo
implemented as:
echo $ROOT/$1/objects >>.git/objects/info/alternates
git --git-dir=$ROOT/$1 config --add share.child $PWD
3. Compact a group.
$ git share compact foo
implemented as:
# delete any existing refs
git for-each-ref --format='%(refname)' | xargs git update-ref -d
# now make new refs for each child
n=1
for dir in `git config --get-all share.child`; do
if ! test -d $dir; then
echo >&2 "warning: $dir went away"
continue
fi
git fetch $dir refs/*:refs/$1/*
n=$(($n + 1))
done
# and then repack/prune
git repack -ad
# and then gc each child, dropping anything in the share
for dir in `git config --get-all share.child`; do
git --git-dir=$dir gc
done
I'm sure I'm missing a corner case or two, and of course there are
quoting issues and error handling missing. But the point is, I don't
think there's a real reason that the UI can't wrap the existing
mechanism, creating a momentary list of refs and pruning based on that.
One issue with this scheme (or most similar schemes) is that child repos
are uniquely identified by their directory name. In the absence of
alternates, it's perfectly reasonable to do:
git init; hack hack hack; commit commit commit
cd .. ; mv project new-project-name
but here it would break the shared repo's link to the child (which is
not just inconvenient, but dangerous, as we will not respect its refs
when pruning). Probably the "warning" above should actually error out
and force the user to say "yes, I deleted this child" or "no, I moved it
here".
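A minimal sketch of that stricter check, assuming the child list is stored
one path per line under share.child as in the wrapper above. The function
name check_children is made up, and how the user would then record
"deleted" versus "moved" is left open:

```shell
# check_children: read child repository paths on stdin, one per line.
# Report every path that no longer exists and fail if any is missing, so
# the caller can refuse to prune until the user has resolved it.
check_children() {
    missing=0
    while IFS= read -r dir; do
        if ! test -d "$dir"; then
            echo >&2 "error: child '$dir' is missing; say whether it was deleted or moved"
            missing=1
        fi
    done
    return $missing
}

# Hypothetical use at the top of "git share compact":
#   git config --get-all share.child | check_children || exit 1
```

Failing before any ref is deleted or any repack is run keeps a renamed
child's objects safe until its new location has been recorded.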
You could try to be clever with assigning each child a UUID, but then
you have to resort to grepping the filesystem for the UUID to detect a
move. Which is complex and still not foolproof (i.e., if you don't find
it, is it because the repo was deleted, or because it got moved
somewhere that we didn't look?).
-Peff
* Re: Git shouldn't allow to push a new branch called HEAD
From: Michael Haggerty @ 2011-11-14 10:45 UTC (permalink / raw)
To: Daniele Segato; +Cc: Git Mailing List
In-Reply-To: <1321261662.2941.13.camel@mastroc3.mobc3.local>
On 11/14/2011 10:07 AM, Daniele Segato wrote:
> On Fri, 2011-10-14 at 13:35 +0200, Daniele Segato wrote:
>> On Fri, 2011-10-14 at 13:31 +0200, Daniele Segato wrote:
>>> following from a discussion in IRC freenode #git between me, sitaram and
>>> shruggar
>>>
>>>
>>> step to reproduce:
>>>
>>> $ mkdir /tmp/gitbug
>>> $ cd /tmp/gitbug/
>>>
>>> $ # create a fake remote repo
>>> $ git init --bare remote.git
>>>
>>> $ # clone it with the user that will generate the bug
>>> $ git clone remote.git buggenerator
>>> $ cd buggenerator/
>>> $ touch whatever
>>> $ git add .
>>> $ git commit -m "first commit"
>>> $ git push origin master
>>>
>>> $ # now clone the same repo as the other guy, the "victim" of this issue
>>> $ cd ..
>>> $ git clone remote.git victim
>>>
>>> $ # time to create the remote HEAD branch
>>> $ cd buggenerator/
>>> $ git push origin HEAD:HEAD
>>>
>>> $ # the remote ref has been created!
>>> $ git ls-remote
>>>
>>> $ # another commit
>>> $ echo 'any change' >> whatever
>>> $ git commit -a -m "some change"
>>> $ git push origin master
>>>
>>> $ # the refs/heads/HEAD is still where it was
>>> $ git ls-remote
>>>
>>> $ # now from the victim perspective
>>> $ cd ../victim/
>>>
>>> $ # every time executing a fetch he will get a force update
>>> $ # or maybe even an error, seen it in my real repo, don't know how
>>> $ # to reproduce
>>> $ git fetch
>>> $ git fetch
>>> $ git ls-remote
>>> $ git fetch
>>> $ git ls-remote
>>> $ git branch -a
>>
>> This should also help in understanding what happens in the "victim" local
>> repo at every fetch:
>>
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
>> * master 11d0a12 [behind 1] first commit
>> remotes/origin/HEAD -> origin/master
>> remotes/origin/master 77852ef some change
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git fetch
>> From /tmp/gitbug/remote
>> + 77852ef...11d0a12 HEAD -> origin/HEAD (forced update)
>> mastro@mastroc3 /tmp/gitbug/victim (master) $ git br -av
>> * master 11d0a12 first commit
>> remotes/origin/HEAD -> origin/master
>> remotes/origin/master 11d0a12 first commit
>
> I'm aware my request has been ignored for a good reason but I would
> appreciate someone stepping in and explaining to me why this is not a
> bug or why it has been ignored.
This is a nice little bug.
I'm sure that you noticed that running "git fetch" repeatedly from the
"victim" repository alternates between two behaviors (I'm using 1.7.7.2):
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
> + 6bf3df1...4c9ebba HEAD -> origin/HEAD (forced update)
> $ git for-each-ref
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit refs/heads/master
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit refs/remotes/origin/HEAD
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit refs/remotes/origin/master
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
> 4c9ebba..6bf3df1 master -> origin/master
> $ git for-each-ref
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b commit refs/heads/master
> 6bf3df178cd92ca72625ae5bda9206c4333fd807 commit refs/remotes/origin/HEAD
> 6bf3df178cd92ca72625ae5bda9206c4333fd807 commit refs/remotes/origin/master
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
> + 6bf3df1...4c9ebba HEAD -> origin/HEAD (forced update)
> $ git fetch
> From /home/mhagger/tmp/gitbug/remote
> 4c9ebba..6bf3df1 master -> origin/master
The whole time, victim's .git/HEAD contains "ref: refs/heads/master",
.git/refs/remotes/origin/HEAD contains "ref:
refs/remotes/origin/master", and its packed-refs file contains
# pack-refs with: peeled
4c9ebba3c0618bd6238a810013da4a8cd4f2213b refs/remotes/origin/master
In "remote.git", refs/heads/HEAD contains not a symbolic reference but
the explicit SHA1 "4c9ebba...". This is of course not affected by
running "git fetch" in the "victim" tree. Deleting this file makes the
problem go away.
Given that this problem seems to be in the remote protocol rather than
in the refs API, I think I'll stop working on this. I hope that my
observations are helpful to somebody.
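For anyone hitting this in the meantime, a minimal sketch of checking for
the stray branch, based on the observation above that deleting
refs/heads/HEAD from the remote makes the problem go away. The function
name has_head_branch is made up; it relies only on "git ls-remote", and the
removal shown in the comment is an ordinary delete-by-push.

```shell
# has_head_branch REMOTE: succeed if REMOTE advertises a branch that is
# literally named "HEAD" (refs/heads/HEAD), as opposed to the HEAD symref.
has_head_branch() {
    git ls-remote "$1" | grep -q '[[:space:]]refs/heads/HEAD$'
}

# Typical use, from a clone of the affected repository:
#   if has_head_branch origin; then
#       git push origin :refs/heads/HEAD    # delete the stray branch
#   fi
```

The pattern is anchored on the full ref name so that the remote's normal
HEAD line, which has no refs/heads/ prefix, does not match.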
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
* [PATCH] tag: implement --no-strip option
From: Kirill A. Shutemov @ 2011-11-14 11:08 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kirill A. Shutemov
From: "Kirill A. Shutemov" <kirill@shutemov.name>
--no-strip turns off stripping of comments and empty lines.
It's useful if you want to take a tag message as-is, without any
stripping.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
Documentation/git-tag.txt | 4 ++++
builtin/tag.c | 13 ++++++++-----
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index c83cb13..947d4e5 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -99,6 +99,10 @@ OPTIONS
Implies `-a` if none of `-a`, `-s`, or `-u <key-id>`
is given.
+-S::
+--no-strip::
+ Take tag message as-is. Do not strip any comments or empty lines.
+
<tagname>::
The name of the tag to create, delete, or describe.
The new tag name must pass all checks defined by
diff --git a/builtin/tag.c b/builtin/tag.c
index 9b6fd95..427d646 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -320,7 +320,7 @@ static int build_tag_object(struct strbuf *buf, int sign, unsigned char *result)
}
static void create_tag(const unsigned char *object, const char *tag,
- struct strbuf *buf, int message, int sign,
+ struct strbuf *buf, int message, int sign, int nostrip,
unsigned char *prev, unsigned char *result)
{
enum object_type type;
@@ -356,7 +356,7 @@ static void create_tag(const unsigned char *object, const char *tag,
if (!is_null_sha1(prev))
write_tag_body(fd, prev);
- else
+ else if (!nostrip)
write_or_die(fd, _(tag_template), strlen(_(tag_template)));
close(fd);
@@ -367,7 +367,8 @@ static void create_tag(const unsigned char *object, const char *tag,
}
}
- stripspace(buf, 1);
+ if (!nostrip)
+ stripspace(buf, 1);
if (!message && !buf->len)
die(_("no tag message?"));
@@ -423,7 +424,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
const char *object_ref, *tag;
struct ref_lock *lock;
- int annotate = 0, sign = 0, force = 0, lines = -1,
+ int annotate = 0, sign = 0, nostrip = 0, force = 0, lines = -1,
list = 0, delete = 0, verify = 0;
const char *msgfile = NULL, *keyid = NULL;
struct msg_arg msg = { 0, STRBUF_INIT };
@@ -443,6 +444,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
"tag message", parse_msg_arg),
OPT_FILENAME('F', "file", &msgfile, "read message from file"),
OPT_BOOLEAN('s', "sign", &sign, "annotated and GPG-signed tag"),
+ OPT_BOOLEAN('S', "no-strip", &nostrip,
+ "turn off tag message stripping"),
OPT_STRING('u', "local-user", &keyid, "key-id",
"use another key to sign the tag"),
OPT__FORCE(&force, "replace the tag if exists"),
@@ -525,7 +528,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
if (annotate)
create_tag(object, tag, &buf, msg.given || msgfile,
- sign, prev, object);
+ sign, nostrip, prev, object);
lock = lock_any_ref_for_update(ref.buf, prev, 0);
if (!lock)
--
1.7.7.2
* Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)
From: Jeff King @ 2011-11-14 11:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmxbzl5ch.fsf@alter.siamese.dyndns.org>
On Sun, Nov 13, 2011 at 08:01:50PM -0800, Junio C Hamano wrote:
> * jc/lookup-object-hash (2011-08-11) 6 commits
> - object hash: replace linear probing with 4-way cuckoo hashing
> - object hash: we know the table size is a power of two
> - object hash: next_size() helper for readability
> - pack-objects --count-only
> - object.c: remove duplicated code for object hashing
> - object.c: code movement for readability
>
> I do not think there is anything fundamentally wrong with this series, but
> the risk of breakage far outweighs observed performance gain in one
> particular workload.
FWIW, I finally got a chance to read through this series. It was fun, as
I had not looked at cuckoo hashing before. However, the performance
results were a bit underwhelming, and the code is more complex, which
left me a bit negative. I also took a quick try at quadratic probing,
which is only a few extra lines of code. I wasn't able to show any real
performance improvement, though.
I suspect it is because our hash table is not all that big, and we keep
it pretty sparse, so linear probing does well. Googling around, it seems
that linear probing performs well up to about 70% load factor, but
there's surprisingly little theory behind it.
I notice that the decorate.c hash keeps us below 2/3 full, but the
object.c hash keeps us at 1/2. From my reading, that's just wasting
space. Pushing the boundary up to 2/3 and trying your "--count-only"
on git.git, I don't see a big performance difference (with my change,
the best-of-5 was a little better, but well within the noise). It does
drop the maxresident by a few percent.
So I don't think it's a big deal either way, but the code change is
pretty trivial.
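To make the space argument concrete, here is a small arithmetic sketch. It
assumes a table that starts at 32 slots and doubles whenever the load
factor would exceed the limit; that growth policy and the function name
table_size are assumptions for illustration, not a description of what
object.c does.

```shell
# table_size N NUM DEN: smallest power-of-two slot count (starting at 32)
# that holds N entries at a load factor of at most NUM/DEN.
table_size() {
    n=$1 num=$2 den=$3
    size=32
    # n/size <= num/den  is the same as  n*den <= size*num
    while [ $((n * den)) -gt $((size * num)) ]; do
        size=$((size * 2))
    done
    echo "$size"
}

table_size 150000 1 2   # prints 524288
table_size 150000 2 3   # prints 262144
table_size 100000 1 2   # prints 262144
table_size 100000 2 3   # prints 262144
```

With power-of-two sizes, raising the limit from 1/2 to 2/3 either saves a
whole doubling or saves nothing, depending on where the object count falls.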
-Peff
* Re: Git shouldn't allow to push a new branch called HEAD
From: Jeff King @ 2011-11-14 11:16 UTC (permalink / raw)
To: Michael Haggerty; +Cc: Daniele Segato, Git Mailing List
In-Reply-To: <4EC0F15A.9010502@alum.mit.edu>
On Mon, Nov 14, 2011 at 11:45:46AM +0100, Michael Haggerty wrote:
> The whole time, victim's .git/HEAD contains "ref: refs/heads/master",
> .git/refs/remotes/origin/HEAD contains "ref:
> refs/remotes/origin/master", and its packed-refs file contains
>
> # pack-refs with: peeled
> 4c9ebba3c0618bd6238a810013da4a8cd4f2213b refs/remotes/origin/master
>
> In "remote.git", refs/heads/HEAD contains not a symbolic reference but
> the explicit SHA1 "4c9ebba...". This is of course not affected by
> running "git fetch" in the "victim" tree. Deleting this file makes the
> problem go away.
>
>
> Given that this problem seems to be in the remote protocol rather than
> in the refs API, I think I'll stop working on this. I hope that my
> observations are helpful to somebody.
I didn't recreate the test situation and look closely, but my impression
is that this isn't a code bug at all, but rather a design problem in the
way we store remote namespaces. That is, we make "refs/remotes/foo/HEAD"
a symbolic ref with special meaning, but then fetch into it from the
remote's refs/heads namespace, writing remote's HEAD branch into
whatever our HEAD symref points to.
So one solution is to block fetching of remote branches called HEAD
(which I would be OK with). But another is to use a more sensible layout
for representing the remote refs, like:
refs/remotes/origin/HEAD (a symbolic ref)
refs/remotes/origin/heads/master
refs/remotes/origin/tags/v1.0
etc. Then the namespaces are properly separated, and the magic remote
"HEAD" symref is not in the way.
Obviously there's a lot more to it than just tweaking the default fetch
refspecs. The ref lookup rules need to be changed to take this into
account. There was some discussion about this over the summer (under the
subject of possible "1.8.0" changes), but I don't think any work has
been done.
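To illustrate the name clash only, here is a sketch that maps a remote ref
to the place it would be stored locally under the current default layout
and under the layout proposed above. The function name tracking_ref and
the layout labels "current" and "proposed" are made up, and this is plain
string manipulation, not how fetch itself resolves refspecs.

```shell
# tracking_ref LAYOUT REMOTE REF: print where a fetched ref would be stored.
tracking_ref() {
    layout=$1 remote=$2 ref=$3
    case "$layout:$ref" in
        # current default: branches land directly under refs/remotes/<remote>/
        current:refs/heads/*)  echo "refs/remotes/$remote/${ref#refs/heads/}" ;;
        # current default: followed tags land in the local refs/tags/ namespace
        current:refs/tags/*)   echo "$ref" ;;
        # proposed: branches and tags each get their own sub-namespace
        proposed:refs/heads/*) echo "refs/remotes/$remote/heads/${ref#refs/heads/}" ;;
        proposed:refs/tags/*)  echo "refs/remotes/$remote/tags/${ref#refs/tags/}" ;;
        *) return 1 ;;
    esac
}

tracking_ref current  origin refs/heads/HEAD   # prints refs/remotes/origin/HEAD
tracking_ref proposed origin refs/heads/HEAD   # prints refs/remotes/origin/heads/HEAD
```

Under the current layout a remote branch named HEAD maps onto the same
name as the special symref; under the proposed one it cannot.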
-Peff
* Re: [PATCH 2/2] Copy resolve_ref() return value for longer use
From: Jeff King @ 2011-11-14 11:24 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git
In-Reply-To: <CACsJy8BnqoPVJiM6mbq7p3gKtLh-KGUuTshcukGokC3istTxMQ@mail.gmail.com>
On Mon, Nov 14, 2011 at 10:32:11AM +0700, Nguyen Thai Ngoc Duy wrote:
> 2011/11/14 Junio C Hamano <gitster@pobox.com>:
> >> diff --git a/builtin/branch.c b/builtin/branch.c
> >> index 0fe9c4d..5b6d839 100644
> >> --- a/builtin/branch.c
> >> +++ b/builtin/branch.c
> >> @@ -115,8 +115,10 @@ static int branch_merged(int kind, const char *name,
> >> branch->merge[0] &&
> >> branch->merge[0]->dst &&
> >> (reference_name =
> >> - resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL)
> >> + resolve_ref(branch->merge[0]->dst, sha1, 1, NULL)) != NULL) {
> >> + reference_name = xstrdup(reference_name);
> >> reference_rev = lookup_commit_reference(sha1);
> >> + }
> >> }
> >> if (!reference_rev)
> >> reference_rev = head_rev;
> >> @@ -141,6 +143,7 @@ static int branch_merged(int kind, const char *name,
> >> " '%s', even though it is merged to HEAD."),
> >> name, reference_name);
> >> }
> >> + free((char*)reference_name);
> >> return merged;
> >> }
> >
> > Now that reference_name stores the result of xstrdup(), it has no reason
> > to be of type "const char *". It is preferable to lose the cast here, I
> > think. The same comment applies to the remainder of the patch.
>
> But resolve_ref() returns "const char *", we need to type cast at
> least once, either at resolve_ref() assignment or at free(), until we
> change resolve_ref(). Or should we change resolve_ref() to return
> "char *" now?
Your problem is that you are using the same variable for two different
things: storing the pointer to non-owned memory returned from
resolve_ref, and then storing the owned memory that comes from xstrdup.
Those two things have different types, since we use "const" on non-owned
memory. Thus you end up casting.
So your code isn't wrong, but I do think it would be more obviously
correct to a reader if it used two variables and dropped the cast.
-Peff
* Re: git behaviour question regarding SHA-1 and commits
From: Jeff King @ 2011-11-14 11:32 UTC (permalink / raw)
To: Jonathan Nieder
Cc: vinassa vinassa, git, Ævar Arnfjörð Bjarmason
In-Reply-To: <20111113182757.GA15194@elie.hsd1.il.comcast.net>
On Sun, Nov 13, 2011 at 12:27:57PM -0600, Jonathan Nieder wrote:
> Though I haven't tested. It would be nice to have an md5git (or even
> truncated-sha1-git) program to test this kind of thing with.
Fortunately we have such a thing:
http://article.gmane.org/gmane.comp.version-control.git/184243
That one actually has 40 bits of hash entropy, so you'd expect to
generate 2^20 (about a million) commits before accidentally colliding.
If you want an easier experiment, you could truncate it even further.
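The figure follows from the usual birthday estimate. As a sketch, treating
the truncated hash as uniformly random over N = 2^40 values, the expected
number of objects generated before the first collision is roughly:

```latex
\[
  E[n] \approx \sqrt{\frac{\pi}{2}\, N}
       = \sqrt{\frac{\pi}{2}} \cdot 2^{20}
       \approx 1.25 \times 2^{20}
       \approx 1.3 \times 10^{6}
\]
```

The same estimate with the full 160 bits gives about 1.25 x 2^80, which is
why the experiment needs a truncated hash to be practical.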
-Peff
* Re: git behaviour question regarding SHA-1 and commits
From: Jeff King @ 2011-11-14 11:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: Ævar Arnfjörð Bjarmason, vinassa vinassa, git
In-Reply-To: <7vwrb3l6v2.fsf@alter.siamese.dyndns.org>
On Sun, Nov 13, 2011 at 07:29:05PM -0800, Junio C Hamano wrote:
> If the collision is between commit objects, for example, we would write
> the (old) commit object name to the tip of the current branch. Most
> likely, the tree object recorded in the (old) commit would not match the
> tree object your "git commit" wanted to record (otherwise you have hit
> SHA-1 collision twice in a row ;-), which would mean "git status" would
> show that a whole bunch of paths have changed between the HEAD and the
> index. Also "git log" would show the history leading to the (old) commit
> that is likely to be very different from what you would expect immediately
> after committing the collided commit. Of course, you could recover from it
> with "git reset --soft" after finding out what the previous HEAD was from
> the reflog, but it won't be a pleasant experience.
>
> There can be other kinds of collisions (e.g. your latest commit might have
> collided with an existing blob or tree, in which case it is likely that
> almost nothing would work after finding a blob or tree in HEAD).
You are more likely to just have blobs collide, since we generate many
more blobs than commits (each commit should have at least one changed
blob, but typically has more).
And in that case, I expect git would silently lose that state. We would
fail to write the new blob to the object db, but "git diff" would report
nothing, as it would see that the index entry's sha1 is the same as what
is in HEAD, and that the file is up to date with respect to the stat
information in the index. So if you were to "git checkout", your content
would be lost forever. However, if you instead modify the file further,
the new content will be kept (and you will get a very confusing diff).
-Peff
* Re: [RFC/PATCH] remote: add new sync command
From: Jeff King @ 2011-11-14 12:25 UTC (permalink / raw)
To: Felipe Contreras; +Cc: git
In-Reply-To: <CAMP44s06p+KyJAu4ddiCa8CFRq5eogbqxxJU16Z-SUb3GSp67Q@mail.gmail.com>
On Sun, Nov 13, 2011 at 12:07:19AM +0200, Felipe Contreras wrote:
> > So in that sense, it is poorly named, and "--branches" (or "--heads")
> > would be more accurate. At the same time, it is probably more likely
> > what the user wants to do (you almost never want to push "refs/remotes",
> > for example).
>
> But you do want to push tags, and --all --tags doesn't sound right; if
> I'm pushing everything, why do I specify that I want to push more stuff?
> And then, why is --all --tags disallowed?
I agree that "--all --tags" looks silly. I don't know why it's
disallowed; from my reading, it should be a perfectly sensible
operation. You might try digging in the history or the mailing list.
> > So I am a little hesitant to suggest changing it, even
> > with a warning and deprecation period.
>
> It is confusing and wrong, what more reason do you need?
Because I am worried that "--all" pushing refs/remotes will also be
confusing; it's not what most people are going to want.
If your suggestion is to deprecate the name "--all" and start calling it
"--branches" or "--heads", then that is an improvement. But making
"refs/*:refs/*" easier to accidentally use might not be.
> > Right. It looks like that is just spelled "--mirror" (which gives you
> > pruning also), or "refs/*:refs/*" (without pruning). The latter is even
> > more flexible, as you could do "refs/*:refs/foo/*" to keep several
> > related backups in one upstream repo.
>
> So, we agree that --all is the same as 'refs/heads/*'. Therefore we
> already have this mixture of refspecs and options.
True. I wonder why there has been so much confusion over "--tags", and
so little over "--all".
> > and then it really is just a special way of spelling "refs/heads/*". But
> > then, I also think it's good for users to understand that the options
> > are refspecs, and what that means. It's not a hard concept, and then
> > when they inevitably say "how can I do BRANCHES, except put the result
> > somewhere else in the remote namespace", it's a very natural extension
> > to learn about the right-hand side of the refspec.
> >
> > Of course I also think BRANCHES looks ugly, and people should just learn
> > "refs/heads/*".
>
> Look, I'm all in favor of people learning stuff, but I have been
> involved in Git since basically day 1, and up to this day I was (am?)
> not familiar with refspecs, I don't use them regularly, and never
> really had a need to, and that's fine. People are already complaining
> about the learning curve of git, and what you are suggesting is that:
>
> Instead of doing:
> % git push remote --branches --tags
>
> They should do:
> % git push remote 'refs/heads/*' 'refs/tags/*'
Sorry, I should have been more clear with what I wrote. My "of
course..." was more of a tangential "well, this is so far from what my
gut tells me is reasonable that I'm not sure my definition of ugly is
even relevant here".
For me personally as a user, I prefer learning how a tool actually works
at its core (in this case, refspecs), and then applying syntactic sugar
to simplify usage. But I also respect that not everybody feels that way.
> I'm not going to investigate the subtleties of these different setups,
> I'm going to put on my common-user hat and ask: how do I fetch as a
> mirror?
The problem with that question is that you haven't defined mirror. Does
that mean you just want pruning, or does it mean that you want your
local ref namespace to match that of the remote?
Git should be able to do each of those cases. And I think it's fine to
have a less cumbersome syntax to specify them. But it's also important
that we don't over-simplify the terms so much that users get option A
when they wanted B.
BTW, right now there is "git remote add --mirror ...", which sets up the
fetch refspec for you (in this case, mirror is "make your refs look like
the remote's"). Perhaps rather than adding syntactic sugar to fetch, it
would be best to channel users into configuring a remote that selects
from one of a few common setups (including different types of mirrors).
It's not as flexible (I can't do a one-off mirrored push without using
actual refspecs), but my guess is that most users would want to set up
an actual remote, and picking from a set of configuration recipes would
be the ideal interface for them.
> > And "--prune-local" doesn't seem like a fetch operation to me. Either
> > you are mirroring, and --prune already handles it as above. Or you are
> > interested in getting rid of branches whose upstream has gone away. But
> > that's not a fetch operation; that's a branch operation.
>
> This would make things more confusing to the user.
>
> Say on one side I do this push:
> % git push test --prune 'refs/heads/*' 'refs/tags/*'
>
> What do I do on the other side to synchronize the repo?
> % git fetch test --prune-local 'refs/heads/*:refs/heads/*'
> 'refs/tags/*:refs/tags/*'
No, you would just do "--prune", because your refspecs are _already_
indicating that you are writing into the local namespace, and anything
you have locally would be deleted by the prune operation. I.e., there is
no need for --prune-local in this scenario; --prune already does what we
want.
> I would prefer this of course:
> % git fetch test --all --prune-local
>
> But you are saying it should be:
> % git fetch test 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'
> % git branch --prune-remote test
>
> That doesn't sound right to me; mixing branch operations with a specific remote?
I was trying to outline a situation where "--prune" wouldn't be
sufficient, which is:
: we make some topic branch based on another branch
$ git checkout -b topic-Y origin/topic-X
: later, we (or someone else) delete topic-X upstream
$ git push origin :topic-X
: now we fetch using the regular default refspecs, which put
: everything in a separate namespace under refs/remotes/. But we
: ask to prune, so that deleted branches will go away.
$ git fetch --prune origin
Now origin/topic-X doesn't exist, even though it's configured as the
upstream of topic-Y. Fetch doesn't enter into the picture, because it is
configured to only touch items in refs/remotes/.
As a user, how do I resolve the situation? I might say topic-Y is
obsolete and get rid of it. I might rebase it onto another branch. Or I
might declare it to have no upstream. But all of those are branch
operations, not fetch operations.
So what I was trying to say was that either your fetch refspecs tell
fetch to write into your local branch namespace, or not. If they do,
then --prune is sufficient (with no -local variant required). If not,
then touching your local branch namespace is outside the scope of fetch.
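The scenario can be reproduced in a scratch directory. This is only a sketch: the paths and commit identity are made up, the upstream branch is deleted directly in the upstream repository (standing in for "someone else" doing the push), and the cleanup at the end uses "git branch --unset-upstream", which exists in current git but may not be available in older versions.

```shell
set -e
tmp=$(mktemp -d)

# Upstream repository with a topic-X branch, and a clone to work in.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=Example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "initial"
git -C "$tmp/upstream" branch topic-X
git clone -q "$tmp/upstream" "$tmp/work"

# Local topic-Y is based on, and tracks, origin/topic-X.
git -C "$tmp/work" checkout -q --track -b topic-Y origin/topic-X

# topic-X is deleted upstream, then we fetch with the default refspecs.
git -C "$tmp/upstream" branch -q -D topic-X
git -C "$tmp/work" fetch -q --prune origin

# The remote-tracking ref is gone, but topic-Y still names it as upstream.
if git -C "$tmp/work" rev-parse -q --verify refs/remotes/origin/topic-X >/dev/null
then tracking_ref=present
else tracking_ref=gone
fi
upstream_cfg=$(git -C "$tmp/work" config --get branch.topic-Y.merge)
echo "origin/topic-X: $tracking_ref, topic-Y upstream config: $upstream_cfg"

# One possible resolution, and it is a branch operation, not a fetch one.
git -C "$tmp/work" branch --unset-upstream topic-Y
after_cfg=$(git -C "$tmp/work" config --get branch.topic-Y.merge || true)
echo "after unset: ${after_cfg:-<none>}"
```

Deleting topic-Y or rebasing it onto another branch would be the other two resolutions; all of them go through "git branch" or "git rebase", not "git fetch".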
-Peff
* Re: git behaviour question regarding SHA-1 and commits
From: Victor Engmark @ 2011-11-14 12:48 UTC (permalink / raw)
To: Jeff King
Cc: Jonathan Nieder, vinassa vinassa, git,
Ævar Arnfjörð Bjarmason
In-Reply-To: <20111114113235.GE10847@sigill.intra.peff.net>
On Mon, Nov 14, 2011 at 06:32:35AM -0500, Jeff King wrote:
> On Sun, Nov 13, 2011 at 12:27:57PM -0600, Jonathan Nieder wrote:
>
> > Though I haven't tested. It would be nice to have an md5git (or even
> > truncated-sha1-git) program to test this kind of thing with.
>
> Fortunately we have such a thing:
>
> http://article.gmane.org/gmane.comp.version-control.git/184243
>
> That one actually has 40 bits of hash entropy, so you'd expect to
> generate 2^20 (about a million) commits before accidentally colliding.
> If you want an easier experiment, you could truncate it even further.
Would it be helpful to truncate this to something ludicrous like a
single byte of entropy, to be able to write tests for the various tools
and options?
Cheers,
V
--
terreActive AG
Kasinostrasse 30
CH-5001 Aarau
Tel: +41 62 834 00 55
Fax: +41 62 823 93 56
www.terreactive.ch
Wir sichern Ihren Erfolg - seit 15 Jahren
* Re: git behaviour question regarding SHA-1 and commits
From: Jeff King @ 2011-11-14 13:04 UTC (permalink / raw)
To: Jonathan Nieder, vinassa vinassa, git,
Ævar Arnfjörð Bjarmason
In-Reply-To: <20111114124851.GB21854@victor>
On Mon, Nov 14, 2011 at 01:48:51PM +0100, Victor Engmark wrote:
> > Fortunately we have such a thing:
> >
> > http://article.gmane.org/gmane.comp.version-control.git/184243
> >
> > That one actually has 40 bits of hash entropy, so you'd expect to
> > generate 2^20 (about a million) commits before accidentally colliding.
> > If you want an easier experiment, you could truncate it even further.
>
> Would it be helpful to truncate this to something ludicrous like a
> single byte of entropy, to be able to write tests for the various tools
> and options?
That's probably too small. Obviously any implementation like this is not
going to be usable for interacting with existing repositories, but if
you have too many collisions, then you won't even be able to create a
few new commits for your test.
Something like 20 bits means you can brute-force a collision for a
particular blob, commit, tree, or whatever in a few seconds, but you
won't be having accidental ones all the time.
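For a back-of-the-envelope feel for these sizes, the usual birthday-bound rule of thumb can be written as a few lines of shell arithmetic. This is only a rough estimate: the function name is made up, "accidental" means roughly 2^(bits/2) random objects before some pair is likely to collide, and "targeted" means roughly 2^bits attempts to match one particular object.

```shell
# Rough estimates for a hash truncated to the given number of bits.
collision_estimates() {
    bits=$1
    accidental=$((1 << (bits / 2)))   # birthday bound: about 2^(bits/2) objects
    targeted=$((1 << bits))           # brute force against one object: about 2^bits
    echo "$bits bits: ~$accidental objects (accidental), ~$targeted attempts (targeted)"
}

collision_estimates 8    # the "single byte" suggestion
collision_estimates 20   # the size suggested above
collision_estimates 40   # the truncated implementation linked earlier
```

By this estimate a single byte would start colliding after about 16 objects, while 20 bits allows on the order of a thousand objects before accidental collisions become likely and still keeps a targeted search to about a million attempts.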
-Peff