Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 0/3] grep multithreading and scaling
From: Jeff King @ 2011-12-02 17:34 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Scharfe, Eric Herman, git, Junio C Hamano
In-Reply-To: <cover.1322830368.git.trast@student.ethz.ch>

On Fri, Dec 02, 2011 at 02:07:45PM +0100, Thomas Rast wrote:

> where I put the --cached originally because that makes it independent
> of the worktree (which in the very first measurements I still had
> wiped, as I tend to do for this repo; I checked it out again after
> that).  This in fact gives me (~/g/git-grep --cached
> INITRAMFS_ROOT_UID, leaving aside -W; best of 10):
> 
>   THREADS=8:   2.88user 0.21system 0:02.94elapsed
>   THREADS=4:   2.89user 0.29system 0:02.99elapsed
>   THREADS=2:   2.83user 0.36system 0:02.87elapsed
>   NO_PTHREADS: 2.16user 0.08system 0:02.25elapsed
> 
> Uhuh.  Doesn't scale so well after all.  But removing the --cached, as
> most people probably would:
> 
>   THREADS=8:   0.19user 0.32system 0:00.16elapsed
>   THREADS=4:   0.16user 0.34system 0:00.17elapsed
>   THREADS=2:   0.18user 0.32system 0:00.26elapsed
>   NO_PTHREADS: 0.12user 0.17system 0:00.31elapsed
> 
> So I conclude that during any grep that cannot use the worktree,
> having any threads hurts.

Wow, that's horrible. Leaving aside the parallelism, it's just terrible
that reading from the cache is 20 times slower than the worktree. I get
similar results on my quad-core machine.

A quick perf run shows most of the time is spent inflating objects. The
diff code has a sneaky trick to re-use worktree files when we know they
are stat-clean (in diff's case it is to avoid writing a tempfile). I
wonder if we should use the same trick here.

It would hurt the cold cache case, though, as the compressed versions
require fewer disk accesses, of course.

-Peff

PS I suspect your timings are somewhat affected by the simplicity of the
   regex you are asking for. The time to inflate the blobs dominates,
   because the search is just a memmem(). On my quad-core w/
   hyperthreading (i.e., 8 apparent cores):

   [no caching, simple regex; we get some parallelism, but the regex
    task is just not that intensive]
   $ /usr/bin/time git grep INITRAMFS_ROOT_UID >/dev/null
   0.42user 0.45system 0:00.15elapsed 578%CPU

   [no caching, harder regex; we get much higher CPU utilization]
   $ /usr/bin/time git grep 'a.*b' >/dev/null
   14.68user 0.50system 0:02.00elapsed 758%CPU

   [with caching, simple regex; we get almost _no_ parallelism because
    all of our time is spent deflating under a lock, and the regex task
    takes very little time]
   $ /usr/bin/time git grep --cached INITRAMFS_ROOT_UID >/dev/null
   7.64user 0.41system 0:07.61elapsed 105%CPU

   [with caching, harder regex; not as much parallelism as we hoped for,
    but still much more than before. Because there is actually work to
    parallelize in the regex]
   $ /usr/bin/time git grep --cached 'a.*b' >/dev/null
   23.46user 0.47system 0:08.42elapsed 284%CPU

   So I think there is value in parallelizing even --cached greps. But
   we could do so much better if blob inflation could be done in
   parallel.

^ permalink raw reply

* Re: git auto-repack is broken...
From: Junio C Hamano @ 2011-12-02 17:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Git Mailing List
In-Reply-To: <20111202171017.GB23447@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> When the objects become unreferenced, we eject them from the pack into
> loose form again. If they don't become referenced in the 2-week window,
> they get pruned then. So yes, you drop the age information, but they do
> eventually go away.

If you update gc/repack -A to put them in a separate pack, then you would
never be able to get rid of them, no? You pack, then eject (which gives
them a fresher timestamp), then notice that you are within the 2-week window
and pack them again,...

^ permalink raw reply

* Re: git auto-repack is broken...
From: Jeff King @ 2011-12-02 17:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List
In-Reply-To: <7vobvqoozr.fsf@alter.siamese.dyndns.org>

On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > When the objects become unreferenced, we eject them from the pack into
> > loose form again. If they don't become referenced in the 2-week window,
> > they get pruned then. So yes, you drop the age information, but they do
> > eventually go away.
> 
> If you update gc/repack -A to put them in a separate pack, then you would
> never be able to get rid of them, no? You pack, then eject (which gives
> them a fresher timestamp), then notice that you are within the 2-week window
> and pack them again,...

But we shouldn't be packing totally unreferenced objects. Barring bugs,
the life cycle of such an object should be something like:

  1. Object X is created on branch 'foo'.

  2. Branch 'foo' is deleted, but its commits are still in the HEAD
     reflog, referencing X.

  3. 90 days pass (actually, I think this might be the 30-day
     expire-unreachable time)

  4. "git gc" runs "git repack -Ad", which will eject X from the pack
     into a loose form (because it is not becoming part of the new pack
     we are writing).

  5. Two weeks pass.

  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
     object.

"gc" runs between (4) and (6) will not re-pack the object, because it
remains unreferenced.

I think things might be slowed somewhat by "gc --auto", which will not
do a "repack -A" until we have too many packs. So steps (3) and (4) are
really more like "gc runs git-repack without -A" 50 times, and then we
finally run "git repack -A".

-Peff

^ permalink raw reply

* Re: Suggestion on hashing
From: Jeff King @ 2011-12-02 17:54 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, pclouds
In-Reply-To: <1322813319.4340.109.camel@yos>

On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:

> At one point Nguyen said that "What I'm thinking is whether it's
> possible to decouple two sha-1 roles in git, as object identifier
> and digest, separately. Each sha-1 identifies an object and an extra
> set of digests on the "same" object."
> 
> My code pretty much does that (it just uses a CRC instead of a real
> digest, but I can easily change that).   So the question is whether
> using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> solution than simply replacing SHA-1.

I think your code is solving the wrong problem (or solving the right
problem in a half-way manner). The only things that make sense to me
are:

  1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
     even if it is, an attack is extremely expensive to mount. This may
     change in the future, of course, but it will probably stay
     expensive for a while.

  2. Decouple the object identifier and digest roles, but insert the
     digest into newly created objects, so it can be part of the
     signature chain. I described such a scheme in one of my replies to
     you. It has some complexities, but has the bonus that we can build
     directly on older history, preserving its sha1s.

  3. Replace SHA-1 with a more secure algorithm.

I'm probably in favor of (1) at this point. Whether to do (2) or (3)
will depend on where we are when SHA-1 gets feasibly broken. It may be
many years away, at which point we may be considering a git 2.0 that
breaks repository compatibility, anyway. That would be a natural time to
consider changing the algorithm.

> Replacing SHA-1 with something like SHA-256 sounds easier to implement,
> but the problem is all the existing repositories.

Right. I don't think anyone is denying that it would be a giant pain.

-Peff

^ permalink raw reply

* Re: git auto-repack is broken...
From: Junio C Hamano @ 2011-12-02 18:08 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List
In-Reply-To: <20111202174546.GA24093@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> But we shouldn't be packing totally unreferenced objects.

Everything you said is correct in today's Git and I obviously know it, but
I was taking the "Or have an extra option to..." at the end of the OP's
message in the thread into account, so...

^ permalink raw reply

* Re: Suggestion on hashing
From: Jeff King @ 2011-12-02 18:09 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Bill Zaumen, Git Mailing List
In-Reply-To: <CACsJy8CO1GtpZVo-oA2eKbQadsXYBEKVLfUH0GONR5jovuvH+Q@mail.gmail.com>

On Fri, Dec 02, 2011 at 09:22:31PM +0700, Nguyen Thai Ngoc Duy wrote:

> > So the question is whether
> > using SHA-1 as an ID and SHA-256(?) as a digest is a better long term
> > solution than simply replacing SHA-1.
> 
> I would not stick with any algorithm permanently. No one knows when
> SHA-256 might be broken.

Yeah, you could stick a few bits of algorithm parameter in the beginning
of each identifier. It would mean unique hashes get one character or so
longer (and they would all start with "1", or whatever the identifier
is).

SHA-256 doesn't suffer from SHA-1's problems, though they are based on
related constructions, so I think there is some concern that it may
eventually fail in the same way. SHA-3 is a better bet in that sense,
but it will also be very unproven, even once it is actually
standardized.

> > Replacing SHA-1 with something like SHA-256 sounds easier to implement,
> 
> SHA-1 charateristics (like 20 byte length) are hard coded everywhere
> in git, it'd be a big audit.

In theory, you could truncate a longer hash to 160-bits. It's not the
bit-strength of SHA-1 that is the problem, but the attacks on the
algorithm itself which reduce the bit-strength to something too low.
I would think a truncated result would retain the same cryptographic
properties, as one of the properties of the un-truncated hash is that
changes in the input data are reflected throughout the hash. Some
hashes, like Skein, explicitly have a big internal state, and then just
let you output as many bytes as is appropriate (i.e., being a drop-in
replacement for SHA-1 is an explicit goal).

But I'm not a cryptographer, so there may be some subtle issues with
doing that to arbitrary hash functions.

-Peff

^ permalink raw reply

* Re: git auto-repack is broken...
From: Jeff King @ 2011-12-02 18:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List
In-Reply-To: <7vd3c6onhs.fsf@alter.siamese.dyndns.org>

On Fri, Dec 02, 2011 at 10:08:15AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But we shouldn't be packing totally unreferenced objects.
> 
> Everything you said is correct in today's Git and I obviously know it, but
> I was taking the "Or have an extra option to..." at the end of the OP's
> message in the thread into account, so...

Ah, sorry, I missed the subtlety of Linus's "repacking the objects
results in problems..." from his later message and thought he just meant
repacking in general. Yes, it's a bad idea to repack unreachable objects
because then you could never prune anything.

I think just shrinking the --expire window that we already use is a much
more reasonable bet. It's not about preventing the loss of old work
(reflogs are there for that), but about avoiding hurting an actively
running, about-to-reference-the-objects git process. And 2 weeks is
quite conservative for that.

-Peff

^ permalink raw reply

* Re: [PATCH v2 0/3] grep multithreading and scaling
From: Eric Herman @ 2011-12-02 20:02 UTC (permalink / raw)
  To: Thomas Rast; +Cc: René Scharfe, git, Junio C Hamano
In-Reply-To: <cover.1322830368.git.trast@student.ethz.ch>

Hello Thomas,

Thanks for the work and the great info.
Some of the numbers are quite surprising.

I do, indeed, have a machine with more cores, but I have been either 
busy with out-of-town guests or generally plain lazy in the last couple 
of weeks. I intend to set aside some time to do some benchmarking this 
weekend.

I'll let you know what I find.

Cheers,
  -Eric



-- 
http://www.freesa.org/ -- mobile: +31 620719662
aim: ericigps -- skype: eric_herman -- jabber: eric.herman@gmail.com

^ permalink raw reply

* [ANNOUNCE] Git 1.7.8
From: Junio C Hamano @ 2011-12-02 20:25 UTC (permalink / raw)
  To: git; +Cc: Linux Kernel

The latest feature release Git 1.7.8 is available.

The release tarballs are found at:

    http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

7453e737e008f7319a5eca24a9ef3c5fb1f13398  git-1.7.8.tar.gz
2734079e22a0a6e3e78779582be9138ffc7de6f7  git-htmldocs-1.7.8.tar.gz
93315f7f51d7f27d3e421c9b0d64afa27f3d16df  git-manpages-1.7.8.tar.gz

Also the following public repositories all have a copy of the v1.7.8
tag and the master branch that the tag points at:

  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git


Git v1.7.8 Release Notes
========================

Updates since v1.7.7
--------------------

 * Some git-svn, git-gui, git-p4 (in contrib) and msysgit updates.

 * Updates to bash completion scripts.

 * The build procedure has been taught to take advantage of computed
   dependency automatically when the complier supports it.

 * The date parser now accepts timezone designators that lack minutes
   part and also has a colon between "hh:mm".

 * The contents of the /etc/mailname file, if exists, is used as the
   default value of the hostname part of the committer/author e-mail.

 * "git am" learned how to read from patches generated by Hg.

 * "git archive" talking with a remote repository can report errors
   from the remote side in a more informative way.

 * "git branch" learned an explicit --list option to ask for branches
   listed, optionally with a glob matching pattern to limit its output.

 * "git check-attr" learned "--cached" option to look at .gitattributes
   files from the index, not from the working tree.

 * Variants of "git cherry-pick" and "git revert" that take multiple
   commits learned to "--continue" and "--abort".

 * "git daemon" gives more human readble error messages to clients
   using ERR packets when appropriate.

 * Errors at the network layer is logged by "git daemon".

 * "git diff" learned "--minimal" option to spend extra cycles to come
   up with a minimal patch output.

 * "git diff" learned "--function-context" option to show the whole
   function as context that was affected by a change.

 * "git difftool" can be told to skip launching the tool for a path by
   answering 'n' to its prompt.

 * "git fetch" learned to honor transfer.fsckobjects configuration to
   validate the objects that were received from the other end, just like
   "git receive-pack" (the receiving end of "git push") does.

 * "git fetch" makes sure that the set of objects it received from the
   other end actually completes the history before updating the refs.
   "git receive-pack" (the receiving end of "git push") learned to do the
   same.

 * "git fetch" learned that fetching/cloning from a regular file on the
   filesystem is not necessarily a request to unpack a bundle file; the
   file could be ".git" with "gitdir: <path>" in it.

 * "git for-each-ref" learned "%(contents:subject)", "%(contents:body)"
   and "%(contents:signature)". The last one is useful for signed tags.

 * "git grep" used to incorrectly pay attention to .gitignore files
   scattered in the directory it was working in even when "--no-index"
   option was used. It no longer does this. The "--exclude-standard"
   option needs to be given to explicitly activate the ignore
   mechanism.

 * "git grep" learned "--untracked" option, where given patterns are
    searched in untracked (but not ignored) files as well as tracked
    files in the working tree, so that matches in new but not yet
    added files do not get missed.

 * The recursive merge backend no longer looks for meaningless
   existing merges in submodules unless in the outermost merge.

 * "git log" and friends learned "--children" option.

 * "git ls-remote" learned to respond to "-h"(elp) requests.

 * "mediawiki" remote helper can interact with (surprise!) MediaWiki
   with "git fetch" & "git push".

 * "git merge" learned the "--edit" option to allow users to edit the
   merge commit log message.

 * "git rebase -i" can be told to use special purpose editor suitable
   only for its insn sheet via sequence.editor configuration variable.

 * "git send-email" learned to respond to "-h"(elp) requests.

 * "git send-email" allows the value given to sendemail.aliasfile to begin
   with "~/" to refer to the $HOME directory.

 * "git send-email" forces use of Authen::SASL::Perl to work around
   issues between Authen::SASL::Cyrus and AUTH PLAIN/LOGIN.

 * "git stash" learned "--include-untracked" option to stash away
   untracked/ignored cruft from the working tree.

 * "git submodule clone" does not leak an error message to the UI
   level unnecessarily anymore.

 * "git submodule update" learned to honor "none" as the value for
   submodule.<name>.update to specify that the named submodule should
   not be checked out by default.

 * When populating a new submodule directory with "git submodule init",
   the $GIT_DIR metainformation directory for submodules is created inside
   $GIT_DIR/modules/<name>/ directory of the superproject and referenced
   via the gitfile mechanism. This is to make it possible to switch
   between commits in the superproject that has and does not have the
   submodule in the tree without re-cloning.

 * "gitweb" leaked unescaped control characters from syntax hiliter
   outputs.

 * "gitweb" can be told to give custom string at the end of the HTML
   HEAD element.

 * "gitweb" now has its own manual pages.


Also contains other documentation updates and minor code cleanups.


Fixes since v1.7.7
------------------

Unless otherwise noted, all fixes in the 1.7.7.X maintenance track are
included in this release.

 * HTTP transport did not use pushurl correctly, and also did not tell
   what host it is trying to authenticate with when asking for
   credentials.
   (merge deba493 jk/http-auth later to maint).

 * "git blame" was aborted if started from an uncommitted content and
   the path had the textconv filter in effect.
   (merge 8518088 ss/blame-textconv-fake-working-tree later to maint).

 * Adding many refs to the local repository in one go (e.g. "git fetch"
   that fetches many tags) and looking up a ref by name in a repository
   with too many refs were unnecessarily slow.
   (merge 17d68a54d jp/get-ref-dir-unsorted later to maint).

 * Report from "git commit" on untracked files was confused under
   core.ignorecase option.
   (merge 395c7356 jk/name-hash-dirent later to maint).

 * "git merge" did not understand ":/<pattern>" as a way to name a commit.

 " "git push" on the receiving end used to call post-receive and post-update
   hooks for attempted removal of non-existing refs.
   (merge 160b81ed ph/push-to-delete-nothing later to maint).

 * Help text for "git remote set-url" and "git remote set-branches"
   were misspelled.
   (merge c49904e fc/remote-seturl-usage-fix later to maint).
   (merge 656cdf0 jc/remote-setbranches-usage-fix later to maint).

^ permalink raw reply

* What's cooking in git.git (Dec 2011, #01; Fri, 2)
From: Junio C Hamano @ 2011-12-02 20:30 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.

Now that 1.7.8 is tagged, I'll start planning to rewind 'next' by kicking
topics back to 'pu' and/or dropping them, but let's give the new release a
couple of days of calm.

Here are the repositories that have my integration branches:

With maint, master, next, pu, todo:

        git://git.kernel.org/pub/scm/git/git.git
        git://repo.or.cz/alt-git.git
        https://code.google.com/p/git-core/
        https://github.com/git/git

With only maint and master:

        git://git.sourceforge.jp/gitroot/git-core/git.git
        git://git-core.git.sourceforge.net/gitroot/git-core/git-core

With all the topics and integration branches:

        https://github.com/gitster/git

The preformatted documentation in HTML and man format are found in:

        git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
        git://repo.or.cz/git-{htmldocs,manpages}.git/
        https://code.google.com/p/git-{htmldocs,manpages}.git/
        https://github.com/gitster/git-{htmldocs,manpages}.git/

--------------------------------------------------
[New Topics]

* aw/rebase-i-stop-on-failure-to-amend (2011-11-30) 1 commit
 - rebase -i: interrupt rebase when "commit --amend" failed during "reword"

* jc/split-blob (2011-12-01) 6 commits
 . WIP (streaming chunked)
 - chunked-object: fallback checkout codepaths
 - bulk-checkin: support chunked-object encoding
 - bulk-checkin: allow the same data to be multiply hashed
 - new representation types in the packstream
 - varint-in-pack: refactor varint encoding/decoding
 (this branch uses jc/stream-to-pack.)

* jh/fast-import-notes (2011-11-28) 3 commits
 - fast-import: Fix incorrect fanout level when modifying existing notes refs
 - t9301: Add 2nd testcase exposing bugs in fast-import's notes fanout handling
 - t9301: Fix testcase covering up a bug in fast-import's notes fanout handling

* ld/p4-labels-branches (2011-11-30) 4 commits
 - git-p4: importing labels should cope with missing owner
 - git-p4: add test for p4 labels
 - git-p4: cope with labels with empty descriptions
 - git-p4: handle p4 branches and labels containing shell chars

Retracted?

* tj/maint-imap-send-remove-unused (2011-11-23) 2 commits
 - Merge branch 'maint' into tj/imap-send-remove-unused
 - imap-send: Remove unused 'use_namespace' variable

--------------------------------------------------
[Stalled]

* hv/submodule-merge-search (2011-10-13) 4 commits
 - submodule.c: make two functions static
 - allow multiple calls to submodule merge search for the same path
 - push: Don't push a repository with unpushed submodules
 - push: teach --recurse-submodules the on-demand option

What the topic aims to achieve may make sense, but the implementation
looked somewhat suboptimal.

* sg/complete-refs (2011-10-21) 9 commits
  (merged to 'next' on 2011-10-26 at d65e2b4)
 + completion: remove broken dead code from __git_heads() and __git_tags()
 + completion: fast initial completion for config 'remote.*.fetch' value
 + completion: improve ls-remote output filtering in __git_refs_remotes()
 + completion: query only refs/heads/ in __git_refs_remotes()
 + completion: support full refs from remote repositories
 + completion: improve ls-remote output filtering in __git_refs()
 + completion: make refs completion consistent for local and remote repos
 + completion: optimize refs completion
 + completion: document __gitcomp()

Needs an Ack or two from completion folks before going forward.

* sr/transport-helper-fix-rfc (2011-07-19) 2 commits
 - t5800: point out that deleting branches does not work
 - t5800: document inability to push new branch with old content

See comments on sr/fix-fast-export-tips topic.

* sr/fix-fast-export-tips (2011-11-05) 3 commits
 - fast-export: output reset command for commandline revs
 - fast-export: do not refer to non-existing marks
 - t9350: point out that refs are not updated correctly

The bottom commit from the stalled sr/transport-helper-fix-rfc topic is
fixed with this. It may make sense to drop the other topic and include
that commit in this series.

The command line parser is still too lax and accepts malformed input, but
this is a step in the right direction and tightening the command line now
should be doable without a low level surgery that touches codepaths that
are unrelated to the command line processing like the previous attempt
used to do.

* jc/lookup-object-hash (2011-08-11) 6 commits
 - object hash: replace linear probing with 4-way cuckoo hashing
 - object hash: we know the table size is a power of two
 - object hash: next_size() helper for readability
 - pack-objects --count-only
 - object.c: remove duplicated code for object hashing
 - object.c: code movement for readability

I do not think there is anything fundamentally wrong with this series, but
the risk of breakage far outweighs observed performance gain in one
particular workload.

* jc/verbose-checkout (2011-10-16) 2 commits
 - checkout -v: give full status output after switching branches
 - checkout: move the local changes report to the end

This is just to leave a record that the reason why we do not do this not
because we are incapable of coding this, but because it is not a good idea
to do this. I suspect people who are new to git that might think they need
it would soon realize the don't.

Will drop.

* eh/grep-scale-to-cpunum (2011-11-05) 1 commit
 - grep: detect number of CPUs for thread spawning

Kills I/O parallelism and needs to be improved or discarded.

* jc/commit-tree-extra (2011-11-12) 2 commits
 - commit-tree: teach -C <extra-commit>
 - commit-tree: teach -x <extra>
 (this branch uses jc/pull-signed-tag; is tangled with jc/signed-commit.)

Not absolutely needed; parked in 'pu' but may drop.

* cb/daemon-permission-errors (2011-10-17) 2 commits
 - daemon: report permission denied error to clients
 - daemon: add tests

The tip commit might be loosening things a bit too much.

--------------------------------------------------
[Cooking]

* cn/maint-lf-to-crlf-filter (2011-11-28) 1 commit
 - convert: track state in LF-to-CRLF filter

Candidate for early next cycle.

* nd/maint-ignore-exclude (2011-11-28) 1 commit
 - checkout,merge: loosen overwriting untracked file check based on info/exclude
 (this branch is used by nd/ignore-might-be-precious.)

Candidate for early next cycle.

* vr/git-merge-doc (2011-11-21) 1 commit
 - Show error for 'git merge' with unset merge.defaultToUpstream

Candidate for early next cycle.

* jn/branch-move-to-self (2011-11-28) 2 commits
 - Allow checkout -B <current-branch> to update the current branch
 - branch: allow a no-op "branch -M <current-branch> HEAD"

Candidate for early next cycle.

* jk/credentials (2011-11-28) 20 commits
 - fixup! 034c066e
 - compat/getpass: add a /dev/tty implementation
 - credential: use git_prompt instead of git_getpass
 - prompt: add PROMPT_ECHO flag
 - stub out getpass_echo function
 - refactor git_getpass into generic prompt function
 - move git_getpass to its own source file
 - t: add test harness for external credential helpers
 - credentials: add "store" helper
 - strbuf: add strbuf_add*_urlencode
 - credentials: add "cache" helper
 - docs: end-user documentation for the credential subsystem
 - credential: make relevance of http path configurable
 - credential: add credential.*.username
 - credential: apply helper config
 - http: use credential API to get passwords
 - credential: add function for parsing url components
 - introduce credentials API
 - t5550: fix typo
 - test-lib: add test_config_global variant

* nd/ignore-might-be-precious (2011-11-28) 2 commits
 - checkout,merge: disallow overwriting ignored files with --no-overwrite-ignore
 - Merge branch 'nd/maint-ignore-exclude' into nd/ignore-might-be-precious
 (this branch uses nd/maint-ignore-exclude.)

* jk/upload-archive-use-start-command (2011-11-21) 1 commit
 - upload-archive: use start_command instead of fork

* jk/maint-1.6.2-upload-archive (2011-11-21) 1 commit
 - archive: don't let remote clients get unreachable commits
 (this branch is used by jk/maint-upload-archive.)

* jk/maint-upload-archive (2011-11-21) 1 commit
 - Merge branch 'jk/maint-1.6.2-upload-archive' into jk/maint-upload-archive
 (this branch uses jk/maint-1.6.2-upload-archive.)

* jk/refresh-porcelain-output (2011-11-18) 3 commits
  (merged to 'next' on 2011-11-18 at 872f25e)
 + refresh_index: make porcelain output more specific
 + refresh_index: rename format variables
 + read-cache: let refresh_cache_ent pass up changed flags

* ab/enable-i18n (2011-11-18) 1 commit
 - i18n: add infrastructure for translating Git with gettext

* gh/userdiff-matlab (2011-11-15) 1 commit
  (merged to 'next' on 2011-11-18 at 10cd275)
 + Add built-in diff patterns for MATLAB code

* jc/maint-pack-object-cycle (2011-11-16) 1 commit
  (merged to 'next' on 2011-11-18 at 3715a81)
 + pack-object: tolerate broken packs that have duplicated objects

Make the client side more robust against bogus pack stream; the problem
was discovered by accident while repacking a clone obtained from somewhat
buggy test server.

* jc/index-pack-reject-dups (2011-11-16) 1 commit
  (merged to 'next' on 2011-11-18 at 2090221)
 + receive-pack, fetch-pack: reject bogus pack that records objects twice

And this is the prevention to reject such pack stream in the first place.

* nd/resolve-ref (2011-11-13) 2 commits
  (merged to 'next' on 2011-11-18 at 8e17b34)
 + Copy resolve_ref() return value for longer use
 + Convert many resolve_ref() calls to read_ref*() and ref_exists()

A fix for an error-reporting codepath that is not a regression, that
turned into a patch that touches many callsite.

* vr/msvc (2011-10-31) 3 commits
  (merged to 'next' on 2011-11-14 at f46d021)
 + MSVC: Remove unneeded header stubs
 + Compile fix for MSVC: Include <io.h>
 + Compile fix for MSVC: Do not include sys/resources.h

* na/strtoimax (2011-11-05) 3 commits
  (merged to 'next' on 2011-11-14 at b64e1cb)
 + Support sizes >=2G in various config options accepting 'g' sizes.
 + Compatibility: declare strtoimax() under NO_STRTOUMAX
 + Add strtoimax() compatibility function.

* jc/signed-commit (2011-11-29) 5 commits
 - gpg-interface: allow use of a custom GPG binary
 - pretty: %G[?GS] placeholders
 - test "commit -S" and "log --show-signature"
 - log: --show-signature
 - commit: teach --gpg-sign option
 (this branch uses jc/pull-signed-tag; is tangled with jc/commit-tree-extra.)

Rebased on top of jc/pull-signed-tag topic, after reverting the old one
out of 'next'.

* jc/pull-signed-tag (2011-11-12) 15 commits
  (merged to 'next' on 2011-11-14 at 25e8838)
 + commit-tree: teach -m/-F options to read logs from elsewhere
 + commit-tree: update the command line parsing
 + commit: teach --amend to carry forward extra headers
 + merge: force edit and no-ff mode when merging a tag object
 + commit: copy merged signed tags to headers of merge commit
 + merge: record tag objects without peeling in MERGE_HEAD
 + merge: make usage of commit->util more extensible
 + fmt-merge-msg: Add contents of merged tag in the merge message
 + fmt-merge-msg: package options into a structure
 + fmt-merge-msg: avoid early returns
 + refs DWIMmery: use the same rule for both "git fetch" and others
 + fetch: allow "git fetch $there v1.0" to fetch a tag
 + merge: notice local merging of tags and keep it unwrapped
 + fetch: do not store peeled tag object names in FETCH_HEAD
 + Split GPG interface into its own helper library
 (this branch is used by jc/commit-tree-extra and jc/signed-commit.)

Further updated to allow "commit --amend" to retain the mergetag
headers. I think this is ready for the cycle after upcoming 1.7.8.

* jc/request-pull-show-head-4 (2011-11-09) 12 commits
  (merged to 'next' on 2011-11-13 at e473fd2)
 + request-pull: use the annotated tag contents
  (merged to 'next' on 2011-10-15 at 7e340ff)
 + fmt-merge-msg.c: Fix an "dubious one-bit signed bitfield" sparse error
  (merged to 'next' on 2011-10-10 at 092175e)
 + environment.c: Fix an sparse "symbol not declared" warning
 + builtin/log.c: Fix an "Using plain integer as NULL pointer" warning
  (merged to 'next' on 2011-10-07 at fcaeca0)
 + fmt-merge-msg: use branch.$name.description
  (merged to 'next' on 2011-10-06 at fa5e0fe)
 + request-pull: use the branch description
 + request-pull: state what commit to expect
 + request-pull: modernize style
 + branch: teach --edit-description option
 + format-patch: use branch description in cover letter
 + branch: add read_branch_desc() helper function
 + Merge branch 'bk/ancestry-path' into jc/branch-desc

Allow setting "description" for branches and use it to help communications
between humans in various workflow elements. It also allows requesting for
a signed tag to be pulled and shows the tag message in the generated message.

* ab/clang-lints (2011-11-06) 2 commits
  (merged to 'next' on 2011-11-13 at a573aec)
 + cast variable in call to free() in builtin/diff.c and submodule.c
 + apply: get rid of useless x < 0 comparison on a size_t type

* ab/pull-rebase-config (2011-11-07) 1 commit
  (merged to 'next' on 2011-11-13 at 72bb2d5)
 + pull: introduce a pull.rebase option to enable --rebase

* nd/fsck-progress (2011-11-06) 4 commits
  (merged to 'next' on 2011-11-13 at 8831811)
 + fsck: print progress
 + fsck: avoid reading every object twice
 + verify_packfile(): check as many object as possible in a pack
 + fsck: return error code when verify_pack() goes wrong

* nd/prune-progress (2011-11-07) 3 commits
  (merged to 'next' on 2011-11-13 at c5722ac)
 + reachable: per-object progress
 + prune: handle --progress/no-progress
 + prune: show progress while marking reachable objects

* jc/stream-to-pack (2011-12-01) 5 commits
 - bulk-checkin: replace fast-import based implementation
 - csum-file: introduce sha1file_checkpoint
 - finish_tmp_packfile(): a helper function
 - create_tmp_packfile(): a helper function
 - write_pack_header(): a helper function
 (this branch is used by jc/split-blob.)

Teaches "git add" to send large-ish blob data straight to a packfile.
This is a continuation to the "large file support" topic. The codepath to
move data from worktree to repository is made aware of streaming, just
like the checkout codepath that goes the other way, which was done in the
previous "large file support" topic in the 1.7.7 cycle.

* jn/gitweb-side-by-side-diff (2011-10-31) 8 commits
 - gitweb: Add navigation to select side-by-side diff
 - gitweb: Use href(-replay=>1,...) for formats links in "commitdiff"
 - t9500: Add basic sanity tests for side-by-side diff in gitweb
 - t9500: Add test for handling incomplete lines in diff by gitweb
 - gitweb: Give side-by-side diff extra CSS styling
 - gitweb: Add a feature to show side-by-side diff
 - gitweb: Extract formatting of diff chunk header
 - gitweb: Refactor diff body line classification

Replaces a series from Kato Kazuyoshi on the same topic.

* mf/curl-select-fdset (2011-11-04) 4 commits
  (merged to 'next' on 2011-11-06 at a49516f)
 + http: drop "local" member from request struct
 + http.c: Rely on select instead of tracking whether data was received
 + http.c: Use timeout suggested by curl instead of fixed 50ms timeout
 + http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping

Reduces unnecessary waits.

* nd/misc-cleanups (2011-10-27) 6 commits
  (merged to 'next' on 2011-10-28 at 2527a49)
 + unpack_object_header_buffer(): clear the size field upon error
 + tree_entry_interesting: make use of local pointer "item"
 + tree_entry_interesting(): give meaningful names to return values
 + read_directory_recursive: reduce one indentation level
 + get_tree_entry(): do not call find_tree_entry() on an empty tree
 + tree-walk.c: do not leak internal structure in tree_entry_len()

These are unquestionably good parts taken out of a larger series, so that
we can focus more on the other changes in later rounds of review.

* rs/allocate-cache-entry-individually (2011-10-26) 2 commits
  (merged to 'next' on 2011-10-27 at 2e4acd6)
 + cache.h: put single NUL at end of struct cache_entry
 + read-cache.c: allocate index entries individually

* mh/ref-api-3 (2011-11-16) 26 commits
  (merged to 'next' on 2011-11-16 at cc76151)
 + refs: loosen over-strict "format" check
  (merged to 'next' on 2011-10-23 at 92e2d35)
 + is_refname_available(): reimplement using do_for_each_ref_in_array()
 + names_conflict(): simplify implementation
 + names_conflict(): new function, extracted from is_refname_available()
 + repack_without_ref(): reimplement using do_for_each_ref_in_array()
 + do_for_each_ref_in_array(): new function
 + do_for_each_ref(): correctly terminate while processesing extra_refs
 + add_ref(): take a (struct ref_entry *) parameter
 + create_ref_entry(): extract function from add_ref()
 + parse_ref_line(): add a check that the refname is properly formatted
 + repack_without_ref(): remove temporary
 + Rename another local variable name -> refname
  (merged to 'next' on 2011-10-19 at cc89f0e)
 + resolve_gitlink_ref_recursive(): change to work with struct ref_cache
 + Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
 + resolve_gitlink_ref(): improve docstring
 + get_ref_dir(): change signature
 + refs: change signatures of get_packed_refs() and get_loose_refs()
 + is_dup_ref(): extract function from sort_ref_array()
 + add_ref(): add docstring
 + parse_ref_line(): add docstring
 + is_refname_available(): remove the "quiet" argument
 + clear_ref_array(): rename from free_ref_array()
 + refs: rename parameters result -> sha1
 + refs: rename "refname" variables
 + struct ref_entry: document name member
 + cache.h: add comments for git_path() and git_path_submodule()
 (this branch is tangled with mh/ref-api-2.)

* rr/revert-cherry-pick (2011-10-23) 5 commits
  (merged to 'next' on 2011-10-26 at 27b7496)
 + revert: simplify communicating command-line arguments
 + revert: allow mixed pick and revert instructions
 + revert: make commit subjects in insn sheet optional
 + revert: simplify getting commit subject in format_todo()
 + revert: free msg in format_todo()

The internals of "git revert/cherry-pick" has been further refactored to
serve as the basis for the sequencer.

* mh/ref-api-2 (2011-11-16) 15 commits
  (merged to 'next' on 2011-11-16 at 511457f)
 + refs: loosen over-strict "format" check
  (merged to 'next' on 2011-10-19 at cc89f0e)
 + resolve_gitlink_ref_recursive(): change to work with struct ref_cache
 + Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
 + resolve_gitlink_ref(): improve docstring
 + get_ref_dir(): change signature
 + refs: change signatures of get_packed_refs() and get_loose_refs()
 + is_dup_ref(): extract function from sort_ref_array()
 + add_ref(): add docstring
 + parse_ref_line(): add docstring
 + is_refname_available(): remove the "quiet" argument
 + clear_ref_array(): rename from free_ref_array()
 + refs: rename parameters result -> sha1
 + refs: rename "refname" variables
 + struct ref_entry: document name member
 + cache.h: add comments for git_path() and git_path_submodule()
 (this branch is tangled with mh/ref-api-3.)

^ permalink raw reply

* Re: [PATCH] gitweb: Don't append ';js=(0|1)' to external links
From: Jakub Narebski @ 2011-12-02 20:44 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0UuyH3m0RR2=jk5apFjVMgbD5iWeztR94mE-m7q9dyYKR2Q@mail.gmail.com>

On Tue, 29 Nov 2011, Jürgen Kreileder wrote:
> On Tue, Nov 29, 2011 at 20:28, Jakub Narebski <jnareb@gmail.com> wrote:
> > Jürgen Kreileder <jk@blackdown.de> writes:
> [...]
> > Thanks for this, but I think a better solution would be to explicitly
> > mark the few external links we have e.g. with 'class="external"', and
> > use that to avoid adding ';js=(0|1)' to them.
> 
> This won't work because there are more than a few external links.  Think of
> links added in the header or footer or via a project specific README.html.

Thanks for noticing that.
 
> You would have to do it the other way round: Mark all internal links.

In this case your solution is better... provided that you either check
that all internal links generated by gitweb are absolute URLs in all cases
(they always include $my_url), or add ';js=(0|1)' also for relative
URLs (i.e. not starting with http:// or https://).


Well... the "mark external" idea would also work, provided that we rework
how ';js=(0|1)' is generated.  Instead of fixing links on load, we could
add an event handler[1] which on clicking the link[2] would add 
';js=(0|1)'... and there you can check if we are inside README.html etc.
by examining DOM (perhaps those externally added fragments would need to
be wrapped in div / span with "external" class, though).
 
[1] You assign event handler to 'html' or 'body' element, and check in
    handler where you clicked; that is what we do for JavaScript
    timezone stuff.

[2] I am not sure if "Open in new tab" from context menu or Ctrl-Click
    would trigger adding ';js=(0|1)' or not... and whether it is a bad
    thing or not.


> > This has the advantage that we can use different style to mark
> > outgoing external links.

I think this advantage might be worth it, even without changes to
javascript-detection.js

-- 
Jakub Narebski
Poland

^ permalink raw reply

* [PATCH v2] Add MYMETA.yml to perl/.gitignore
From: Sebastian Morr @ 2011-12-02 22:55 UTC (permalink / raw)
  To: gitster; +Cc: Jeff King, git
In-Reply-To: <20111201223520.GB4869@sigill.intra.peff.net>

This file is auto-generated by newer versions of ExtUtils::MakeMaker
(presumably starting with the version shipping with Perl 5.14). It just
contains extra information about the environment and arguments to the
Makefile-building process, and can be safely deleted.

Signed-off-by: Sebastian Morr <sebastian@morr.cc>
---
 perl/.gitignore |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/perl/.gitignore b/perl/.gitignore
index 98b2477..9235e73 100644
--- a/perl/.gitignore
+++ b/perl/.gitignore
@@ -1,5 +1,6 @@
 perl.mak
 perl.mak.old
+MYMETA.yml
 blib
 blibdirs
 pm_to_blib
-- 
1.7.8.rc4.dirty

^ permalink raw reply related

* Re: [PATCH v2] Add MYMETA.yml to perl/.gitignore
From: Junio C Hamano @ 2011-12-02 23:41 UTC (permalink / raw)
  To: Sebastian Morr; +Cc: Jeff King, git
In-Reply-To: <20111202225528.GA1176@thinkpad>

Sebastian Morr <sebastian@morr.cc> writes:

> This file is auto-generated by newer versions of ExtUtils::MakeMaker
> (presumably starting with the version shipping with Perl 5.14). It just
> contains extra information about the environment and arguments to the
> Makefile-building process, and can be safely deleted.

"safely deleted" sounds as if you made "make clean" remove it.  I do not
mind doing s/and can be.*/and should be ignored./ myself, if you want.

Does running "make clean" actually remove it, by the way?

> Signed-off-by: Sebastian Morr <sebastian@morr.cc>
> ---
>  perl/.gitignore |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/perl/.gitignore b/perl/.gitignore
> index 98b2477..9235e73 100644
> --- a/perl/.gitignore
> +++ b/perl/.gitignore
> @@ -1,5 +1,6 @@
>  perl.mak
>  perl.mak.old
> +MYMETA.yml
>  blib
>  blibdirs
>  pm_to_blib

^ permalink raw reply

* Debugging git-commit slowness on a large repo
From: Joshua Redstone @ 2011-12-02 23:17 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi,
I have a git repo with about 300k commits,  150k files totaling maybe 7GB.
 Locally committing a small change - say touching fewer than 300 bytes
across 4 files - consistently takes over one second, which seems kinda
slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does not
improve after doing a git-gc (my .git dir has maybe 250 files after a git
gc).  The same size commit on a brand new repo takes < 10ms.  Any thoughts
on why committing a small change seems to take a long time on larger repos?

Fwiw, I also tried doing the same test using libgit2 (via the pygit2
wrapper), and it was ever slower (about 6 seconds to commit the same small
change).

Thanks for any thoughts or places to look.

Cheers,
Josh

^ permalink raw reply

* gitk show only all refs matching pattern
From: Neal Kreitzinger @ 2011-12-03  0:02 UTC (permalink / raw)
  To: git

Is there a way to tell gitk to show only all refs matching a pattern (e.g., 
all refs matching refs/heads/neal/*)?  (I'm using git 1.7.1)

v/r,
neal 

^ permalink raw reply

* Re: [PATCH v2] Add MYMETA.yml to perl/.gitignore
From: Sebastian Morr @ 2011-12-03  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git
In-Reply-To: <7vvcpymti3.fsf@alter.siamese.dyndns.org>

On Fri, Dec 02, 2011 at 03:41:24PM -0800, Junio C Hamano wrote:
> Sebastian Morr <sebastian@morr.cc> writes:
> 
> > This file is auto-generated by newer versions of ExtUtils::MakeMaker
> > (presumably starting with the version shipping with Perl 5.14). It just
> > contains extra information about the environment and arguments to the
> > Makefile-building process, and can be safely deleted.
> 
> "safely deleted" sounds as if you made "make clean" remove it.  I do not
> mind doing s/and can be.*/and should be ignored./ myself, if you want.

You're very welcome to do that.

> Does running "make clean" actually remove it, by the way?

Yes, it does. perl/perl.mak (which, in turn, is generated by MakeMaker
via perl/Makefile) takes care of that file explicitly.

^ permalink raw reply

* Re: Debugging git-commit slowness on a large repo
From: Carlos Martín Nieto @ 2011-12-03  0:23 UTC (permalink / raw)
  To: Joshua Redstone; +Cc: git@vger.kernel.org
In-Reply-To: <CAFE9C7B.2BFEC%joshua.redstone@fb.com>

[-- Attachment #1: Type: text/plain, Size: 2071 bytes --]

On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
> Hi,
> I have a git repo with about 300k commits,  150k files totaling maybe 7GB.
>  Locally committing a small change - say touching fewer than 300 bytes
> across 4 files - consistently takes over one second, which seems kinda
> slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does not
> improve after doing a git-gc (my .git dir has maybe 250 files after a git
> gc).  The same size commit on a brand new repo takes < 10ms.  Any thoughts
> on why committing a small change seems to take a long time on larger repos?

By "same size commit" do you mean the same amount of changes, or the
same amount of files? Committing doesn't depend on the size of the
repo (by itself), but on the size of the index, which depends on the
amount of files to be committed (as git is snapshot-based). At one
point, commit forgot how to write the tree cache to the index (a
performance optimisation). Do the times improve if you run 'git
read-tree HEAD' between one commit and another? Note that this will
reset the index to the last commit, though for the tests I image you
use some variation of 'git commit -a'.

Thomas Rast wrote a patch to re-teach commit to store the tree cache,
but there were some issues and never got applied.

> 
> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
> wrapper), and it was ever slower (about 6 seconds to commit the same small
> change).

I don't know about the python bindings, but on the (somewhat
unscientific) tests for libgit2's write-tree (the slow part of a
creating a commit), it performs slightly faster than git's (though I
think git's write-tree does update the tree cache, which libgit2
doesn't currently). The speed could just be a side-effect of the small
test repo. From your domain, I assume the data is not for public
consumption, but it'd be great if you could post your code to pygit2's
issue tracker so we can see how much of the slowdown comes from the
bindings or the library.

   cmn


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-03  0:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Git Mailing List
In-Reply-To: <CACsJy8CO1GtpZVo-oA2eKbQadsXYBEKVLfUH0GONR5jovuvH+Q@mail.gmail.com>

On Fri, 2011-12-02 at 21:22 +0700, Nguyen Thai Ngoc Duy wrote:
> (I'm not sure why you dropped git@vger. I see nothing private here so
> I bring git@vger back)

Oh, I just didn't want to flood the mailing list with too much on
one topic and figured we could summarize a discussion at some point
and post that, but if you'd rather keep it all on the list, that's 
fine with me.

I can split the code into a series of smaller patches - smaller than
the set of three I sent, but I'm not sure if the test scripts will work
with all of the intermediate patches if I do that.

I can also make the digest (current a CRC) pluggable.  Then you can try
different digests as an experiment and see how that affects performance.
My implementation uses the CRC or new digests only when
the object database is being modified or explicitly verified. Basically
the code provides memoization for an additional hash function, used
for whatever purpose you desire.

If you want to put a digest of message digests into a commit message,
you can do that fairly quickly as one level of digests has been
precomputed. I think Jeff's or your suggestion of putting an additional
digest in the commit message is a good idea.  If you want to experiment
with such changes, the code would provide a reasonable start on that.

So, I guess I should make those changes - pluggable digest and 
splitting the patches further.

> You'd need to convince git maintainer this is worth doing first,
> before talking how big the changes are ;-)

I'd guess there are several issues: the amount of code, how complex
the changes are, what the performance impacts are, whether the changes
are backwards compatible, and what you get for the effort.

As a start on the last question, "what you get," aside from some extra
checking to detect problems, if you modify commit messages and signed
tags to use better digests, you can make a stronger argument regarding
authentication.  For example, suppose you have a project in which your
code is dual-licensed - GPL for free use but a separate license if the
code is used in a proprietary product and there is a legal dispute,
using a better digest than SHA-1 would have some advantages - when they
start calling in expert witnesses, one side will bring in a security
expert who will testify that SHA-1 is too weak to be used for
authentication, citing government publications such
as http://csrc.nist.gov/groups/ST/hash/statement.html as evidence. The
jury is not going to consist of people who can fully understand the
details, so being able to say that git's authentication matches current
best practices would be an additional reason to use git.

^ permalink raw reply

* Re: Suggestion on hashing
From: Bill Zaumen @ 2011-12-03  1:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git, pclouds
In-Reply-To: <20111202175444.GB24093@sigill.intra.peff.net>

On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:

> I think your code is solving the wrong problem (or solving the right
> problem in a half-way manner). The only things that make sense to me
> are:
> 
>   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
>      even if it is, an attack is extremely expensive to mount. This may
>      change in the future, of course, but it will probably stay
>      expensive for a while.
> 
>   2. Decouple the object identifier and digest roles, but insert the
>      digest into newly created objects, so it can be part of the
>      signature chain. I described such a scheme in one of my replies to
>      you. It has some complexities, but has the bonus that we can build
>      directly on older history, preserving its sha1s.
> 
>   3. Replace SHA-1 with a more secure algorithm.

Suppose I make the digest pluggable, something I intended to do
eventually anyway?  Then you just use the existing SHA-1 as an
object identifier and the new digest in a signature chain?  What I
did was essentially to compute the new digest (using a CRC as the
trivial case) whenever an object's SHA-1 hash is computed, plus
using the new digest for low-cost collision checks.

Then you have everything needed to experiment with your second option.
I got the impression that Nguyen had some interest in that, but could
be mistaken.

The use is simple: if you have the SHA-1 hash of an object, you call
a function, currently named "has_sha1_file_crc" and it returns true if
a CRC is available, writing the hash into the buffer supplied as its
second argument.  You can do whatever you like with it.  If you want
a digest of digests, you just traverse a commit's tree, and call
has_sha1_file_crc whenever you want to look up a digest.  So, the API
is actually very simple if you just use the patch to quickly look up
the digest associated with a SHA-1 ID - everything else it does happens
automatically.

 

^ permalink raw reply

* Re: git auto-repack is broken...
From: George Spelvin @ 2011-12-03  6:55 UTC (permalink / raw)
  To: git; +Cc: linux, peff

Thanks, Jeff, for the life-cycle chart.

A couple of ideas come to mind:
- When unpacking objects from a pack, it should be fine to set their
  date to that of the pack.  After all, they're at least that old.
- We could put unreferenced objects into packs whose date is the most
  recent of any of the contained objects.
- We could then group unreferenced objects into packs based on age,
  so their ages sould not be affected too much by the preceding
  operations.

That still produces a noticeable number of packs, which isn't
good, but maybe it's better that keeping thousands of loose
objects for a month...

^ permalink raw reply

* Re: gitk show only all refs matching pattern
From: Johannes Sixt @ 2011-12-03  8:11 UTC (permalink / raw)
  To: Neal Kreitzinger; +Cc: git
In-Reply-To: <jbbouj$p2$1@dough.gmane.org>

Am 03.12.2011 01:02, schrieb Neal Kreitzinger:
> Is there a way to tell gitk to show only all refs matching a pattern (e.g., 
> all refs matching refs/heads/neal/*)?  (I'm using git 1.7.1)

That would be

   gitk --branches=neal/*

There is also --remotes= which I find quite useful.

-- Hannes

^ permalink raw reply

* Re: Suggestion on hashing
From: Jeff King @ 2011-12-03 15:08 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, pclouds
In-Reply-To: <1322877021.1729.118.camel@yos>

On Fri, Dec 02, 2011 at 05:50:21PM -0800, Bill Zaumen wrote:

> On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> > On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:
> 
> > I think your code is solving the wrong problem (or solving the right
> > problem in a half-way manner). The only things that make sense to me
> > are:
> > 
> >   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
> >      even if it is, an attack is extremely expensive to mount. This may
> >      change in the future, of course, but it will probably stay
> >      expensive for a while.
> > 
> >   2. Decouple the object identifier and digest roles, but insert the
> >      digest into newly created objects, so it can be part of the
> >      signature chain. I described such a scheme in one of my replies to
> >      you. It has some complexities, but has the bonus that we can build
> >      directly on older history, preserving its sha1s.
> > 
> >   3. Replace SHA-1 with a more secure algorithm.
> 
> Suppose I make the digest pluggable, something I intended to do
> eventually anyway?  Then you just use the existing SHA-1 as an
> object identifier and the new digest in a signature chain?  What I
> did was essentially to compute the new digest (using a CRC as the
> trivial case) whenever an object's SHA-1 hash is computed, plus
> using the new digest for low-cost collision checks.

If you make the digest stronger (or pluggable) and include it in the
actual objects themselves, then you have a start on (2).

I'd drop all of the digest-exchange bits from the protocol, as the
actual signatures are the real, trustable verification. I don't think
you can drop the external storage of the digests, which is one of the
ugliest bits. You'll be asking for the digests all the time to create
new commit objects, so you need to have it at hand without rehashing.

And I wouldn't get my hopes up that this will go into git any time soon.
At this point, we're really guessing about how broken SHA-1 will be in
the future, and how much we are going to want to care.

Just my two cents.

-Peff

^ permalink raw reply

* Re: Suggestion on hashing
From: Philip Oakley @ 2011-12-03 15:34 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, pclouds, Jeff King
In-Reply-To: <20111203150842.GA4442@sigill.intra.peff.net>

Had you seen the recent thread by Junio with the footnote link to the paper 
on reconcilliation by using multiple hashes?
http://article.gmane.org/gmane.linux.kernel/1214517.
"What's the Difference? Efficient Set Reconciliation without Prior
Context" http://cseweb.ucsd.edu/~fuyeda/papers/sigcomm2011.pdfIt looks to 
have a lot of the properties being sought, and links with other git aspects.

Philip

From: "Jeff King" <peff@peff.net>: Saturday, December 03, 2011 3:08 PM
On Fri, Dec 02, 2011 at 05:50:21PM -0800, Bill Zaumen wrote:

> On Fri, 2011-12-02 at 12:54 -0500, Jeff King wrote:
> > On Fri, Dec 02, 2011 at 12:08:39AM -0800, Bill Zaumen wrote:
>
> > I think your code is solving the wrong problem (or solving the right
> > problem in a half-way manner). The only things that make sense to me
> > are:
> >
> >   1. Do nothing. SHA-1 is probably not broken yet, even by the NSA, and
> >      even if it is, an attack is extremely expensive to mount. This may
> >      change in the future, of course, but it will probably stay
> >      expensive for a while.
> >
> >   2. Decouple the object identifier and digest roles, but insert the
> >      digest into newly created objects, so it can be part of the
> >      signature chain. I described such a scheme in one of my replies to
> >      you. It has some complexities, but has the bonus that we can build
> >      directly on older history, preserving its sha1s.
> >
> >   3. Replace SHA-1 with a more secure algorithm.
>
> Suppose I make the digest pluggable, something I intended to do
> eventually anyway?  Then you just use the existing SHA-1 as an
> object identifier and the new digest in a signature chain?  What I
> did was essentially to compute the new digest (using a CRC as the
> trivial case) whenever an object's SHA-1 hash is computed, plus
> using the new digest for low-cost collision checks.

If you make the digest stronger (or pluggable) and include it in the
actual objects themselves, then you have a start on (2).

I'd drop all of the digest-exchange bits from the protocol, as the
actual signatures are the real, trustable verification. I don't think
you can drop the external storage of the digests, which is one of the
ugliest bits. You'll be asking for the digests all the time to create
new commit objects, so you need to have it at hand without rehashing.

And I wouldn't get my hopes up that this will go into git any time soon.
At this point, we're really guessing about how broken SHA-1 will be in
the future, and how much we are going to want to care.

Just my two cents.

-Peff
--
Version: 2012.0.1873 / Virus Database: 2102/4653 - Release Date: 12/02/11

^ permalink raw reply

* [PATCH, v3] git-tag: introduce --[no-]strip options
From: Kirill A. Shutemov @ 2011-12-03 17:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kirill A. Shutemov

From: "Kirill A. Shutemov" <kirill@shutemov.name>

Normally git tag stripes tag message lines starting with '#', trailing
spaces from every line and empty lines from the beginning and end.

--no-strip is useful if you want to take a tag message as-is, without
any stripping.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
 Documentation/git-tag.txt |    6 ++++++
 builtin/tag.c             |   41 +++++++++++++++++++++++++++--------------
 2 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index c83cb13..31463ca 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -99,6 +99,12 @@ OPTIONS
 	Implies `-a` if none of `-a`, `-s`, or `-u <key-id>`
 	is given.
 
+--strip::
+	Remove from tag message lines starting with '#', trailing spaces from
+	every line and empty lines from the beginning and end.  Enabled by
+	default. With `--no-strip`, the tag message given by the user is used
+	as-is.
+
 <tagname>::
 	The name of the tag to create, delete, or describe.
 	The new tag name must pass all checks defined by
diff --git a/builtin/tag.c b/builtin/tag.c
index 9b6fd95..d82774d 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -319,8 +319,14 @@ static int build_tag_object(struct strbuf *buf, int sign, unsigned char *result)
 	return 0;
 }
 
+struct create_tag_options {
+	unsigned int message;
+	unsigned int sign;
+	unsigned int strip;
+};
+
 static void create_tag(const unsigned char *object, const char *tag,
-		       struct strbuf *buf, int message, int sign,
+		       struct strbuf *buf, struct create_tag_options *opt,
 		       unsigned char *prev, unsigned char *result)
 {
 	enum object_type type;
@@ -345,7 +351,7 @@ static void create_tag(const unsigned char *object, const char *tag,
 	if (header_len > sizeof(header_buf) - 1)
 		die(_("tag header too big."));
 
-	if (!message) {
+	if (!opt->message) {
 		int fd;
 
 		/* write the template message before editing: */
@@ -356,7 +362,7 @@ static void create_tag(const unsigned char *object, const char *tag,
 
 		if (!is_null_sha1(prev))
 			write_tag_body(fd, prev);
-		else
+		else if (opt->strip)
 			write_or_die(fd, _(tag_template), strlen(_(tag_template)));
 		close(fd);
 
@@ -367,14 +373,15 @@ static void create_tag(const unsigned char *object, const char *tag,
 		}
 	}
 
-	stripspace(buf, 1);
+	if (opt->strip)
+		stripspace(buf, 1);
 
-	if (!message && !buf->len)
+	if (opt->message && !buf->len)
 		die(_("no tag message?"));
 
 	strbuf_insert(buf, 0, header_buf, header_len);
 
-	if (build_tag_object(buf, sign, result) < 0) {
+	if (build_tag_object(buf, opt->sign, result) < 0) {
 		if (path)
 			fprintf(stderr, _("The tag message has been left in %s\n"),
 				path);
@@ -422,9 +429,9 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	unsigned char object[20], prev[20];
 	const char *object_ref, *tag;
 	struct ref_lock *lock;
-
-	int annotate = 0, sign = 0, force = 0, lines = -1,
-		list = 0, delete = 0, verify = 0;
+	struct create_tag_options opt;
+	int annotate = 0, force = 0, lines = -1, list = 0,
+	    delete = 0, verify = 0;
 	const char *msgfile = NULL, *keyid = NULL;
 	struct msg_arg msg = { 0, STRBUF_INIT };
 	struct commit_list *with_commit = NULL;
@@ -442,7 +449,9 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK('m', "message", &msg, "message",
 			     "tag message", parse_msg_arg),
 		OPT_FILENAME('F', "file", &msgfile, "read message from file"),
-		OPT_BOOLEAN('s', "sign", &sign, "annotated and GPG-signed tag"),
+		OPT_BOOLEAN('s', "sign", &opt.sign, "annotated and GPG-signed tag"),
+		OPT_BOOLEAN(0, "strip", &opt.strip,
+					"turn off tag message stripping"),
 		OPT_STRING('u', "local-user", &keyid, "key-id",
 					"use another key to sign the tag"),
 		OPT__FORCE(&force, "replace the tag if exists"),
@@ -459,13 +468,16 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 
 	git_config(git_tag_config, NULL);
 
+	opt.sign = 0;
+	opt.strip = 1;
+
 	argc = parse_options(argc, argv, prefix, options, git_tag_usage, 0);
 
 	if (keyid) {
-		sign = 1;
+		opt.sign = 1;
 		set_signingkey(keyid);
 	}
-	if (sign)
+	if (opt.sign)
 		annotate = 1;
 	if (argc == 0 && !(delete || verify))
 		list = 1;
@@ -523,9 +535,10 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	else if (!force)
 		die(_("tag '%s' already exists"), tag);
 
+	opt.message = msg.given || msgfile;
+
 	if (annotate)
-		create_tag(object, tag, &buf, msg.given || msgfile,
-			   sign, prev, object);
+		create_tag(object, tag, &buf, &opt, prev, object);
 
 	lock = lock_any_ref_for_update(ref.buf, prev, 0);
 	if (!lock)
-- 
1.7.7.3

^ permalink raw reply related

* Re: git auto-repack is broken...
From: Brandon Casey @ 2011-12-03 19:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Linus Torvalds, Ævar Arnfjörð,
	Git Mailing List
In-Reply-To: <20111202174546.GA24093@sigill.intra.peff.net>

On Fri, Dec 2, 2011 at 11:45 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > When the objects become unreferenced, we eject them from the pack into
>> > loose form again. If they don't become referenced in the 2-week window,
>> > they get pruned then. So yes, you drop the age information, but they do
>> > eventually go away.
>>
>> If you update gc/repack -A to put them in a separate pack, then you would
>> never be able to get rid of them, no? You pack, then eject (which gives
>> them a fresher timestamp), then notice that you are within the 2-week window
>> and pack them again,...
>
> But we shouldn't be packing totally unreferenced objects. Barring bugs,
> the life cycle of such an object should be something like:
>
>  1. Object X is created on branch 'foo'.
>
>  2. Branch 'foo' is deleted, but its commits are still in the HEAD
>     reflog, referencing X.
>
>  3. 90 days pass (actually, I think this might be the 30-day
>     expire-unreachable time)
>
>  4. "git gc" runs "git repack -Ad", which will eject X from the pack
>     into a loose form (because it is not becoming part of the new pack
>     we are writing).

Actually, it is right here when the newly loosened unreferenced
objects will be deleted.  Objects ejected from a pack _are_ given the
timestamp of the pack they were ejected from.  So, if the pack is
older than two weeks (90 days in your example), then so will be the
loosened objects, and git prune will delete them when called by git
gc.

>  5. Two weeks pass.
>
>  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
>     object.
>
> "gc" runs between (4) and (6) will not re-pack the object, because it
> remains unreferenced.

Correct with the recognition that loose objects get pack mtime, so
step 5 may be less than two weeks.

> I think things might be slowed somewhat by "gc --auto", which will not
> do a "repack -A" until we have too many packs. So steps (3) and (4) are
> really more like "gc runs git-repack without -A" 50 times, and then we
> finally run "git repack -A".

This is correct.  This should have the effect of increasing the age of
unreferenced objects when they are finally loosened and make it more
likely that they are pruned during the same git gc operation that
loosens them.

Linus's scenario of fetching a lot of stuff that never actually makes
it into the reflogs is still a valid problem.  I'm not sure that
people who don't know what they are doing are going to run into this
problem though.  Since he fetches a lot of stuff without ever checking
it out or creating a branch from it, potentially many objects become
unreferenced every time FETCH_HEAD changes.  If he does this many
times in a short period of time, he could reach the gc.autopacklimit
and trigger gc --auto and produce more than gc.auto loose objects that
are younger than gc.pruneExpire.

Decreasing gc.pruneExpire as you suggested should make it much less
likely to run into this problem.  I wonder if it is worth trying to
limit how often gc --auto is run to not be more often than
gc.pruneExpire or something.  If we modified the timestamp that is
assigned to fetched packs, maybe we could use the pack timestamps as
an indicator of how recently git gc has run.

-Brandon

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox