Git development
 help / color / mirror / Atom feed
* Re: git alias and --help
From: Junio C Hamano @ 2011-10-28 18:07 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Miles Bader, Gelonida N, git
In-Reply-To: <m362j95jv3.fsf@localhost.localdomain>

Jakub Narebski <jnareb@gmail.com> writes:

>> What I've often wished is that git's help system would output
>> something like:
>> 
>>    $ git help co
>>    `git co' is aliased to `checkout'
>> 
>>    Here's the help entry for `checkout':
>> 
>>    GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)
>
> Wouldn't it be more useful to say something like this:
>
>   $ git co --help
>   `git co' is aliased to `checkout'
>  
>   You can see help entry for `checkout' with "git checkout --help"
>
> Then help is only copy'n'paste away.  

Describe your algorithm to come up with the equivalent to the above
'checkout' in this example:

    $ git one --help
    `git one' is aliased to `!sh -c 'git show -s --pretty="format:%h (%s, %ai" "$@" | sed -e "s/ [012][0-9]:[0-5][0-9]:[0-5][0-9] [-+][0-9][0-9][0-9][0-9]$/)/"' -'

If we decide to punt on the '! <cmd>' form, i.e. "take the first token and
if it is a git command then do this special thing but otherwise don't make
things worse", then you could improve this example:

    $ git lgf --help
    'git lgf' is aliased to 'log --oneline --boundary --first-parent'

with "git log --help", but that is aiming too low for my taste.

If you are redesigning the help system, isn't it a shame that you are
discarding other tokens in the alias when giving help? Wouldn't it be
wonderful if you extracted the option descriptions for these three options
specified and showing only that, for example?

You would need to ensure that the manual pages for all commands share the
same structure to make that happen, which goes without saying.

^ permalink raw reply

* What's cooking in git.git (Oct 2011, #11; Fri, 28)
From: Junio C Hamano @ 2011-10-28 18:12 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with '-' are
only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.

I am hoping to tag 1.7.8-rc0 this weekend. Proposals and patches for
changes that are not regression fixes need to wait until the next
development cycle.

Here are the repositories that have my integration branches:

With maint, master, next, pu, todo:

        git://git.kernel.org/pub/scm/git/git.git
        git://repo.or.cz/alt-git.git
        https://code.google.com/p/git-core/
        https://github.com/git/git

With only maint and master:

        git://git.sourceforge.jp/gitroot/git-core/git.git
        git://git-core.git.sourceforge.net/gitroot/git-core/git-core

With all the topics and integration branches but not todo, html or man:

        https://github.com/gitster/git

I will stop pushing the generated documentation branches to the above
repositories, as they are not sources. The only reason the source
repository at k.org has hosted these branches was because it was the only
repository over there that was writable by me; it was an ugly historical
and administrative workaround and not a demonstration of the best
practice.

These branches are pushed to their own separate repositories instead:

        git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
        git://repo.or.cz/git-{htmldocs,manpages}.git/
        https://code.google.com/p/git-{htmldocs,manpages}.git/
        https://github.com/gitster/git-{htmldocs,manpages}.git/

--------------------------------------------------
[New Topics]

* nd/misc-cleanups (2011-10-27) 6 commits
  (merged to 'next' on 2011-10-28 at 2527a49)
 + unpack_object_header_buffer(): clear the size field upon error
 + tree_entry_interesting: make use of local pointer "item"
 + tree_entry_interesting(): give meaningful names to return values
 + read_directory_recursive: reduce one indentation level
 + get_tree_entry(): do not call find_tree_entry() on an empty tree
 + tree-walk.c: do not leak internal structure in tree_entry_len()

These are unquestionably good parts taken out of a larger series, so that
we can focus more on the other changes in later rounds of review.

Will keep in 'next' during this cycle.

* ss/blame-textconv-fake-working-tree (2011-10-28) 2 commits
 - (squash) test for previous
 - blame.c: Properly initialize strbuf after calling, textconv_object()

A trivial fix for a breakage worth fixing.
Will merge to 'master' before -rc1.

--------------------------------------------------
[Graduated to "master"]

* js/grep-mutex (2011-10-26) 3 commits
  (merged to 'next' on 2011-10-26 at 6fac2d6)
 + builtin/grep: simplify lock_and_read_sha1_file()
 + builtin/grep: make lock/unlock into static inline functions
 + git grep: be careful to use mutexes only when they are initialized

* rj/gitweb-clean-js (2011-10-26) 1 commit
  (merged to 'next' on 2011-10-26 at db36a24)
 + gitweb/Makefile: Remove static/gitweb.js in the clean target

* rs/maint-estimate-cache-size (2011-10-26) 1 commit
  (merged to 'next' on 2011-10-26 at 2f11375)
 + read-cache.c: fix index memory allocation

* sn/complete-bash-wo-process-subst (2011-10-26) 1 commit
  (merged to 'next' on 2011-10-26 at 8662ed6)
 + completion: fix issue with process substitution not working on Git for Windows

--------------------------------------------------
[Stalled]

* hv/submodule-merge-search (2011-10-13) 4 commits
 - submodule.c: make two functions static
 - allow multiple calls to submodule merge search for the same path
 - push: Don't push a repository with unpushed submodules
 - push: teach --recurse-submodules the on-demand option

What the topic aims to achieve may make sense, but the implementation
looked somewhat suboptimal.

* sr/transport-helper-fix-rfc (2011-07-19) 2 commits
 - t5800: point out that deleting branches does not work
 - t5800: document inability to push new branch with old content

Perhaps 281eee4 (revision: keep track of the end-user input from the
command line, 2011-08-25) would help.

* jc/lookup-object-hash (2011-08-11) 6 commits
 - object hash: replace linear probing with 4-way cuckoo hashing
 - object hash: we know the table size is a power of two
 - object hash: next_size() helper for readability
 - pack-objects --count-only
 - object.c: remove duplicated code for object hashing
 - object.c: code movement for readability

I do not think there is anything fundamentally wrong with this series, but
the risk of breakage far outweighs observed performance gain in one
particular workload.

* jc/verbose-checkout (2011-10-16) 2 commits
 - checkout -v: give full status output after switching branches
 - checkout: move the local changes report to the end

This is just to leave a record that the reason why we do not do this not
because we are incapable of coding this, but because it is not a good idea
to do this. I suspect people who are new to git that might think they need
it would soon realize the don't.

Will keep in 'pu' as a showcase for a while and then will drop.

* kk/gitweb-side-by-side-diff (2011-10-17) 2 commits
 - gitweb: add a feature to show side-by-side diff
 - gitweb: change format_diff_line() to remove leading SP from $diff_class

Fun.
Will keep in 'pu' until the planned re-roll comes.

--------------------------------------------------
[Cooking]

* ef/mingw-upload-archive (2011-10-26) 3 commits
 - upload-archive: use start_command instead of fork
 - compat/win32/poll.c: upgrade from upstream
 - mingw: move poll out of sys-folder

Are msysgit folks OK with this series (I didn't see msysgit list Cc'ed on
these patches)? If so let's move this forward, as the changes to the core
part seem solid.

* rs/allocate-cache-entry-individually (2011-10-26) 2 commits
  (merged to 'next' on 2011-10-27 at 2e4acd6)
 + cache.h: put single NUL at end of struct cache_entry
 + read-cache.c: allocate index entries individually

Will keep in 'next' during this cycle.

* nd/pretty-commit-log-message (2011-10-23) 2 commits
  (merged to 'next' on 2011-10-27 at 4b61df7)
 + pretty.c: use original commit message if reencoding fails
 + pretty.c: free get_header() return value

Will merge to 'master' before -rc0.

* mh/ref-api-3 (2011-10-19) 11 commits
  (merged to 'next' on 2011-10-23 at 92e2d35)
 + is_refname_available(): reimplement using do_for_each_ref_in_array()
 + names_conflict(): simplify implementation
 + names_conflict(): new function, extracted from is_refname_available()
 + repack_without_ref(): reimplement using do_for_each_ref_in_array()
 + do_for_each_ref_in_array(): new function
 + do_for_each_ref(): correctly terminate while processesing extra_refs
 + add_ref(): take a (struct ref_entry *) parameter
 + create_ref_entry(): extract function from add_ref()
 + parse_ref_line(): add a check that the refname is properly formatted
 + repack_without_ref(): remove temporary
 + Rename another local variable name -> refname
 (this branch uses mh/ref-api-2.)

Will keep in 'next' during this cycle.

* rr/revert-cherry-pick (2011-10-23) 5 commits
  (merged to 'next' on 2011-10-26 at 27b7496)
 + revert: simplify communicating command-line arguments
 + revert: allow mixed pick and revert instructions
 + revert: make commit subjects in insn sheet optional
 + revert: simplify getting commit subject in format_todo()
 + revert: free msg in format_todo()

The internals of "git revert/cherry-pick" has been further refactored to
serve as the basis for the sequencer.

Will keep in 'next' during this cycle.

* jc/check-ref-format-fixup (2011-10-19) 2 commits
  (merged to 'next' on 2011-10-19 at 98981be)
 + Revert "Restrict ref-like names immediately below $GIT_DIR"
  (merged to 'next' on 2011-10-15 at 8e89bc5)
 + Restrict ref-like names immediately below $GIT_DIR

This became a no-op except for the bottom one which is part of the other
topic now.
Will discard once the other topic graduates to 'master'.

* cb/daemon-permission-errors (2011-10-17) 2 commits
 - daemon: report permission denied error to clients
 - daemon: add tests

The tip commit might be loosening things a bit too much.
Will keep in 'pu' until hearing a convincing argument for the patch.

* mh/ref-api-2 (2011-10-17) 14 commits
  (merged to 'next' on 2011-10-19 at cc89f0e)
 + resolve_gitlink_ref_recursive(): change to work with struct ref_cache
 + Pass a (ref_cache *) to the resolve_gitlink_*() helper functions
 + resolve_gitlink_ref(): improve docstring
 + get_ref_dir(): change signature
 + refs: change signatures of get_packed_refs() and get_loose_refs()
 + is_dup_ref(): extract function from sort_ref_array()
 + add_ref(): add docstring
 + parse_ref_line(): add docstring
 + is_refname_available(): remove the "quiet" argument
 + clear_ref_array(): rename from free_ref_array()
 + refs: rename parameters result -> sha1
 + refs: rename "refname" variables
 + struct ref_entry: document name member
 + cache.h: add comments for git_path() and git_path_submodule()
 (this branch is used by mh/ref-api-3.)

Will keep in 'next' during this cycle.

* dm/pack-objects-update (2011-10-20) 4 commits
  (merged to 'next' on 2011-10-27 at fa52898)
 + pack-objects: don't traverse objects unnecessarily
 + pack-objects: rewrite add_descendants_to_write_order() iteratively
 + pack-objects: use unsigned int for counter and offset values
 + pack-objects: mark add_to_write_order() as inline

Will merge to 'master' before -rc0.

* jk/git-tricks (2011-10-21) 3 commits
  (merged to 'next' on 2011-10-23 at 7c9bf71)
 + completion: match ctags symbol names in grep patterns
 + contrib: add git-jump script
 + contrib: add diff highlight script

As this stuff is in contrib/ I do not care too much about the stability.
Will merge to 'master' unless there is strong objection.

* jc/signed-commit (2011-10-21) 7 commits
  (merged to 'next' on 2011-10-23 at 03eec25)
 + pretty: %G[?GS] placeholders
 + parse_signed_commit: really use the entire commit log message
 + test "commit -S" and "log --show-signature"
 + t7004: extract generic "GPG testing" bits
 + log: --show-signature
 + commit: teach --gpg-sign option
 + Split GPG interface into its own helper library

This is to replace the earlier "signed push" experiments.
Will keep in 'next' during this cycle.

* sg/complete-refs (2011-10-21) 9 commits
  (merged to 'next' on 2011-10-26 at d65e2b4)
 + completion: remove broken dead code from __git_heads() and __git_tags()
 + completion: fast initial completion for config 'remote.*.fetch' value
 + completion: improve ls-remote output filtering in __git_refs_remotes()
 + completion: query only refs/heads/ in __git_refs_remotes()
 + completion: support full refs from remote repositories
 + completion: improve ls-remote output filtering in __git_refs()
 + completion: make refs completion consistent for local and remote repos
 + completion: optimize refs completion
 + completion: document __gitcomp()

Will keep in 'next' until an Ack or two from completion folks.

* jc/request-pull-show-head-4 (2011-10-15) 11 commits
  (merged to 'next' on 2011-10-15 at 7e340ff)
 + fmt-merge-msg.c: Fix an "dubious one-bit signed bitfield" sparse error
  (merged to 'next' on 2011-10-10 at 092175e)
 + environment.c: Fix an sparse "symbol not declared" warning
 + builtin/log.c: Fix an "Using plain integer as NULL pointer" warning
  (merged to 'next' on 2011-10-07 at fcaeca0)
 + fmt-merge-msg: use branch.$name.description
  (merged to 'next' on 2011-10-06 at fa5e0fe)
 + request-pull: use the branch description
 + request-pull: state what commit to expect
 + request-pull: modernize style
 + branch: teach --edit-description option
 + format-patch: use branch description in cover letter
 + branch: add read_branch_desc() helper function
 + Merge branch 'bk/ancestry-path' into jc/branch-desc

Allow setting "description" for branches and use it to help communications
between humans in various workflow elements.

Will keep in 'next' during this cycle.

^ permalink raw reply

* imap-send badly handles commit bodies beginning with "From <"
From: Andrew Eikum @ 2011-10-28 18:00 UTC (permalink / raw)
  To: git

Ran into this today. I had a commit message that looked like:

---
Do something

>From <http://url>:
Words
---

I put it through imap-send to email it to my project, and ended up
with this output:

sending 1 messages
 200% (2/1) done

On the server side, it was split into two mails on either side of that
commit message's From line with neither mail actually containing the
From line. To fix it, I just changed it to "Copied from <url>:" :-P

Ain't mbox grand?

^ permalink raw reply

* Re: git alias and --help
From: Miles Bader @ 2011-10-28 18:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, Gelonida N, git
In-Reply-To: <7v62j9t2gj.fsf@alter.siamese.dyndns.org>

2011/10/29 Junio C Hamano <gitster@pobox.com>:
>    $ git lgf --help
>    'git lgf' is aliased to 'log --oneline --boundary --first-parent'
>
> with "git log --help", but that is aiming too low for my taste.
>
> If you are redesigning the help system, isn't it a shame that you are
> discarding other tokens in the alias when giving help? Wouldn't it be
> wonderful if you extracted the option descriptions for these three options
> specified and showing only that, for example?

I think that would be the wrong thing in most cases though:  an alias
like the above happily allows the user to specify other git-log
options on the command line; when I get help on an aliased option,
it's often precisely because I want to see the _other_ options I can
use...

E.g., I have:

   $ git slog --help
   `git slog' is aliased to `log --date=short --pretty=tformat:"%h
%ad  %s" --abbrev-commit'

But the typical question I want to answer is something like "OK, how
do I reverse the output order of slog?" or "how do I limit the output
of slog to a certain date range?"

-miles

-- 
Cat is power.  Cat is peace.

^ permalink raw reply

* Re: [msysGit] Re: [PATCH/RFC] mingw: implement PTHREAD_MUTEX_INITIALIZER
From: Atsushi Nakagawa @ 2011-10-28 18:35 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: kusmabite, Johannes Sixt, msysgit, git, johannes.schindelin
In-Reply-To: <CAGZ=bqKA7P_FJz447AZA5HjWdghKnZqAWGuKAuvjsGp5bAGC1w@mail.gmail.com>

Thanks for the explanation. :)

Kyle Moffett <kyle@moffetthome.net> wrote:
> On Thu, Oct 27, 2011 at 19:00, Atsushi Nakagawa <atnak@chejz.com> wrote:
> > Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> On Wed, Oct 26, 2011 at 5:44 AM, Kyle Moffett <kyle@moffetthome.net> wrote:
> >> > On Tue, Oct 25, 2011 at 16:51, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> >> >> [...]
> >> >
> >> > No, I'm afraid that won't solve the issue (at least in GCC, not sure about MSVC)
> >> >
> >> > A write barrier in one thread is only effective if it is paired with a
> >> > read barrier in the other thread.
> >> >
> >> > Since there's no read barrier in the "while(mutex->autoinit != 0)",
> >> > you don't have any guaranteed ordering.
> >
> > Out of curiosity, where could re-ordering be a problem here?  I'm
> > thinking probably at "EnterCriticalSection(&mutex->cs)" and the contents
> > of "mutex->cs" not being propagated to the waiting thread.  However,
> > shouldn't that be a non-problem, as far as compiler reordering goes,
> > because it's an external function call and only the address of mutex->cs
> > is passed?
> >
> > [...]
> 
> Ok, so here's the race condition:
> 
> Thread1				Thread2
> 				/* Speculative prefetch */
> 				prefetch(*mutex);
> 
> if (mutex->autoinit) {
> if (ICE(&mutex->autoinit, -1, 1) != -1) {
> /* Now mutex->autoinit == -1 */
> pthread_mutex_init(mutex, NULL);
> /* This forces writes out to memory */
> ICE(&mutex->autoinit, 0, -1);
> 
> 				if (mutex->autoinit) {} /* false */
> 				/* No read barrier here */
> 				EnterCriticalSection(&mutex->cs);
> 				/* Use cached mutex->cs from earlier */

Ok, so there's no way of skimping on that one memory barrier in every
visit to pthread_mutex_lock().  Interesting.  Makes me wonder how it
trades off to lazy initialization.
> 
> Even though you forced the memory write to be ordered in Thread 1 you
> did not ensure that the read of autoinit occurred before the read of
> mutex->cs in Thread 2.  If the first thing that EnterCriticalSection
> does is follow a pointer or read a mutex key value in mutex->cs then
> won't necessarily get the right answer.
> 
> The rule of memory barriers is the ALWAYS come in pairs.  This simple
> example guarantees that Thread2 will read "tmp_a" == 1 if it
> previously read "tmp_b" == 1, although getting "tmp_a" == 1 and
> "tmp_b" != 1 is still possible.
> 
> Thread1:
> a = 1;
> write_barrier();
> b = 1;
> 
> Thread2:
> tmp_b = b;
> read_barrier();
> tmp_a = a;
> 
> I think there's a Documentation/memory-barriers.txt file in the kernel
> source code with more helpful info.

-- 
Atsushi Nakagawa
<atnak@chejz.com>
Changes are made when there is inconvenience.

^ permalink raw reply

* Re: [PATCH] blame.c: Properly initialize strbuf after calling, textconv_object()
From: Sebastian Schuberth @ 2011-10-28 18:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysgit
In-Reply-To: <7vd3dht4ms.fsf@alter.siamese.dyndns.org>

On 28.10.2011 19:20, Junio C Hamano wrote:

>>>>> Thanks; do you have no addition to the test suite to demonstrate the
>>>>> breakage?
>>>>
>>>> Not yet. I'll try to come up with something.
>>>
>>> Let's do this.
>>
>> Thanks, but that does not seem to work for me. The test breaks both
>> without and with my patch. I'll look into it.
> 
> Thanks. I suspect the difference is because you are on a crlf-native
> platform while I am not...

I also didn't have any luck. I've created a test that should fail without my patch, but it succeeds when running the test script. However, if I copy and paste the lines from the test to the command line, the test fails as expected ("blame" is empty). I'm out of ideas right now.

diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh
index 32ec82a..4fee5aa 100755
--- a/t/t8006-blame-textconv.sh
+++ b/t/t8006-blame-textconv.sh
@@ -14,6 +14,13 @@ sed 's/^bin: /converted: /' "$1"
 EOF
 chmod +x helper
 
+cat >helper-dos-line-endings <<'EOF'
+#!/bin/sh
+grep -q '^bin: ' "$1" || { echo "E: $1 is not \"binary\" file" 1>&2; exit 1; }
+sed 's/^bin: \(.*\)$/converted: \1\r/' "$1"
+EOF
+chmod +x helper-dos-line-endings
+
 test_expect_success 'setup ' '
 	echo "bin: test 1" >one.bin &&
 	echo "bin: test number 2" >two.bin &&
@@ -74,6 +81,14 @@ test_expect_success 'blame --textconv going through revisions' '
 	test_cmp expected result
 '
 
+test_expect_success 'blame --textconv with DOS line endings' '
+	git config diff.test.textconv ./helper-dos-line-endings &&
+	git blame --textconv two.bin >blame &&
+	git config diff.test.textconv ./helper &&
+	find_blame <blame >result &&
+	test_cmp expected result
+'
+
 test_expect_success 'setup +cachetextconv' '
 	git config diff.test.cachetextconv true
 '

-- 
Sebastian Schuberth

^ permalink raw reply related

* Re: git slow over https
From: Daniel Stenberg @ 2011-10-28 18:28 UTC (permalink / raw)
  To: Mika Fischer; +Cc: Git Mailing List
In-Reply-To: <CAOs=hR+K_YZcjdAUq_jaz0wc9k8BRQ2-ny7A=GFaNL4R-W0UBw@mail.gmail.com>

On Fri, 28 Oct 2011, Mika Fischer wrote:

> 1) What's the purpose of the select in http.c:673? Can it be removed?
> 2) If it serves a useful purpose, what can be the reason that it hurts
> performance so much in my case?

The purpose must be to avoid busy-looping in case there's nothing to read.

It should probably use curl_multi_fdset [1] to get a decent set to wait for 
instead so that it'll return fast if there is pending data. The timeout for 
select can in fact also get extended with the use of curl_multi_timeout [2].

1 = http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
2 = http://curl.haxx.se/libcurl/c/curl_multi_timeout.html

-- 

  / daniel.haxx.se

^ permalink raw reply

* Re: [PATCH 00/28] Store references hierarchically in cache
From: Michael Haggerty @ 2011-10-28 18:45 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Junio C Hamano, git, Jeff King, Drew Northup, Jakub Narebski,
	Heiko Voigt, Johan Herland, Julian Phillips
In-Reply-To: <CALkWK0=Hsq_yg1Vyr-3_jf9n=WcB_XNYRQa0SEhSWD5VxKBXQg@mail.gmail.com>

On 10/28/2011 03:07 PM, Ramkumar Ramachandra wrote:
> Michael Haggerty writes:
>> Therefore, this patch series changes the data structure used to store
>> the reference cache from a single array of pointers-to-struct into a
>> tree-like structure in which each subdirectory of the reference
>> namespace is stored as an array of pointers-to-entry and entries can
>> be either references or subdirectories containing more references.
> 
> Very nice! I like the idea. Can't wait to start reading the series.
> 
>>  * refs/replace is almost *always* needed even though it often
>>    doesn't even exist.  Thus the presence of many loose references
>>    slows down *many* git commands for no reason whatsoever.
> 
> Was this one of your primary inspirations for writing this series?

My primary inspiration was that "git filter-branch" was so slow, which
is partly because of the refs/replace thing and partly just the built-in
inefficiency of the old reference cache.

>> * the time to create a new branch goes from 180 ms to less than 10 ms
>>  (my test resolution only includes two decimal places) and the time
>>  to checkout a new branch does the same.
> 
> I'm interested in seeing how the callgraph changed.  Assuming you used
> Valgrind to profile it, could you publish the outputs?

I didn't use valgrind; I just timed commands using time(1).

>> The efficiency gains are such that some operations are now faster with
>> loose references than with packed references; however, some operations
>> with packed references slow down a bit.
> 
> Curiously, why do operations with packed references slow down?  (I'll
> probably find out in a few minutes after reading the series, but I'm
> asking anyway because it it's very non-obvious to me now)

I think it's just because the new data structure is slightly slower than
the old one for the task of appending thousands of refs without doing
any searching or sorting.  Since packed refs are read all-or-nothing
(even after my changes), there is no way to read the packed refs only
for the directory that you are interested in.

>> Patches 11-24 change most of the internal functions to work with
>> "struct ref_entry *" (namely the kind of ref_entry that holds a
>> directory of references) instead of "struct ref_dir *".  The reason
>> for this change it to allow these functions access to the "flag" and
>> "name" fields that are stored in ref_entry and thereby avoid having to
>> store redundant information in "struct ref_dir" (which would increase
>> the size of *every* ref_entry because of its presence in the union).
> 
> Hm, I was wondering why the series was looking so intimidating.  Is it
> not possible to squash all (or atleast some) of these together?

Why would I possibly want to squash them together?  Each one is
self-contained and logically separate from the rest.  After each commit
the code compiles and works.  Squashing them together wouldn't increase
the cognitive burden of reading them; on the contrary, it would make it
harder to read them because several logically-separate changes would be
confounded into a single patch.  Most of these patches have only a few
nontrivial lines which you can read at a glance.  And if the series ever
has to be bisected, the error can be narrowed down to a very small diff.

>> From: Michael Haggerty <mhagger@alum.mit.edu>
> 
> Nit: Can't you configure your email client to put this in the "From: "
> header of your emails?

I'm not sure where those extra lines come from.  My email client is
git-send-email.  I just checked, and they have appeared in patch series
sent through two completely different SMTP servers, so it seems unlikely
that the SMTP server is guilty.  I'll see if I can figure it out.

> Thanks for the interesting read.

Thanks for reading :-)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply

* Re: git slow over https
From: Gelonida N @ 2011-10-28 20:07 UTC (permalink / raw)
  To: git
In-Reply-To: <CAOs=hR+K_YZcjdAUq_jaz0wc9k8BRQ2-ny7A=GFaNL4R-W0UBw@mail.gmail.com>

On 10/28/2011 05:28 PM, Mika Fischer wrote:
> Hi,
> 
> I have an apache server that serves git repositories over https. I
> have the problem that git is very slow when accessing it, as in 3
> seconds for a "git pull" that does nothing.
> 
> I tracked the problem down to git sleeping for 50ms using select from
> time to time while downloading the response of the server. In my case
> this really hurts performance (see attached strace). However, with a
> different https server things work quite fine.
> 
> If I remove the select in http.c:673 (in run_active_slot), then things
> are fast also with my server.
> 
> So my questions are:
> 1) What's the purpose of the select in http.c:673? Can it be removed?
> 2) If it serves a useful purpose, what can be the reason that it hurts
> performance so much in my case?
> 

Do you edperience the same delay if you ssh to the server?

If yes, then try to add following line to the slow https server's config
file.

UseDNS no

The config file should have a name similair to
/etc/ssh/sshd_config



Is it getting any faster?

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28 20:21 UTC (permalink / raw)
  To: git
In-Reply-To: <m31utx5js7.fsf@localhost.localdomain>

On 10/28/2011 03:27 PM, Jakub Narebski wrote:
> Gelonida N <gelonida@gmail.com> writes:
> 
> [...]
> 
>> Another small detail:
>>
>> Let's assume I have following alias:
>>
>> log = log --name-status
>>
>>
>> In this case I directly get the help text for git log
>> if I typed 'git log --help' (or 'git help log').
>> I don't even see, that my log is in reality aliased.
> 
> That is because it doesn't work: git does not allow for aliasing its
> built-in commands.


Well this explains why.
Thanks

^ permalink raw reply

* Re: git alias and --help
From: Gelonida N @ 2011-10-28 20:23 UTC (permalink / raw)
  To: git
In-Reply-To: <m362j95jv3.fsf@localhost.localdomain>

On 10/28/2011 03:26 PM, Jakub Narebski wrote:
> Miles Bader <miles@gnu.org> writes:
>> Junio C Hamano <gitster@pobox.com> writes:
> 
>>>>> git branch --help
>>>>
>>>> How about "git help branch"?
>>>
>>> The reason why we do not do what you seem to be suggesting is because
>>> giving the same behaviour to "git b --help" as "git branch --help" is
>>> wrong.
>>
>> I agree with Gelonida's followup:  although what you say makes sense,
>> it's still pretty annoying behavior for the very common case of a
>> simple renaming alias...
>>
>> E.g., I have "co" aliased to "checkout", and so my fingers are very
>> very inclined to say "co" when I mean checkout... including when
>> asking for help.  I actually end up typing "git co --help", grumbling,
>> and retyping with the full command name, quite reguarly.
>>
>> What I've often wished is that git's help system would output
>> something like:
>>
>>    $ git help co
>>    `git co' is aliased to `checkout'
>>
>>    Here's the help entry for `checkout':
>>
>>    GIT-CHECKOUT(1)                   Git Manual                   GIT-CHECKOUT(1)
> 
> Wouldn't it be more useful to say something like this:
> 
>   $ git co --help
>   `git co' is aliased to `checkout'
>  
>   You can see help entry for `checkout' with "git checkout --help"
> 
> Then help is only copy'n'paste away.  
> 
> (This helping text probably should be controlled by some advice.*
> config variable).

This is definitely an option and something which I suggested as one
option myself (however my example had a typo and was perhaps therefore
not understandable)

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Jeff King @ 2011-10-28 20:32 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git
In-Reply-To: <20111028180044.GA3966@foghorn.codeweavers.com>

On Fri, Oct 28, 2011 at 01:00:44PM -0500, Andrew Eikum wrote:

> On the server side, it was split into two mails on either side of that
> commit message's From line with neither mail actually containing the
> From line. To fix it, I just changed it to "Copied from <url>:" :-P
> 
> Ain't mbox grand?

Mbox does have this problem, but I think in this case it is a
particularly crappy implementation of mbox in imap-send. Look at
imap-send.c:split_msg; it just looks for "From ".

It should at least check for something that looks like a timestamp, like
git-mailsplit does. Maybe mailsplit's is_from_line should be factored
out so that it can be reused in imap-send.

Want to work on a patch?

-Peff

^ permalink raw reply

* Re: [PATCH/WIP 02/11] notes-merge: use opendir/readdir instead of using read_directory()
From: Nguyen Thai Ngoc Duy @ 2011-10-28 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vzkgmz6v0.fsf@alter.siamese.dyndns.org>

On Fri, Oct 28, 2011 at 4:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>>> When read_directory("where/ever") is called, what kind of paths does it
>>> collect? Do the paths the function collects share "where/ever" as their
>>> common prefix? I thought it collects the paths relative to whatever
>>> top-level directory given to the function, so that "where/ever" could be
>>> anything.
>>
>> Correct. But read_directory() takes pathspec now so naturally it does
>> not treat "where/ever" a common prefix anymore.  So it has to open(".")
>> and starts from there.
>
> That is a puzzling statement. The read_directory() function takes:
>
>  - dir: use this struct to pass traversal status and collected paths;
>
>  - path, len: this is the directory (not a pathspec) we start traversal
>   from; and
>
>  - pathspec: these are the patterns that specify which parts of the
>   directory hierarchy under <path,len> are traversed.
>
> I do not see any good reason for <path,len> to become a match pattern. Are
> you trying to get it prepended to elements in pathspec[] and match the path
> collected including the <path> part?
>
> Why?
>
> I could see that "open . and start from there, treating as if <path,len>
> is also pathspec" could be made to work, but I do not see why that is
> desirable.
>
> In other words, are there existing callers that abuse read_directory()
> to feed a pattern in <path,len>? Maybe they should be the one that needs
> fixing instead?

fill_directory() tries to calculate a common prefix (i.e. <path,len>
to read_directory()) from pathspec and that may or may not work when
pathspec magic comes into play. But yes, I could just make
fill_directory() pass <"",0> to read_directory() and keep <path,len>
in read_directory() for notes-merge and future users.
-- 
Duy

^ permalink raw reply

* Re: [PATCH/WIP 01/11] Introduce "check-attr --excluded" as a replacement for "add --ignore-missing"
From: Nguyen Thai Ngoc Duy @ 2011-10-28 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfwiez4s5.fsf@alter.siamese.dyndns.org>

2011/10/28 Junio C Hamano <gitster@pobox.com>:
> Perhaps ls-files is a more suitable home for the feature?

ls-files sounds good. It does all kinds of file selection already.
I'll see if I can add -I (aka "show ignored files only) to it.
-- 
Duy

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Andrew Eikum @ 2011-10-28 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Andrew Eikum, git
In-Reply-To: <20111028203256.GA15082@sigill.intra.peff.net>

On Fri, Oct 28, 2011 at 01:32:57PM -0700, Jeff King wrote:
> Mbox does have this problem, but I think in this case it is a
> particularly crappy implementation of mbox in imap-send. Look at
> imap-send.c:split_msg; it just looks for "From ".
> 
> It should at least check for something that looks like a timestamp, like
> git-mailsplit does. Maybe mailsplit's is_from_line should be factored
> out so that it can be reused in imap-send.

Since we have a program called "mailsplit," wouldn't it make more
sense to have imap-send use its implementation to split mail instead
of sharing just the From line detection?

> Want to work on a patch?

I was hoping it'd be a quick matter of pulling mailsplit's
implementation out of builtin and into the top level, but I see it's
got some global variables that are tangled enough that I actually have
to understand the code before I can pull it apart :)

If no one beats me to it, I'll work on this next week. It's late on
Friday and I'm moving house this weekend.

Quick question, since I'm not intimately familiar with Git's code: I
was thinking of creating a new compilation unit at the top level,
mailutils.{c,h}, and referencing it from both imap-send.c and
builtin/splitmail.c. Does that seem like the right approach? Is there
an existing compilation unit I should be placing splitmail's guts into
instead?

Andrew

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Jeff King @ 2011-10-28 21:37 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git
In-Reply-To: <20111028212122.GB3966@foghorn.codeweavers.com>

On Fri, Oct 28, 2011 at 04:21:22PM -0500, Andrew Eikum wrote:

> Since we have a program called "mailsplit," wouldn't it make more
> sense to have imap-send use its implementation to split mail instead
> of sharing just the From line detection?

Potentially, yeah. I was thinking of just pulling over the from line
detection (which is the real black magic bit), but it looks like
imap-send's mbox handling could use some general attention (maybe it
would be possible to not read the entire mbox into memory, for example).

> I was hoping it'd be a quick matter of pulling mailsplit's
> implementation out of builtin and into the top level, but I see it's
> got some global variables that are tangled enough that I actually have
> to understand the code before I can pull it apart :)
>
> If no one beats me to it, I'll work on this next week. It's late on
> Friday and I'm moving house this weekend.

No rush. Let us know if you have questions.

> Quick question, since I'm not intimately familiar with Git's code: I
> was thinking of creating a new compilation unit at the top level,
> mailutils.{c,h}, and referencing it from both imap-send.c and
> builtin/splitmail.c. Does that seem like the right approach? Is there
> an existing compilation unit I should be placing splitmail's guts into
> instead?

Yes, I think a new file makes sense here. Make sure to update LIB_H and
LIB_OBJS in the Makefile.

-Peff

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 22:48 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJt-YZcdxw+D=1S4haPmY-8-LLjXD=MvDGeWbdJ88_VOGw@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3385 bytes --]

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@pobox.com> wrote:
> > In addition to four basic types (commit, tree, blob and tag), the pack
> > stream can encode a few other "representation" types, such as REF_DELTA
> > and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> > we do not have much room to add new representation types in place, but we
> > do have one value reserved for future expansion.
> 
> We have 2 values reserved, 0 and 5.
> 
> > When bit 4-6 encodes type 5, the first byte is used this way:
> >
> >  - Bit 0-3 denotes the real "extended" representation type. Because types
> >   0-7 can already be encoded without using the extended format, we can
> >   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
> >   type 11 = 3 + 8);
> >
> >  - Bit 4-6 has the value "5";
> >
> >  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
> >   size that cannot be represented with 8-bit.
> 
> This is very complicated. We don't need more complex logic in the pack
> encoding. We _especially_ do not need yet another variant of how to
> store a variable length integer in the pack file. I'm sorry, but we
> already have two different variants and this just adds a third. It is
> beyond crazy.
> 
> Last time (this is now years ago but whatever) Nico and I discussed
> adding a new type to packs it was for the alternate tree encoding in
> "pack v4". Trees happen so often that type code 5 is a good value to
> use for these. Later you talked about using the extended type to store
> a cattree blob thing, which would not appear nearly as often as a
> normal directory listing type tree that was encoded using the pack v4
> style encoding... I think saving type 5 for a small frequently
> occurring type is a good thing.
> 
> > As it is unlikely for us to pack things that do not need to record any
> > size, the second byte is always used in full to encode the low 8-bit of
> > the size.
> >
> > I haven't started using type=8 and upwards for anything yet, but because
> > we have only one "future expansion" value left, I want us to be extremely
> > careful in order to avoid painting us into a corner that we cannot get out
> > of, so I am sending this out early for a preliminary review.
> 
> I would have said something more like:
> 
> When bit 4-6 encodes "0", then:
> 
> - Bit 0-3 and bit 7 are used normally to encode a variable length
> "size" integer. These may be 0 indicating no size information.
> 
> - 2nd-nth bytes store remaining "size" information, if bit 7 was set.
> 
> - The immediate next byte encodes the extended type. This type is
> stored using the OFS_DELTA offset varint encoding, and thus may be
> larger than 256 if we ever need it to be.

I'd say it is just a byte.  No encoding needed.  Let's not be silly 
about it.  If we really have more than 255 object types one day (and I 
really hope this will never happen) then the value 0 in that byte could 
indicate yet another extended object type encoding.  But I truly hope 
we'll have pack v9 or v10 by then and that we'll have obsoleted the 
current 3-bit encoding completely at that point anyway.

For the record, I spent around 20 hours working on pack v4 while in the 
Caribbeans for a week last winter as I said I would.  Maybe I'll repeat 
the operation this year.


Nicolas

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Shawn Pearce @ 2011-10-28 23:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <alpine.LFD.2.02.1110290031540.30467@xanadu.home>

On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 28 Oct 2011, Shawn Pearce wrote:
>> - The immediate next byte encodes the extended type. This type is
>> stored using the OFS_DELTA offset varint encoding, and thus may be
>> larger than 256 if we ever need it to be.
>
> I'd say it is just a byte.  No encoding needed.  Let's not be silly
> about it.  If we really have more than 255 object types one day (and I
> really hope this will never happen) then the value 0 in that byte could
> indicate yet another extended object type encoding.  But I truly hope
> we'll have pack v9 or v10 by then and that we'll have obsoleted the
> current 3-bit encoding completely at that point anyway.

Yes. I probably wouldn't code the parser to use a varint here. I would
say the extended types stored in this byte must be >= 8, and must be
<= 127. Any values out of this range are unsupported and should be
rejected. We can later reserve the right to set the high bit and
switch to the OFS_DELTA varint encoding if we need that many more
types, and we explicitly define codes 0-7 as illegal if detected here
in the extended byte field.

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 23:30 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJsEzkFV9k8N+GAwWddmEZH8pQeJZrg_MXD72stbAW0ceQ@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1352 bytes --]

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Fri, 28 Oct 2011, Shawn Pearce wrote:
> >> - The immediate next byte encodes the extended type. This type is
> >> stored using the OFS_DELTA offset varint encoding, and thus may be
> >> larger than 256 if we ever need it to be.
> >
> > I'd say it is just a byte.  No encoding needed.  Let's not be silly
> > about it.  If we really have more than 255 object types one day (and I
> > really hope this will never happen) then the value 0 in that byte could
> > indicate yet another extended object type encoding.  But I truly hope
> > we'll have pack v9 or v10 by then and that we'll have obsoleted the
> > current 3-bit encoding completely at that point anyway.
> 
> Yes. I probably wouldn't code the parser to use a varint here. I would
> say the extended types stored in this byte must be >= 8, and must be
> <= 127. Any values out of this range are unsupported and should be
> rejected. We can later reserve the right to set the high bit and
> switch to the OFS_DELTA varint encoding if we need that many more
> types, and we explicitly define codes 0-7 as illegal if detected here
> in the extended byte field.

I wouldn't go as far as rejecting codes 1-7 as illegal though, but I 
otherwise agree with what you say.


Nicolas

^ permalink raw reply

* [PATCH 0/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git

This miniseries is a continuation of the "large file" topic from 1.7.6
development cycle.

The first three are moving existing code around for better reuse.  The
last one serves two purposes: to lift the one-pack-per-one-large-blob
constraint by introducing the concept of "plugging/unplugging" (i.e. you
plug the drain and throw many large blob at index_fd(), and they appear in
a single pack when you unplug it), and to stop using fast-import in this
codepath.

Only very lightly tested.

Junio C Hamano (4):
  write_pack_header(): a helper function
  create_tmp_packfile(): a helper function
  finish_tmp_packfile(): a helper function
  Bulk check-in

 Makefile               |    2 +
 builtin/add.c          |    5 ++
 builtin/pack-objects.c |   56 +++++------------
 bulk-checkin.c         |  159 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   16 +++++
 pack-write.c           |   53 ++++++++++++++++
 pack.h                 |    6 ++
 sha1_file.c            |   67 +-------------------
 t/t1050-large.sh       |   26 ++++++--
 9 files changed, 282 insertions(+), 108 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

-- 
1.7.7.1.573.ga40d2

^ permalink raw reply

* [PATCH 1/4] write_pack_header(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |    9 +++------
 pack-write.c           |   12 ++++++++++++
 pack.h                 |    2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba3705d..6643c16 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -571,7 +571,6 @@ static void write_pack_file(void)
 	uint32_t i = 0, j;
 	struct sha1file *f;
 	off_t offset;
-	struct pack_header hdr;
 	uint32_t nr_remaining = nr_result;
 	time_t last_mtime = 0;
 	struct object_entry **write_order;
@@ -596,11 +595,9 @@ static void write_pack_file(void)
 			f = sha1fd(fd, pack_tmp_name);
 		}
 
-		hdr.hdr_signature = htonl(PACK_SIGNATURE);
-		hdr.hdr_version = htonl(PACK_VERSION);
-		hdr.hdr_entries = htonl(nr_remaining);
-		sha1write(f, &hdr, sizeof(hdr));
-		offset = sizeof(hdr);
+		offset = write_pack_header(f, nr_remaining);
+		if (!offset)
+			die_errno("unable to write pack header");
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			struct object_entry *e = write_order[i];
diff --git a/pack-write.c b/pack-write.c
index 9cd3bfb..46f3f84 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -178,6 +178,18 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
+{
+	struct pack_header hdr;
+
+	hdr.hdr_signature = htonl(PACK_SIGNATURE);
+	hdr.hdr_version = htonl(PACK_VERSION);
+	hdr.hdr_entries = htonl(nr_entries);
+	if (sha1write(f, &hdr, sizeof(hdr)))
+		return 0;
+	return sizeof(hdr);
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
diff --git a/pack.h b/pack.h
index 722a54e..d429d8a 100644
--- a/pack.h
+++ b/pack.h
@@ -2,6 +2,7 @@
 #define PACK_H
 
 #include "object.h"
+#include "csum-file.h"
 
 /*
  * Packed object header
@@ -74,6 +75,7 @@ extern const char *write_idx_file(const char *index_name, struct pack_idx_entry
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *);
+extern off_t write_pack_header(struct sha1file *f, uint32_t);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 2/4] create_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   12 +++---------
 pack-write.c           |   10 ++++++++++
 pack.h                 |    3 +++
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6643c16..3258fa9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -584,16 +584,10 @@ static void write_pack_file(void)
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
 
-		if (pack_to_stdout) {
+		if (pack_to_stdout)
 			f = sha1fd_throughput(1, "<stdout>", progress_state);
-		} else {
-			char tmpname[PATH_MAX];
-			int fd;
-			fd = odb_mkstemp(tmpname, sizeof(tmpname),
-					 "pack/tmp_pack_XXXXXX");
-			pack_tmp_name = xstrdup(tmpname);
-			f = sha1fd(fd, pack_tmp_name);
-		}
+		else
+			f = create_tmp_packfile(&pack_tmp_name);
 
 		offset = write_pack_header(f, nr_remaining);
 		if (!offset)
diff --git a/pack-write.c b/pack-write.c
index 46f3f84..863cce8 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -328,3 +328,13 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 	*hdr = c;
 	return n;
 }
+
+struct sha1file *create_tmp_packfile(char **pack_tmp_name)
+{
+	char tmpname[PATH_MAX];
+	int fd;
+
+	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	*pack_tmp_name = xstrdup(tmpname);
+	return sha1fd(fd, *pack_tmp_name);
+}
diff --git a/pack.h b/pack.h
index d429d8a..0027ac6 100644
--- a/pack.h
+++ b/pack.h
@@ -84,4 +84,7 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 #define PH_ERROR_PACK_SIGNATURE	(-2)
 #define PH_ERROR_PROTOCOL	(-3)
 extern int read_pack_header(int fd, struct pack_header *);
+
+extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+
 #endif
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 3/4] finish_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c.

This changes the order of finishing multi-pack generation slightly. The
code used to

 - adjust shared perm of temporary packfile
 - rename temporary packfile to the final name
 - update mtime of the packfile under the final name
 - adjust shared perm of temporary idxfile
 - rename temporary idxfile to the final name

but because the helper does not want to do the mtime thing, the updated
code does that step first and then all the rest.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   33 ++++++++++-----------------------
 pack-write.c           |   31 +++++++++++++++++++++++++++++++
 pack.h                 |    1 +
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3258fa9..b458b6d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -617,20 +617,8 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
-			const char *idx_tmp_name;
 			char tmpname[PATH_MAX];
 
-			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
-						      &pack_idx_opts, sha1);
-
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
-				 base_name, sha1_to_hex(sha1));
-			free_pack_by_name(tmpname);
-			if (adjust_shared_perm(pack_tmp_name))
-				die_errno("unable to make temporary pack file readable");
-			if (rename(pack_tmp_name, tmpname))
-				die_errno("unable to rename temporary pack file");
-
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -638,28 +626,27 @@ static void write_pack_file(void)
 			 * packs then we should modify the mtime of later ones
 			 * to preserve this property.
 			 */
-			if (stat(tmpname, &st) < 0) {
+			if (stat(pack_tmp_name, &st) < 0) {
 				warning("failed to stat %s: %s",
-					tmpname, strerror(errno));
+					pack_tmp_name, strerror(errno));
 			} else if (!last_mtime) {
 				last_mtime = st.st_mtime;
 			} else {
 				struct utimbuf utb;
 				utb.actime = st.st_atime;
 				utb.modtime = --last_mtime;
-				if (utime(tmpname, &utb) < 0)
+				if (utime(pack_tmp_name, &utb) < 0)
 					warning("failed utime() on %s: %s",
 						tmpname, strerror(errno));
 			}
 
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
-				 base_name, sha1_to_hex(sha1));
-			if (adjust_shared_perm(idx_tmp_name))
-				die_errno("unable to make temporary index file readable");
-			if (rename(idx_tmp_name, tmpname))
-				die_errno("unable to rename temporary index file");
-
-			free((void *) idx_tmp_name);
+			/* Enough space for "-<sha-1>.pack"? */
+			if (sizeof(tmpname) <= strlen(base_name) + 50)
+				die("pack base name '%s' too long", base_name);
+			snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
+			finish_tmp_packfile(tmpname, pack_tmp_name,
+					    written_list, nr_written,
+					    &pack_idx_opts, sha1);
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
diff --git a/pack-write.c b/pack-write.c
index 863cce8..cadc3e1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -338,3 +338,34 @@ struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 	*pack_tmp_name = xstrdup(tmpname);
 	return sha1fd(fd, *pack_tmp_name);
 }
+
+void finish_tmp_packfile(char *name_buffer,
+			 const char *pack_tmp_name,
+			 struct pack_idx_entry **written_list,
+			 uint32_t nr_written,
+			 struct pack_idx_option *pack_idx_opts,
+			 unsigned char sha1[])
+{
+	const char *idx_tmp_name;
+	char *end_of_name_prefix = strrchr(name_buffer, 0);
+
+	if (adjust_shared_perm(pack_tmp_name))
+		die_errno("unable to make temporary pack file readable");
+
+	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
+	if (adjust_shared_perm(idx_tmp_name))
+		die_errno("unable to make temporary index file readable");
+
+	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
+	free_pack_by_name(name_buffer);
+
+	if (rename(pack_tmp_name, name_buffer))
+		die_errno("unable to rename temporary pack file");
+
+	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
+	if (rename(idx_tmp_name, name_buffer))
+		die_errno("unable to rename temporary index file");
+
+	free((void *)idx_tmp_name);
+}
diff --git a/pack.h b/pack.h
index 0027ac6..cfb0f69 100644
--- a/pack.h
+++ b/pack.h
@@ -86,5 +86,6 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 extern int read_pack_header(int fd, struct pack_header *);
 
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+extern void finish_tmp_packfile(char *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
 #endif
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 4/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but
the new bulk_checkin API replaces it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile         |    2 +
 builtin/add.c    |    5 ++
 bulk-checkin.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h   |   16 ++++++
 sha1_file.c      |   67 ++---------------------
 t/t1050-large.sh |   26 +++++++--
 6 files changed, 206 insertions(+), 69 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

diff --git a/Makefile b/Makefile
index 3139c19..418dd2e 100644
--- a/Makefile
+++ b/Makefile
@@ -505,6 +505,7 @@ LIB_H += argv-array.h
 LIB_H += attr.h
 LIB_H += blob.h
 LIB_H += builtin.h
+LIB_H += bulk-checkin.h
 LIB_H += cache.h
 LIB_H += cache-tree.h
 LIB_H += color.h
@@ -591,6 +592,7 @@ LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
+LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += color.o
diff --git a/builtin/add.c b/builtin/add.c
index c59b0c9..1c42900 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -13,6 +13,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "revision.h"
+#include "bulk-checkin.h"
 
 static const char * const builtin_add_usage[] = {
 	"git add [options] [--] <filepattern>...",
@@ -458,11 +459,15 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		free(seen);
 	}
 
+	plug_bulk_checkin();
+
 	exit_status |= add_files_to_cache(prefix, pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
 
+	unplug_bulk_checkin();
+
  finish:
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/bulk-checkin.c b/bulk-checkin.c
new file mode 100644
index 0000000..cad7a0b
--- /dev/null
+++ b/bulk-checkin.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#include "bulk-checkin.h"
+#include "csum-file.h"
+#include "pack.h"
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
+static struct bulk_checkin_state {
+	unsigned plugged:1;
+
+	char *pack_tmp_name;
+	struct sha1file *f;
+	off_t offset;
+	struct pack_idx_option pack_idx_opts;
+
+	struct pack_idx_entry **written;
+	uint32_t alloc_written;
+	uint32_t nr_written;
+} state;
+
+static void finish_bulk_checkin(struct bulk_checkin_state *state)
+{
+	unsigned char sha1[20];
+	char packname[PATH_MAX];
+	int i;
+
+	if (!state->f)
+		return;
+
+	if (state->nr_written == 1) {
+		sha1close(state->f, sha1, CSUM_FSYNC);
+	} else {
+		int fd = sha1close(state->f, sha1, 0);
+		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
+					 state->nr_written, sha1,
+					 state->offset);
+		close(fd);
+	}
+
+	sprintf(packname, "%s/pack/pack-", get_object_directory());
+	finish_tmp_packfile(packname, state->pack_tmp_name,
+			    state->written, state->nr_written,
+			    &state->pack_idx_opts, sha1);
+	for (i = 0; i < state->nr_written; i++)
+		free(state->written[i]);
+	free(state->written);
+	memset(state, 0, sizeof(*state));
+
+	/* Make objects we just wrote available to ourselves */
+	reprepare_packed_git();
+}
+
+static void deflate_to_pack(struct bulk_checkin_state *state,
+			    unsigned char sha1[],
+			    int fd, size_t size, enum object_type type,
+			    const char *path, unsigned flags)
+{
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	git_zstream s;
+	git_SHA_CTX ctx;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+	int status = Z_OK;
+	struct pack_idx_entry *idx = NULL;
+
+	hdrlen = sprintf((char *)obuf, "%s %" PRIuMAX, typename(type), size) + 1;
+	git_SHA1_Init(&ctx);
+	git_SHA1_Update(&ctx, obuf, hdrlen);
+
+	if (write_object) {
+		idx = xcalloc(1, sizeof(*idx));
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+	memset(&s, 0, sizeof(s));
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(type, size, obuf);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	while (status != Z_STREAM_END) {
+		unsigned char ibuf[16384];
+
+		if (size && !s.avail_in) {
+			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
+			if (xread(fd, ibuf, rsize) != rsize)
+				die("failed to read %d bytes from '%s'",
+				    (int)rsize, path);
+			git_SHA1_Update(&ctx, ibuf, rsize);
+			s.next_in = ibuf;
+			s.avail_in = rsize;
+			size -= rsize;
+		}
+
+		status = git_deflate(&s, size ? 0 : Z_FINISH);
+
+		if (!s.avail_out || status == Z_STREAM_END) {
+			size_t written = s.next_out - obuf;
+			if (write_object) {
+				sha1write(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	git_SHA1_Final(sha1, &ctx);
+	if (write_object) {
+		idx->crc32 = crc32_end(state->f);
+		hashcpy(idx->sha1, sha1);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1, state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
+int index_bulk_checkin(unsigned char *sha1,
+		       int fd, size_t size, enum object_type type,
+		       const char *path, unsigned flags)
+{
+	if (!state.f && (flags & HASH_WRITE_OBJECT)) {
+		state.f = create_tmp_packfile(&state.pack_tmp_name);
+		reset_pack_idx_option(&state.pack_idx_opts);
+		/* Pretend we are going to write only one object */
+		state.offset = write_pack_header(state.f, 1);
+		if (!state.offset)
+			die_errno("unable to write pack header");
+	}
+
+	deflate_to_pack(&state, sha1, fd, size, type, path, flags);
+	if (!state.plugged)
+		finish_bulk_checkin(&state);
+	return 0;
+}
+
+void plug_bulk_checkin(void)
+{
+	state.plugged = 1;
+}
+
+void unplug_bulk_checkin(void)
+{
+	state.plugged = 0;
+	if (state.f)
+		finish_bulk_checkin(&state);
+}
diff --git a/bulk-checkin.h b/bulk-checkin.h
new file mode 100644
index 0000000..4f599f8
--- /dev/null
+++ b/bulk-checkin.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#ifndef BULK_CHECKIN_H
+#define BULK_CHECKIN_H
+
+#include "cache.h"
+
+extern int index_bulk_checkin(unsigned char sha1[],
+			      int fd, size_t size, enum object_type type,
+			      const char *path, unsigned flags);
+
+extern void plug_bulk_checkin(void);
+extern void unplug_bulk_checkin(void);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..c96e366 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -18,6 +18,7 @@
 #include "refs.h"
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
+#include "bulk-checkin.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2679,10 +2680,8 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
 }
 
 /*
- * This creates one packfile per large blob, because the caller
- * immediately wants the result sha1, and fast-import can report the
- * object name via marks mechanism only by closing the created
- * packfile.
+ * This creates one packfile per large blob unless bulk-checkin
+ * machinery is "plugged".
  *
  * This also bypasses the usual "convert-to-git" dance, and that is on
  * purpose. We could write a streaming version of the converting
@@ -2696,65 +2695,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	struct child_process fast_import;
-	char export_marks[512];
-	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
-	char tmpfile[512];
-	char fast_import_cmd[512];
-	char buf[512];
-	int len, tmpfd;
-
-	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
-	tmpfd = git_mkstemp_mode(tmpfile, 0600);
-	if (tmpfd < 0)
-		die_errno("cannot create tempfile: %s", tmpfile);
-	if (close(tmpfd))
-		die_errno("cannot close tempfile: %s", tmpfile);
-	sprintf(export_marks, "--export-marks=%s", tmpfile);
-
-	memset(&fast_import, 0, sizeof(fast_import));
-	fast_import.in = -1;
-	fast_import.argv = argv;
-	fast_import.git_cmd = 1;
-	if (start_command(&fast_import))
-		die_errno("index-stream: git fast-import failed");
-
-	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
-		      (unsigned long) size);
-	write_or_whine(fast_import.in, fast_import_cmd, len,
-		       "index-stream: feeding fast-import");
-	while (size) {
-		char buf[10240];
-		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
-		ssize_t actual;
-
-		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
-			die_errno("index-stream: reading input");
-		if (write_in_full(fast_import.in, buf, actual) != actual)
-			die_errno("index-stream: feeding fast-import");
-		size -= actual;
-	}
-	if (close(fast_import.in))
-		die_errno("index-stream: closing fast-import");
-	if (finish_command(&fast_import))
-		die_errno("index-stream: finishing fast-import");
-
-	tmpfd = open(tmpfile, O_RDONLY);
-	if (tmpfd < 0)
-		die_errno("index-stream: cannot open fast-import mark");
-	len = read(tmpfd, buf, sizeof(buf));
-	if (len < 0)
-		die_errno("index-stream: reading fast-import mark");
-	if (close(tmpfd) < 0)
-		die_errno("index-stream: closing fast-import mark");
-	if (unlink(tmpfile))
-		die_errno("index-stream: unlinking fast-import mark");
-	if (len != 44 ||
-	    memcmp(":1 ", buf, 3) ||
-	    get_sha1_hex(buf + 3, sha1))
-		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
-	return 0;
+	return index_bulk_checkin(sha1, fd, size, type, path, flags);
 }
 
 int index_fd(unsigned char *sha1, int fd, struct stat *st,
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index deba111..36def25 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -7,14 +7,28 @@ test_description='adding and checking out large blobs'
 
 test_expect_success setup '
 	git config core.bigfilethreshold 200k &&
-	echo X | dd of=large bs=1k seek=2000
+	echo X | dd of=large bs=1k seek=2000 &&
+	echo Y | dd of=huge bs=1k seek=2500
 '
 
-test_expect_success 'add a large file' '
-	git add large &&
-	# make sure we got a packfile and no loose objects
-	test -f .git/objects/pack/pack-*.pack &&
-	test ! -f .git/objects/??/??????????????????????????????????????
+test_expect_success 'add a large file or two' '
+	git add large huge &&
+	# make sure we got a single packfile and no loose objects
+	bad= count=0 &&
+	for p in .git/objects/pack/pack-*.pack
+	do
+		count=$(( $count + 1 ))
+		test -f "$p" && continue
+		bad=t
+	done &&
+	test -z "$bad" &&
+	test $count = 1 &&
+	for l in .git/objects/??/??????????????????????????????????????
+	do
+		test -f "$l" || continue
+		bad=t
+	done &&
+	test -z "$bad"
 '
 
 test_expect_success 'checkout a large file' '
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* sparse checkout using exclusions
From: Eric Raible @ 2011-10-29  0:17 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi all.

I was just about to send a long message about using exclusions
in sparse-checkout, when I did one last search and saw that all
of my problems were fixed by using '/*' instead of '*' as the
first line in .git/info/sparse-checkout.

Might it make sense for the example in git-read-tree.html to be
updated to include the leading slash?

    /*
    !unwanted

- Eric

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox