Git development
 help / color / mirror / Atom feed
* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229102707.GA26730@sigill.intra.peff.net>

Jeff King wrote:

> I can't reproduce here. I can checkout v3.2.35, and I guess that the
> patch you are applying comes from f5f1654, but I don't know your
> local modification to sound/usb/midi.c.

No local modification.  The unstaged change after "git am --abort" to
recover from a conflicted git am is a longstanding bug (at least a
couple of years old).

The patch creating conflicts is

	queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch

from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git

[...]
>>   $ git am --abort
>>   Unstaged changes after reset:
>>   M       sound/usb/midi.c
>
> What does your index look like afterwards? Does it have a null sha1 in
> it (check "ls-files -s")?

$ git diff-index --abbrev HEAD
:100644 100644 eeefbce3873c... 000000000000... M	sound/usb/midi.c
$ git ls-files --abbrev -s sound/usb/midi.c
100644 eeefbce3873c 0	sound/usb/midi.c

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229102707.GA26730@sigill.intra.peff.net>

Jeff King wrote:

> Can you give more details?

$ GIT_TRACE=1 git am --abort
trace: exec: 'git-am' '--abort'
trace: run_command: 'git-am' '--abort'
trace: built-in: git 'rev-parse' '--parseopt' '--' '--abort'
trace: built-in: git 'rev-parse' '--git-dir'
trace: built-in: git 'rev-parse' '--show-prefix'
trace: built-in: git 'rev-parse' '--show-toplevel'
trace: built-in: git 'var' 'GIT_COMMITTER_IDENT'
trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD'
trace: built-in: git 'config' '--bool' '--get' 'am.keepcr'
trace: built-in: git 'rerere' 'clear'
trace: built-in: git 'rev-parse' '--verify' '-q' 'HEAD'
trace: built-in: git 'read-tree' '--reset' '-u' 'HEAD' 'ORIG_HEAD'
error: cache entry has null sha1: sound/usb/midi.c
fatal: unable to write new index file
trace: built-in: git 'reset' 'ORIG_HEAD'
Unstaged changes after reset:
M       sound/usb/midi.c

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 10:42 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229103430.GG18903@elie.Belkin>

On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > I can't reproduce here. I can checkout v3.2.35, and I guess that the
> > patch you are applying comes from f5f1654, but I don't know your
> > local modification to sound/usb/midi.c.
> 
> No local modification.  The unstaged change after "git am --abort" to
> recover from a conflicted git am is a longstanding bug (at least a
> couple of years old).
> 
> The patch creating conflicts is
> 
> 	queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> 
> from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git

Hrm. But your output does not say there is a conflict. It says you have
a local modification and it does not try the merge:

> $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
> Using index info to reconstruct a base tree...
> Falling back to patching base and 3-way merge...
> error: Your local changes to the following files would be overwritten by merge:
>       sound/usb/midi.c
> Please, commit your changes or stash them before you can merge.
> Aborting

If I try to apply it, I get a real conflict:

  $ git checkout v3.2.35
  $ git am -3sc linux-3.2.y-queue/queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
  Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
  Using index info to reconstruct a base tree...
  M       sound/usb/midi.c
  Falling back to patching base and 3-way merge...
  Auto-merging sound/usb/midi.c
  CONFLICT (content): Merge conflict in sound/usb/midi.c

Although running "git am --abort" after that does seem to produce the
"cache entry has null sha1" error. So I can start investigating from
there.

-Peff

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229104247.GA30283@sigill.intra.peff.net>

Jeff King wrote:

> Hrm. But your output does not say there is a conflict. It says you have
> a local modification and it does not try the merge:

That's probably operator error on my part when gathering output to
paste into the email.

In other words, nothing to see there. :)  Sorry for the confusion.

[...]
> If I try to apply it, I get a real conflict:
[...]
> Although running "git am --abort" after that does seem to produce the
> "cache entry has null sha1" error.

Yep, that is what my report should have said.

'night,
Jonathan

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 11:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229103430.GG18903@elie.Belkin>

On Sat, Dec 29, 2012 at 02:34:30AM -0800, Jonathan Nieder wrote:

> >>   $ git am --abort
> >>   Unstaged changes after reset:
> >>   M       sound/usb/midi.c
> >
> > What does your index look like afterwards? Does it have a null sha1 in
> > it (check "ls-files -s")?
> 
> $ git diff-index --abbrev HEAD
> :100644 100644 eeefbce3873c... 000000000000... M	sound/usb/midi.c
> $ git ls-files --abbrev -s sound/usb/midi.c
> 100644 eeefbce3873c 0	sound/usb/midi.c

Hmm. It looks like "am --abort" overwrites the index again after the
read-tree which complains.  If I downgrade the error in write_index to a
warning, like this:

diff --git a/read-cache.c b/read-cache.c
index fda78bc..70a6d86 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1797,7 +1797,7 @@ int write_index(struct index_state *istate, int newfd)
 		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
 			ce_smudge_racily_clean_entry(ce);
 		if (is_null_sha1(ce->sha1))
-			return error("cache entry has null sha1: %s", ce->name);
+			warning("cache entry has null sha1: %s", ce->name);
 		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
 			return -1;
 	}

and then just run this:

  [clear state from last run]
  $ rm -rf .git/rebase-apply
  $ git reset --hard

  [apply the patch; we get a conflict]
  $ git am -3sc queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch

  [now run just the read-tree from "am --abort"]
  $ git.compile read-tree --reset -u HEAD ORIG_HEAD
  warning: cache entry has null sha1: sound/usb/midi.c

  [and now check our index]
  $ git ls-files -s sound/usb/midi.c
  100644 0000000000000000000000000000000000000000 0 sound/usb/midi.c

  [yes, this index is bogus]
  $ git write-tree
  error: invalid object 100644 0000000000000000000000000000000000000000 for 'sound/usb/midi.c'
  fatal: git-write-tree: error building trees

So I think this check may actually be finding a real bug. I have seen
these null sha1s in the wild, but I was never able to track down the
actual cause. Maybe this will give us a clue. Now we just need to work
backwards and figure out who is putting it in the in-memory index and
why.

I'll try to work on it tomorrow, but please don't let that stop you if
you want to keep digging in the meantime.

-Peff

^ permalink raw reply related

* Re: [PATCH 0/4] pre-push hook support
From: Aaron Schrab @ 2012-12-29 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v1ue9hb06.fsf@alter.siamese.dyndns.org>

At 18:01 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>One lesson we learned long time ago while doing hooks is to avoid
>unbound number of command line arguments and instead feed them from
>the standard input.  I think this should do the same.

Good point.  I had been trying to keep the interface for this hook as 
close as possible to the ones for other client-side hooks on the theory 
that less development effort may go into those than for server-side 
hooks.  But thinking on that more I certainly see that this could easily 
run into limits on argument length on some systems; especially when it's 
likely that each of those arguments is likely to be over 100 bytes long.

I'll work on an updated version which sends the variable length 
information over a pipe, using the command-line arguments only to pass 
the remote name and URL.

>How does the hook communicate its decision to the calling Git?
>
>Will it be "all-or-none", or "I'll allow these but not those"?

Currently it just uses the exit code to communicate that back, so it's 
all-or-none.  I think I'll keep that in the updated version as well.

A future enhancement could modify the protocol to support reading from 
the hook's stdout the names of remote refs which are to be rejected, I 
think that just having the option for all-or-nothing is a good starting 
point.

^ permalink raw reply

* Re: [PATCH 1/4] hooks: Add function to check if a hook exists
From: Aaron Schrab @ 2012-12-29 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwqw1fw5a.fsf@alter.siamese.dyndns.org>

At 18:08 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>Aaron Schrab <aaron@schrab.com> writes:
>
>> Create find_hook() function to determine if a given hook exists and is
>> executable.  If it is the path to the script will be returned, otherwise
>> NULL is returned.
>
>Sounds like a sensible thing to do.  To make sure the API is also
>sensible, all the existing hooks should be updated to use this API,
>no?

I'd been trying to keep the changes limited.  I'll see about modifying 
the existing places that run hooks in v2 of the series.

>> This is in support for an upcoming run_hook_argv() function which will
>> expect the full path to the hook script as the first element in the
>> argv_array.
>
>There is currently a public function called run_hook() that squats
>on the good name with a kludgy API that is too specific to using
>separate index file.  Back when it was a private helper in the
>implementation of "git commit", it was perfectly fine, but it was
>exported without giving much thought on the API.
>
>If you are introducing a new run_hook_* function, give it a generic
>enough API that lets all the existing hook callers to use it.  I
>would imagine that the API requirement may be modelled after
>run_command() API so that we can pass argv[] and tweak the hook's
>environ[], as well as feeding its stdin and possibly reading from
>its stdout.  That would be very useful.

I think the attraction of the run_hook() API is its simplicity.  It's 
currently a fairly thin wrapper around the run_command() API.  I suspect 
that if the run_hook() API were made generic enough to support all of 
the existing hook callers it would greatly complicate the existing calls 
to run_hook() while not providing much benefit to hook callers which 
can't currently use it beyond what run_command() offers.

Since I'm going to be changing the interface for this hook in v2 of the 
series so that it will be more complicated than can be readily addressed 
with the run_hook() API (and will have use a fixed number of arguments 
anyway) I'll be dropping the run_hook_argv() function.

^ permalink raw reply

* Re: Hold your fire, please
From: Adam Spiers @ 2012-12-29 15:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vehi9hgz3.fsf@alter.siamese.dyndns.org>

On Fri, Dec 28, 2012 at 11:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Primarily in order to force me concentrate on the releng for the
>> upcoming release, and also to encourage contributors to focus on
>> finding and fixing any last minute regressions (rather than
>> distracting others by showing publicly scratching their itches), I
>> won't be queuing any patch that is not a regression fix, at least
>> for the next few days.  I may not even review them.
>>
>> Thanks.
>
> Just as a friendly reminder, I am reposting this to remind people.

Ah, I thought this period had elapsed already.  I assume this applies
to check-ignore, in which case how long should I hold off before
submitting v4?

>> And have a nice holiday if you are in areas where it is a holiday
>> season ;-)

You too :-)

^ permalink raw reply

* Re: [PATCH 0/4] pre-push hook support
From: Junio C Hamano @ 2012-12-29 16:48 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <20121229145025.GA3789@pug.qqx.org>

Aaron Schrab <aaron@schrab.com> writes:

> At 18:01 -0800 28 Dec 2012, Junio C Hamano <gitster@pobox.com> wrote:
>>Will it be "all-or-none", or "I'll allow these but not those"?
>
> Currently it just uses the exit code to communicate that back, so it's
> all-or-none.  I think I'll keep that in the updated version as well.

Thanks; that sounds sensible.

^ permalink raw reply

* Re: [PATCH 1/4] hooks: Add function to check if a hook exists
From: Junio C Hamano @ 2012-12-29 16:54 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <20121229145032.GB3789@pug.qqx.org>

Aaron Schrab <aaron@schrab.com> writes:

> Since I'm going to be changing the interface for this hook in v2 of
> the series so that it will be more complicated than can be readily
> addressed with the run_hook() API (and will have use a fixed number of
> arguments anyway) I'll be dropping the run_hook_argv() function.

Just to make sure there is no misunderstanding (sorry for sending
the message without finishing it with this clarification at the end
in the first place).  I didn't mean that converting all of the
existing callers must come earlier than introducing a new hook
invoker.

I just wanted to make sure that we are aware that we are adding to
our technical debt, if we are adding another that is also
specialized; as the proposed interface looked sufficiently generic,
it would be the ideal one to make _other_ ones thin wrappers around
it to unify the various codepaths.

Thanks.

^ permalink raw reply

* Fail to push over HTTP with MySQL authentication (Apache2)
From: Davide Baldini @ 2012-12-29 19:54 UTC (permalink / raw)
  To: git

Hi,

I'm not able to setup a public Git repository over plain HTTP with
MySQL authentication.
Both HTTP and authentication are provided by Apache2.

SETUP:
-----

This setup is performed on Debian 6.0.4.

Apache2 (v. 2.2), with modules:
    auth_mysql
    WebDAV

Git (v. 1.7.8.3)
Git repository location:
    local, for webserver: /var/www/public/GT_rulesets/GT00.git
    public, for Git:      http://greatturn.org:8081/GT00.git

Git repository has been configured as:
    cd /var/www/public/GT_rulesets/GT00.git
    git init --bare
    mv hooks/post-update.sample hooks/post-update
    git update server-info
    chmode 777 /var/www/public/GT_rulesets/GT00.git  # for testing.


FACTS:
-----

The Apache side of my setup seems to work:
_   HTTP, MySQL authentication:
        I point Iceweasel to http://greatturn.org:8081/ .
        It asks for authentication; I authenticate with a username/
        password pair taken from MySQL database (which doesn't exist as
        a system user); It works, and I can see all the content of
        the git repository "GT00.git".
_   WebDAV:
        I point Konqueror to webdav://greatturn.org:8081/ .
        Works exactly as previous point.
_   Git:
        Git can fetch the repository without problems:
        git clone http://username:password@greatturn.org:8081/GT00.git

Pushing the locally fetched repository back to the remote one doesn't
work:
    "git push http://greatturn.org:8081/GT00.git master"
    asks for username and password:
        > Username for 'greatturn.org:8081':
        > Password for 'greatturn.org:8081':

    I enter my credentials, then git outputs the following and exits:
        > error: Cannot access URL http://greatturn.org:8081/GT00.git/,
return code 22
        > fatal: git-http-push failed

    On Apache's access.log, git produces all and no more than the
    following:
        > 87.19.240.177 - - [29/Dec/2012:16:43:22 +0100] "GET /GT00.git
/info/refs?service=git-receive-pack HTTP/1.1" 401 767 "-"
"git/1.7.8.3"
        > 87.19.240.177 - - [29/Dec/2012:16:43:26 +0100] "GET
/GT00.git/info/refs?service=git-receive-pack HTTP/1.1" 401 767 "-"
"git/1.7.8.3"
        > 87.19.240.177 - davide [29/Dec/2012:16:43:26 +0100] "GET
/GT00.git/info/refs?service=git-receive-pack HTTP/1.1" 200 233 "-"
"git/1.7.8.3"
        > 87.19.240.177 - davide [29/Dec/2012:16:43:26 +0100] "GET
/GT00.git/HEAD HTTP/1.1" 200 258 "-" "git/1.7.8.3"
        > 87.19.240.177 - - [29/Dec/2012:16:43:26 +0100] "PROPFIND
/GT00.git/ HTTP/1.1" 401 767 "-" "git/1.7.8.3"

^ permalink raw reply

* [BUG] two-way read-tree can write null sha1s into index
From: Jeff King @ 2012-12-29 20:51 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Junio C Hamano
In-Reply-To: <20121229110541.GA1408@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 06:05:41AM -0500, Jeff King wrote:

>   [clear state from last run]
>   $ rm -rf .git/rebase-apply
>   $ git reset --hard
> 
>   [apply the patch; we get a conflict]
>   $ git am -3sc queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch
> 
>   [now run just the read-tree from "am --abort"]
>   $ git.compile read-tree --reset -u HEAD ORIG_HEAD
>   warning: cache entry has null sha1: sound/usb/midi.c
> 
>   [and now check our index]
>   $ git ls-files -s sound/usb/midi.c
>   100644 0000000000000000000000000000000000000000 0 sound/usb/midi.c
> 
>   [yes, this index is bogus]
>   $ git write-tree
>   error: invalid object 100644 0000000000000000000000000000000000000000 for 'sound/usb/midi.c'
>   fatal: git-write-tree: error building trees
> 
> So I think this check may actually be finding a real bug. I have seen
> these null sha1s in the wild, but I was never able to track down the
> actual cause. Maybe this will give us a clue. Now we just need to work
> backwards and figure out who is putting it in the in-memory index and
> why.

I made some progress on this, but I'd like a sanity check from others
(especially Junio). As far as I can tell, this is a bug in read-tree.

When we call "read-tree --reset -u HEAD ORIG_HEAD", the first thing we
do with the index is call read_cache_unmerged. Originally that would
read the index, leaving aside any unmerged entries. However, as of
d1a43f2 (reset --hard/read-tree --reset -u: remove unmerged new paths,
2008-10-15), it actually creates a new cache entry. This serves as a
placeholder, so that we later know to update the working tree.

However, we later noticed that the sha1 of that unmerged entry was
just copied from some higher stage, leaving you with random content in
the index.  That was fixed by e11d7b5 ("reset --merge": fix unmerged
case, 2009-12-31), which instead puts the null sha1 into the newly
created entry, and sets a CE_CONFLICTED flag. At the same time, it
teaches the unpack-trees machinery to pay attention to this flag, so
that oneway_merge throws away the current value.

However, it did not update the code paths for  twoway_merge, which is
where we end up in the read-tree above. We notice that the HEAD and
ORIG_HEAD versions are the same, and say "oh, we can just reuse the
current version". But that's not true. The current version is bogus.

So I think we need to update twoway_merge to recognize unmerged entries,
which gives us two options:

  1. Reject the merge.

  2. Throw away the current unmerged entry in favor of the "new" entry
     (when old and new are the same, of course; otherwise we would
     reject).

I think (2) is the right thing. It fixes the entry of the bogus sha1
into the index, _and_ it solves the problem that "git am --abort" leaves
the conflicted entry as a modification. It should just go away. But
maybe I am forgetting some other case where read-tree should be more
conservative, and (1) is a safer choice.

Something like this patch:

diff --git a/unpack-trees.c b/unpack-trees.c
index 6d96366..e06e01f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1746,14 +1746,19 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		newtree = NULL;
 
 	if (current) {
-		if ((!oldtree && !newtree) || /* 4 and 5 */
-		    (!oldtree && newtree &&
-		     same(current, newtree)) || /* 6 and 7 */
-		    (oldtree && newtree &&
-		     same(oldtree, newtree)) || /* 14 and 15 */
-		    (oldtree && newtree &&
-		     !same(oldtree, newtree) && /* 18 and 19 */
-		     same(current, newtree))) {
+		if (current->ce_flags & CE_CONFLICTED) {
+			if (same(oldtree, newtree))
+				return merged_entry(newtree, current, o);
+			return o->gently ? -1 : reject_merge(current, o);
+		}
+		else if ((!oldtree && !newtree) || /* 4 and 5 */
+			 (!oldtree && newtree &&
+			  same(current, newtree)) || /* 6 and 7 */
+			 (oldtree && newtree &&
+			  same(oldtree, newtree)) || /* 14 and 15 */
+			 (oldtree && newtree &&
+			  !same(oldtree, newtree) && /* 18 and 19 */
+			  same(current, newtree))) {
 			return keep_entry(current, o);
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {

I suspect threeway_merge may need a similar update, but I haven't looked
too carefully yet.

-Peff

^ permalink raw reply related

* Re: Fail to push over HTTP with MySQL authentication (Apache2)
From: Jeff King @ 2012-12-29 21:01 UTC (permalink / raw)
  To: Davide Baldini; +Cc: git
In-Reply-To: <50DF4A78.5000206@gmail.com>

On Sat, Dec 29, 2012 at 08:54:32PM +0100, Davide Baldini wrote:

> SETUP:
> -----
> [...]
> Git repository has been configured as:
>     cd /var/www/public/GT_rulesets/GT00.git
>     git init --bare
>     mv hooks/post-update.sample hooks/post-update
>     git update server-info
>     chmode 777 /var/www/public/GT_rulesets/GT00.git  # for testing.

Should this last line be a "chmod -R"? Git init will create many
subdirectories, and you want to make sure they are all writable for
push.

> _   Git:
>         Git can fetch the repository without problems:
>         git clone http://username:password@greatturn.org:8081/GT00.git
> 
> Pushing the locally fetched repository back to the remote one doesn't
> work:
> [...]
> > 87.19.240.177 - - [29/Dec/2012:16:43:26 +0100] "PROPFIND /GT00.git/ HTTP/1.1" 401 767 "-" "git/1.7.8.3"

If fetch is working and push is not, I'd suspect WebDAV configuration
problems (and indeed, your credentials seem fine, but the PROPFIND is
returning a 401). Fetch works over stock HTTP and does not use WebDAV at
all. There is some documentation on setting up DAV here:

  https://github.com/git/git/blob/master/Documentation/howto/setup-git-server-over-http.txt

but I have no idea if it is up-to-date or not.

However, before trying to investigate that avenue, have you considered
using git's smart-http backend instead of WebDAV? It's significantly
more efficient. You can get details and example apache configuration
from "git help http-backend".

-Peff

^ permalink raw reply

* Re: Lockless Refs?
From: Martin Fick @ 2012-12-29 21:15 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Michael Haggerty, git, Shawn Pearce
In-Reply-To: <20121229081657.GE15408@sigill.intra.peff.net>



Jeff King <peff@peff.net> wrote:

>On Fri, Dec 28, 2012 at 09:15:52AM -0800, Junio C Hamano wrote:
>
>> Martin Fick <mfick@codeaurora.org> writes:
>> 
>> > Hmm, actually I believe that with a small modification to the 
>> > semantics described here it would be possible to make multi 
>> > repo/branch commits work....
>> >
>> > Shawn talked about adding multi repo/branch transaction 
>> > semantics to jgit, this might be something that git wants to 
>> > support also at some point?
>> 
>> Shawn may have talked about it and you may have listened to it, but
>> others wouldn't have any idea what kind of "multi repo/branch
>> transaction" you are talking about.  Is it about "I want to push
>> this ref to that repo and push this other ref to that other repo",
>> in what situation will it be used/useful, what are the failure
>> modes, what are failure tolerances by the expected use cases, ...?
>> 
>> Care to explain?
>
>I cannot speak for Martin, but I am assuming the point is to atomically
>update 2 (or more) refs on the same repo. That is, if I have a branch
>"refs/heads/foo" and a ref pointing to meta-information (say, notes
>about commits in foo, in "refs/notes/meta/foo"), I would want to "git
>push" them, and only update them if _both_ will succeed, and otherwise
>fail and update nothing.

My use case was cross repo/branch dependencies in Gerrit (which do not yet exist). Users want to be able to define several changes (destined for different project/branches) which can only be merged together.  If one change cannot be merged, the others should fail too.  The solutions we can think of generally need to hold ref locks while acquiring more ref locks, this drastically increases the opportunities for stale locks over the simple "lock, check, update, unlock" mode which git locks are currently used for.

I was perhaps making too big of a leap to assume that there would be other non Gerrit uses cases for this?  I assumed that other git projects which are spread across several git repos would need this? But maybe this simply wouldn't be practical with other git server solutions?

-Martin

Employee of Qualcomm Innovation Center,Inc. which is a member of Code Aurora Forum

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Martin Fick @ 2012-12-29 21:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git, Junio C Hamano, Shawn Pearce
In-Reply-To: <20121229081200.GD15408@sigill.intra.peff.net>

Jeff King <peff@peff.net> wrote:

>On Fri, Dec 28, 2012 at 07:50:14AM -0700, Martin Fick wrote:
>
>> Hmm, actually I believe that with a small modification to the 
>> semantics described here it would be possible to make multi 
>> repo/branch commits work.   Simply allow the ref filename to 
>> be locked by a transaction by appending the transaction ID to 
>> the filename.  So if transaction 123 wants to lock master 
>> which points currently to abcde, then it will move 
>> master/abcde to master/abcde_123.  If transaction 123 is 
>> designed so that any process can commit/complete/abort it 
>> without requiring any locks which can go stale, then this ref 
>> lock will never go stale either (easy as long as it writes 
>> all its proposed updates somewhere upfront and has atomic 
>> semantics for starting, committing and aborting).  On commit, 
>> the ref lock gets updated to its new value: master/newsha and 
>> on abort it gets unlocked: master/abcde.
>
>Hmm. I thought our goal was to avoid locks? Isn't this just locking by
>another name?

It is a lock, but it is a lock with an owner: the transaction.  If the transaction has reliable recovery semantics, then the lock will be recoverable also.  This is possible if we have lock ownership (the transaction) which does not exist today for the ref locks.  With good lock ownership we gain the ability to reliably delete locks for a specific owner without the risk of deleting the lock when held by another owner (putting the owner in the filename is "good", while putting the owner in the filecontents is not).   Lastly, for reliable recovery of stale locks we need the ability to determine when an owner has abandoned a lock.  I believe that the transaction semantics laid out below give this.


>I guess your point is to have no locks in the "normal" case, and have
>locked transactions as an optional add-on?

Basically.  If we design the transaction into the git semantics we could ensure that it is recoverable and we should not need to expose these reflocks outside of the transaction APIs.

To illustrate a simple transaction approach (borrowing some of Shawn's ideas), we could designate a directory to hold transaction files *1.  To prepare a transaction: write a list of repo:ref:oldvalue:newvalue to a file named id.new (in a stable sorted order based on repo:ref to prevent deadlocks).  This is not a state change and thus this file could be deleted by any process at anytime (preferably after a long grace period).

If file renames are atomic on the filesystem holding the transaction files then 1, 2, 3 below will be atomic state changes.  It does not matter who performs state transitions 2 or 3.  It does not matter who implements the work following any of the 3 transitions, many processes could attempt the work in parallel (so could a human).
 
1) To start the transaction, rename the id.new file to id.  If the rename fails, start over if desired/still possible.  On success, ref locks for each entry should be acquired in listed order (to prevent deadlocks), using transaction id and oldvalue.  It is never legal to unlock a ref in this state (because a block could cause the unlock to be delayed until the commit phase).  However, it is legal for any process to transition to abort at any time from this state, perhaps because of a failure to acquire a lock (held by another transaction), and definitely if a ref has changed (is no longer oldvalue).

2) To abort the transaction, rename the id file to id.abort.  This should only ever fail if commit was achieved first.  Once in this state, any process may/should unlock any ref locks belonging to this transaction id.  Once all refs are unlocked, id.abort may be deleted (it could be deleted earlier, but then cleanup will take longer).

3) To commit the transaction, rename the file to id.commit.  This should only ever fail if abort was achieved first. This transition should never be done until every listed ref is locked by the current transaction id.  Once in this phase, all refs may/should be moved to their new values and unlocked by any process. Once all refs are unlocked, id.commit may be deleted. 

Since any process attempting any of the work in these transactions could block at any time for an indefinite amount of time, these processes may wake after the transaction is aborted or comitted and the transaction files are cleaned up.  I believe that in these cases the only actions which could succeed by these waking processes is the ref locking action.  All such abandoned ref locks may/should be unlocked by any process.  This last rule means that no transaction ids should ever be reused,

-Martin


*1 We may want to adapt the simple model illustrated above to use git mechanisms such as refs to hold transaction info instead of files in a directory, and git submodule files to hold the list of refs to update.  

Employee of Qualcomm Innovation Center,Inc. which is a member of Code Aurora Forum

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Martin Fick @ 2012-12-29 22:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git, Junio C Hamano
In-Reply-To: <20121229081021.GC15408@sigill.intra.peff.net>

Jeff King <peff@peff.net> wrote:

>On Thu, Dec 27, 2012 at 04:11:51PM -0700, Martin Fick wrote:
>> My idea is based on using filenames to store sha1s instead of 
>> file contents.  To do this, the sha1 one of a ref would be 
>> stored in a file in a directory named after the loose ref.  I 
>> believe this would then make it possible to have lockless 
>> atomic ref updates by renaming the file.
>> 
>> To more fully illustrate the idea, imagine that any file 
>> (except for the null file) in the directory will represent the 
>> value of the ref with its name, then the following 
>> transitions can represent atomic state changes to a refs 
>> value and existence:
>
>Hmm. So basically you are relying on atomic rename() to move the value
>around within a directory, rather than using write to move it around
>within a file. Atomic rename is usually something we have on local
>filesystems (and I think we rely on it elsewhere). Though I would not
>be
>surprised if it is not atomic on all networked filesystems (though it
>is
>on NFS, at least).

Yes.  I assume this is OK because doesn't git already rely on atomic renames?  For example to rename the new packed-refs file to unlock it?

...

>> 3) To create a ref, it must be renamed from the null file (sha 
>> 0000...) to the new value just as if it were being updated 
>> from any other value, but there is one extra condition: 
>> before renaming the null file, a full directory scan must be 
>> done to ensure that the null file is the only file in the 
>> directory (this condition exists because creating the 
>> directory and null file cannot be atomic unless the filesystem 
>> supports atomic directory renames, an expectation git does 
>> not currently make).  I am not sure how this compares to 
>> today's approach, but including the setup costs (described 
>> below), I suspect it is slower.
>
>Hmm. mkdir is atomic. So wouldn't it be sufficient to just mkdir and
>create the correct sha1 file?

But then a process could mkdir and die leaving a stale empty dir with no reliable recovery mechanism.


Unfortunately, I think I see another flaw though! :( I should have known that I cannot separate an important check from its state transitioning action.  The following could happen:

 A does mkdir
 A creates null file
 A checks dir -> no other files 
 B checks dir -> no other files
 A renames null file to abcd
 C creates second null file 
 B renames second null file to defg

One way to fix this is to rely on directory renames, but I believe this is something git does not want to require of every FS? If we did, we could Change #3 to be:

3) To create a ref, it must be renamed from the null file (sha 0000...) to the new value just as if it were being updated from any other value. (No more scan)

Then, with reliable directory renames, a process could do what you suggested to a temporary directory, mkdir + create null file, then rename the temporary dir to refname.  This would prevent duplicate null files.  With a grace period, the temporary dirs could be cleaned up in case a process dies before the rename.  This is your approach with reliable recovery.


>> I don't know how this new scheme could be made to work with 
>> the current scheme, it seems like perhaps new git releases 
>> could be made to understand both the old and the new, and a 
>> config option could be used to tell it which method to write 
>> new refs with.  Since in this new scheme ref directory names 
>> would conflict with old ref filenames, this would likely 
>> prevent both schemes from erroneously being used 
>> simultaneously (so they shouldn't corrupt each other), except 
>> for the fact that refs can be nested in directories which 
>> confuses things a bit.  I am not sure what a good solution to 
>> this is?
>
>I think you would need to bump core.repositoryformatversion, and just
>never let old versions of git access the repository directly. Not the
>end of the world, but it certainly increases deployment effort. If we
>were going to do that, it would probably make sense to think about
>solving the D/F conflict issues at the same time (i.e., start calling
>"refs/heads/foo" in the filesystem "refs.d/heads.d/foo.ref" so that it
>cannot conflict with "refs.d/heads.d/foo.d/bar.ref").

Wouldn't you want to use a non legal ref character instead of dot? And without locks, we free up more of the ref namespace too I think? (Refs could end in ".lock")

-Martin

Employee of Qualcomm Innovation Center,Inc. which is a member of Code Aurora Forum

^ permalink raw reply

* [RFC/PATCH] gitk: Visualize a merge commit with a right-click in gitk
From: Jason Holden @ 2012-12-30  0:16 UTC (permalink / raw)
  To: git; +Cc: paulus, Jason Holden

When first doing a merge in git-gui, the "Visualize Merge" button is
quite helpful to visualize the changes due to a merge.
But once the merge is complete, there's not a similarly convenient
way to recreate that merge view in gitk.

This commit adds to gitk the ability to right-click on a merge commit and
bring up a new gitk window displaying only those commits involved in
the merge.

When right-clicking on a non-merge commit, this option is grayed out.  This
patch also supports correct visualization of octopus merges

Signed-off-by: Jason Holden <jason.k.holden.swdev@gmail.com>
---
 gitk | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/gitk b/gitk
index 379582a..17e1fcb 100755
--- a/gitk
+++ b/gitk
@@ -2551,6 +2551,7 @@ proc makewindow {} {
 	{mc "Compare with marked commit" command compare_commits}
 	{mc "Diff this -> marked commit" command {diffvsmark 0}}
 	{mc "Diff marked commit -> this" command {diffvsmark 1}}
+	{mc "Visualize this merge" command visualize_merge}
     }
     $rowctxmenu configure -tearoff 0
 
@@ -2590,6 +2591,31 @@ proc makewindow {} {
     $diff_menu configure -tearoff 0
 }
 
+# Return the number of parents for a given sha1 id
+proc get_numparents_from_id {id} {
+    global parentlist
+    set row [rowofcommit $id]
+    return [llength [lindex $parentlist $row]]
+}
+
+proc visualize_merge {} {
+    global parents currentid parentlist
+    global rowmenuid
+
+    set num_parents [get_numparents_from_id $rowmenuid]
+    set row [rowofcommit $rowmenuid]
+
+    if {$num_parents >= 2} {
+	set revlist $rowmenuid
+	for { set i 1 } {$i < $num_parents} {incr i} {
+
+	    set revlist "$revlist [lindex $parentlist $row 0]..[lindex $parentlist $row $i] $rowmenuid"
+	}
+	
+	eval exec gitk $revlist
+    }
+}
+
 # Windows sends all mouse wheel events to the current focused window, not
 # the one where the mouse hovers, so bind those events here and redirect
 # to the correct window
@@ -8577,6 +8603,13 @@ proc rowmenu {x y id} {
 	$menu entryconfigure 9 -state $mstate
 	$menu entryconfigure 10 -state $mstate
 	$menu entryconfigure 11 -state $mstate
+
+	# Disable visualize-merge on only one parent
+	if {[get_numparents_from_id $id] == 1} {
+	    $menu entryconfigure 15 -state disabled
+	} else {
+	    $menu entryconfigure 15 -state normal
+	}
     } else {
 	set menu $fakerowmenu
     }
-- 
1.8.1.rc3.28.g0ab5d1f

^ permalink raw reply related

* Re: Fail to push over HTTP with MySQL authentication (Apache2)
From: Davide Baldini @ 2012-12-30  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <20121229210128.GB21058@sigill.intra.peff.net>

On 12/29/12 22:01, Jeff King wrote:

> However, before trying to investigate that avenue, have you considered
> using git's smart-http backend instead of WebDAV? It's significantly
> more efficient. You can get details and example apache configuration
> from "git help http-backend".

Thank you for the suggestions, I've been able to have the setup running.
I abandoned the horrible WebDAV in favour of Smart HTTP, and it's been
quite straightforward to configure and run successfully.

There is some documentation on setting up DAV here:
> 
>   https://github.com/git/git/blob/master/Documentation/howto/setup-git-server-over-http.txt
> 
> but I have no idea if it is up-to-date or not.

No it's not up-to-date. It should refer to Apache versions <2.0 .
Some of its config directives are inexistent in modern Apaches.

- Davide

^ permalink raw reply

* [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical
From: Thomas Ackermann @ 2012-12-30 10:29 UTC (permalink / raw)
  To: th.acker, git


CodingGuidelines and SubmittingPatches are IMHO a little bit hidden in ./Documentation
and with respect to their content should be better placed in ./Documentation/technical.

[PATCH 1/3] Move CodingGuidelines to ./technical/coding-guidelines.txt and SubmittingPatches to ./technical/submitting-patches.txt
[PATCH 2/3] Add coding-guidelines.txt and submitting-patches.txt to ./Documentation/Makefile
[PATCH 3/3] Convert coding-guidelines.txt and submitting-patches.txt to asciidoc


---
Thomas

^ permalink raw reply

* [PATCH 1/3] Move CodingGuidelines to ./technical/coding-guidelines.txt and SubmittingPatches to ./technical/submitting-patches.txt
From: Thomas Ackermann @ 2012-12-30 10:31 UTC (permalink / raw)
  To: th.acker, git
In-Reply-To: <1023165134.213650.1356863340563.JavaMail.ngmail@webmail06.arcor-online.net>


Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
---
 Documentation/{CodingGuidelines => technical/coding-guidelines.txt}   | 0
 Documentation/{SubmittingPatches => technical/submitting-patches.txt} | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/{CodingGuidelines => technical/coding-guidelines.txt} (100%)
 rename Documentation/{SubmittingPatches => technical/submitting-patches.txt} (100%)

diff --git a/Documentation/CodingGuidelines b/Documentation/technical/coding-guidelines.txt
similarity index 100%
rename from Documentation/CodingGuidelines
rename to Documentation/technical/coding-guidelines.txt
diff --git a/Documentation/SubmittingPatches b/Documentation/technical/submitting-patches.txt
similarity index 100%
rename from Documentation/SubmittingPatches
rename to Documentation/technical/submitting-patches.txt
-- 
1.8.0.msysgit.0


---
Thomas

^ permalink raw reply

* [PATCH 2/3] Add coding-guidelines.txt and submitting-patches.txt to ./Documentation/Makefile
From: Thomas Ackermann @ 2012-12-30 10:33 UTC (permalink / raw)
  To: th.acker, git
In-Reply-To: <1023165134.213650.1356863340563.JavaMail.ngmail@webmail06.arcor-online.net>

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
---
 Documentation/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index e53d333..a51c00f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -47,6 +47,8 @@ TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/trivial-merge
+TECH_DOCS += technical/coding-guidelines
+TECH_DOCS += technical/submitting-patches
 SP_ARTICLES += $(TECH_DOCS)
 SP_ARTICLES += technical/api-index
 
-- 
1.8.0.msysgit.0


---
Thomas

^ permalink raw reply related

* [PATCH 3/3] Convert coding-guidelines.txt and submitting-patches.txt to asciidoc
From: Thomas Ackermann @ 2012-12-30 10:35 UTC (permalink / raw)
  To: th.acker, git
In-Reply-To: <1023165134.213650.1356863340563.JavaMail.ngmail@webmail06.arcor-online.net>


Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
---
 Documentation/technical/coding-guidelines.txt  | 175 ++++++++++++++-----------
 Documentation/technical/submitting-patches.txt | 144 ++++++++++----------
 2 files changed, 174 insertions(+), 145 deletions(-)

diff --git a/Documentation/technical/coding-guidelines.txt b/Documentation/technical/coding-guidelines.txt
index 69f7e9b..406ded8 100644
--- a/Documentation/technical/coding-guidelines.txt
+++ b/Documentation/technical/coding-guidelines.txt
@@ -1,23 +1,25 @@
+Coding guidelines
+=================
+
 Like other projects, we also have some guidelines to keep to the
 code.  For git in general, three rough rules are:
 
- - Most importantly, we never say "It's in POSIX; we'll happily
+- Most importantly, we never say "It's in POSIX; we'll happily
    ignore your needs should your system not conform to it."
    We live in the real world.
 
- - However, we often say "Let's stay away from that construct,
+- However, we often say "Let's stay away from that construct,
    it's not even in POSIX".
 
- - In spite of the above two rules, we sometimes say "Although
+- In spite of the above two rules, we sometimes say "Although
    this is not in POSIX, it (is so convenient | makes the code
    much more readable | has other good characteristics) and
    practically all the platforms we care about support it, so
    let's use it".
 
-   Again, we live in the real world, and it is sometimes a
-   judgement call, the decision based more on real world
-   constraints people face than what the paper standard says.
-
+Again, we live in the real world, and it is sometimes a
+judgement call, the decision based more on real world
+constraints people face than what the paper standard says.
 
 As for more concrete guidelines, just imitate the existing code
 (this is a good guideline, no matter which project you are
@@ -30,173 +32,183 @@ uses (even if it doesn't match the overall style of existing code).
 But if you must have a list of rules, here they are.
 
 For shell scripts specifically (not exhaustive):
+------------------------------------------------
 
- - We use tabs for indentation.
+- We use tabs for indentation.
 
- - Case arms are indented at the same depth as case and esac lines.
+- Case arms are indented at the same depth as case and esac lines.
 
- - Redirection operators should be written with space before, but no
+- Redirection operators should be written with space before, but no
    space after them.  In other words, write 'echo test >"$file"'
    instead of 'echo test> $file' or 'echo test > $file'.  Note that
    even though it is not required by POSIX to double-quote the
    redirection target in a variable (as shown above), our code does so
    because some versions of bash issue a warning without the quotes.
 
- - We prefer $( ... ) for command substitution; unlike ``, it
+- We prefer $( ... ) for command substitution; unlike ``, it
    properly nests.  It should have been the way Bourne spelled
    it from day one, but unfortunately isn't.
 
- - If you want to find out if a command is available on the user's
+- If you want to find out if a command is available on the user's
    $PATH, you should use 'type <command>', instead of 'which <command>'.
    The output of 'which' is not machine parseable and its exit code
    is not reliable across platforms.
 
- - We use POSIX compliant parameter substitutions and avoid bashisms;
+- We use POSIX compliant parameter substitutions and avoid bashisms;
    namely:
 
-   - We use ${parameter-word} and its [-=?+] siblings, and their
-     colon'ed "unset or null" form.
+	* We use `${parameter-word}` and its `[-=?+]` siblings, and their
+	  colon'ed "unset or null" form.
 
-   - We use ${parameter#word} and its [#%] siblings, and their
-     doubled "longest matching" form.
+	* We use `${parameter#word}` and its `[#%]` siblings, and their
+	  doubled "longest matching" form.
 
-   - No "Substring Expansion" ${parameter:offset:length}.
+	* No "Substring Expansion" `${parameter:offset:length}`.
 
-   - No shell arrays.
+	* No shell arrays.
 
-   - No strlen ${#parameter}.
+	* No strlen `${#parameter}`.
 
-   - No pattern replacement ${parameter/pattern/string}.
+	* No pattern replacement `${parameter/pattern/string}`.
 
- - We use Arithmetic Expansion $(( ... )).
+- We use Arithmetic Expansion $(( ... )).
 
- - Inside Arithmetic Expansion, spell shell variables with $ in front
+- Inside Arithmetic Expansion, spell shell variables with $ in front
    of them, as some shells do not grok $((x)) while accepting $(($x))
    just fine (e.g. dash older than 0.5.4).
 
- - We do not use Process Substitution <(list) or >(list).
+- We do not use Process Substitution <(list) or >(list).
 
- - Do not write control structures on a single line with semicolon.
+- Do not write control structures on a single line with semicolon.
    "then" should be on the next line for if statements, and "do"
    should be on the next line for "while" and "for".
 
- - We prefer "test" over "[ ... ]".
+- We prefer "test" over "[ ... ]".
 
- - We do not write the noiseword "function" in front of shell
+- We do not write the noiseword "function" in front of shell
    functions.
 
- - We prefer a space between the function name and the parentheses. The
+- We prefer a space between the function name and the parentheses. The
    opening "{" should also be on the same line.
    E.g.: my_function () {
 
- - As to use of grep, stick to a subset of BRE (namely, no \{m,n\},
+- As to use of grep, stick to a subset of BRE (namely, no \{m,n\},
    [::], [==], nor [..]) for portability.
 
-   - We do not use \{m,n\};
+	* We do not use \{m,n\};
 
-   - We do not use -E;
+	* We do not use -E;
 
-   - We do not use ? nor + (which are \{0,1\} and \{1,\}
-     respectively in BRE) but that goes without saying as these
-     are ERE elements not BRE (note that \? and \+ are not even part
-     of BRE -- making them accessible from BRE is a GNU extension).
+	* We do not use ? nor + (which are \{0,1\} and \{1,\}
+	  respectively in BRE) but that goes without saying as these
+	  are ERE elements not BRE (note that \? and \+ are not even part
+	  of BRE -- making them accessible from BRE is a GNU extension).
 
- - Use Git's gettext wrappers in git-sh-i18n to make the user
+- Use Git's gettext wrappers in git-sh-i18n to make the user
    interface translatable. See "Marking strings for translation" in
    po/README.
 
 For C programs:
+---------------
 
- - We use tabs to indent, and interpret tabs as taking up to
+- We use tabs to indent, and interpret tabs as taking up to
    8 spaces.
 
- - We try to keep to at most 80 characters per line.
+- We try to keep to at most 80 characters per line.
 
- - We try to support a wide range of C compilers to compile git with,
+- We try to support a wide range of C compilers to compile git with,
    including old ones. That means that you should not use C99
    initializers, even if a lot of compilers grok it.
 
- - Variables have to be declared at the beginning of the block.
-
- - NULL pointers shall be written as NULL, not as 0.
-
- - When declaring pointers, the star sides with the variable
-   name, i.e. "char *string", not "char* string" or
-   "char * string".  This makes it easier to understand code
-   like "char *string, c;".
+- Variables have to be declared at the beginning of the block.
 
- - We avoid using braces unnecessarily.  I.e.
+- NULL pointers shall be written as NULL, not as 0.
 
-	if (bla) {
-		x = 1;
-	}
+- When declaring pointers, the star sides with the variable
+   name, i.e. `char *string`, not `char* string` or
+   `char * string`.  This makes it easier to understand code
+   like `char *string, c;`.
 
-   is frowned upon.  A gray area is when the statement extends
-   over a few lines, and/or you have a lengthy comment atop of
-   it.  Also, like in the Linux kernel, if there is a long list
-   of "else if" statements, it can make sense to add braces to
-   single line blocks.
+- We avoid using braces unnecessarily.  I.e.
++
+............
+if (bla) {
+	x = 1;
+}
+............
++
+is frowned upon.  A gray area is when the statement extends
+over a few lines, and/or you have a lengthy comment atop of
+it.  Also, like in the Linux kernel, if there is a long list
+of `else if` statements, it can make sense to add braces to
+single line blocks.
 
- - We try to avoid assignments inside if().
+- We try to avoid assignments inside if().
 
- - Try to make your code understandable.  You may put comments
+- Try to make your code understandable.  You may put comments
    in, but comments invariably tend to stale out when the code
    they were describing changes.  Often splitting a function
    into two makes the intention of the code much clearer.
 
- - Double negation is often harder to understand than no negation
+- Double negation is often harder to understand than no negation
    at all.
 
- - Some clever tricks, like using the !! operator with arithmetic
+- Some clever tricks, like using the !! operator with arithmetic
    constructs, can be extremely confusing to others.  Avoid them,
    unless there is a compelling reason to use them.
 
- - Use the API.  No, really.  We have a strbuf (variable length
+- Use the API.  No, really.  We have a strbuf (variable length
    string), several arrays with the ALLOC_GROW() macro, a
    string_list for sorted string lists, a hash map (mapping struct
    objects) named "struct decorate", amongst other things.
 
- - When you come up with an API, document it.
+- When you come up with an API, document it.
 
- - The first #include in C files, except in platform specific
+- The first #include in C files, except in platform specific
    compat/ implementations, should be git-compat-util.h or another
    header file that includes it, such as cache.h or builtin.h.
 
- - If you are planning a new command, consider writing it in shell
+- If you are planning a new command, consider writing it in shell
    or perl first, so that changes in semantics can be easily
    changed and discussed.  Many git commands started out like
    that, and a few are still scripts.
 
- - Avoid introducing a new dependency into git. This means you
+- Avoid introducing a new dependency into git. This means you
    usually should stay away from scripting languages not already
    used in the git core command set (unless your command is clearly
    separate from it, such as an importer to convert random-scm-X
    repositories to git).
 
- - When we pass <string, length> pair to functions, we should try to
+- When we pass <string, length> pair to functions, we should try to
    pass them in that order.
 
- - Use Git's gettext wrappers to make the user interface
+- Use Git's gettext wrappers to make the user interface
    translatable. See "Marking strings for translation" in po/README.
 
 Writing Documentation:
+----------------------
 
- Every user-visible change should be reflected in the documentation.
- The same general rule as for code applies -- imitate the existing
- conventions.  A few commented examples follow to provide reference
- when writing or modifying command usage strings and synopsis sections
- in the manual pages:
+Every user-visible change should be reflected in the documentation.
+The same general rule as for code applies -- imitate the existing
+conventions.  A few commented examples follow to provide reference
+when writing or modifying command usage strings and synopsis sections
+in the manual pages:
 
- Placeholders are spelled in lowercase and enclosed in angle brackets:
+Placeholders are spelled in lowercase and enclosed in angle brackets:
+...................
    <file>
    --sort=<key>
    --abbrev[=<n>]
+...................
 
- Possibility of multiple occurrences is indicated by three dots:
+Possibility of multiple occurrences is indicated by three dots:
+...................
    <file>...
    (One or more of <file>.)
+...................
 
- Optional parts are enclosed in square brackets:
+Optional parts are enclosed in square brackets:
+...................
    [<extra>]
    (Zero or one <extra>.)
 
@@ -207,12 +219,16 @@ Writing Documentation:
    [<patch>...]
    (Zero or more of <patch>.  Note that the dots are inside, not
    outside the brackets.)
+...................
 
- Multiple alternatives are indicated with vertical bar:
+Multiple alternatives are indicated with vertical bar:
+...................
    [-q | --quiet]
    [--utf8 | --no-utf8]
+...................
 
- Parentheses are used for grouping:
+Parentheses are used for grouping:
+...................
    [(<rev>|<range>)...]
    (Any number of either <rev> or <range>.  Parens are needed to make
    it clear that "..." pertains to both <rev> and <range>.)
@@ -223,10 +239,13 @@ Writing Documentation:
    git remote set-head <name> (-a | -d | <branch>)
    (One and only one of "-a", "-d" or "<branch>" _must_ (no square
    brackets) be provided.)
+...................
 
- And a somewhat more contrived example:
+And a somewhat more contrived example:
+...................
    --diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]
    Here "=" is outside the brackets, because "--diff-filter=" is a
    valid usage.  "*" has its own pair of brackets, because it can
    (optionally) be specified only when one or more of the letters is
    also provided.
+...................
diff --git a/Documentation/technical/submitting-patches.txt b/Documentation/technical/submitting-patches.txt
index 75935d5..925b79e 100644
--- a/Documentation/technical/submitting-patches.txt
+++ b/Documentation/technical/submitting-patches.txt
@@ -1,69 +1,75 @@
+Submitting Patches
+==================
+
 Checklist (and a short version for the impatient):
 
-	Commits:
-
-	- make commits of logical units
-	- check for unnecessary whitespace with "git diff --check"
-	  before committing
-	- do not check in commented out code or unneeded files
-	- the first line of the commit message should be a short
-	  description (50 characters is the soft limit, see DISCUSSION
-	  in git-commit(1)), and should skip the full stop
-	- it is also conventional in most cases to prefix the
-	  first line with "area: " where the area is a filename
-	  or identifier for the general area of the code being
-	  modified, e.g.
-	  . archive: ustar header checksum is computed unsigned
-	  . git-cherry-pick.txt: clarify the use of revision range notation
-	  (if in doubt which identifier to use, run "git log --no-merges"
-	  on the files you are modifying to see the current conventions)
-	- the body should provide a meaningful commit message, which:
-	  . explains the problem the change tries to solve, iow, what
-	    is wrong with the current code without the change.
-	  . justifies the way the change solves the problem, iow, why
-	    the result with the change is better.
-	  . alternate solutions considered but discarded, if any.
-	- describe changes in imperative mood, e.g. "make xyzzy do frotz"
-	  instead of "[This patch] makes xyzzy do frotz" or "[I] changed
-	  xyzzy to do frotz", as if you are giving orders to the codebase
-	  to change its behaviour.
-	- try to make sure your explanation can be understood without
-	  external resources. Instead of giving a URL to a mailing list
-	  archive, summarize the relevant points of the discussion.
-	- add a "Signed-off-by: Your Name <you@example.com>" line to the
-	  commit message (or just use the option "-s" when committing)
-	  to confirm that you agree to the Developer's Certificate of Origin
-	- make sure that you have tests for the bug you are fixing
-	- make sure that the test suite passes after your commit
-
-	Patch:
-
-	- use "git format-patch -M" to create the patch
-	- do not PGP sign your patch
-	- do not attach your patch, but read in the mail
-	  body, unless you cannot teach your mailer to
-	  leave the formatting of the patch alone.
-	- be careful doing cut & paste into your mailer, not to
-	  corrupt whitespaces.
-	- provide additional information (which is unsuitable for
-	  the commit message) between the "---" and the diffstat
-	- if you change, add, or remove a command line option or
-	  make some other user interface change, the associated
-	  documentation should be updated as well.
-	- if your name is not writable in ASCII, make sure that
-	  you send off a message in the correct encoding.
-	- send the patch to the list (git@vger.kernel.org) and the
-	  maintainer (gitster@pobox.com) if (and only if) the patch
-	  is ready for inclusion. If you use git-send-email(1),
-	  please test it first by sending email to yourself.
-	- see below for instructions specific to your mailer
+Commits:
+--------
+
+- make commits of logical units
+- check for unnecessary whitespace with "git diff --check"
+  before committing
+- do not check in commented out code or unneeded files
+- the first line of the commit message should be a short
+  description (50 characters is the soft limit, see DISCUSSION
+  in git-commit(1)), and should skip the full stop
+- it is also conventional in most cases to prefix the
+  first line with "area: " where the area is a filename
+  or identifier for the general area of the code being
+  modified, e.g.
+  * archive: ustar header checksum is computed unsigned
+  * git-cherry-pick.txt: clarify the use of revision range notation
+  (if in doubt which identifier to use, run "git log --no-merges"
+  on the files you are modifying to see the current conventions)
+- the body should provide a meaningful commit message, which:
+  * explains the problem the change tries to solve, iow, what
+	is wrong with the current code without the change.
+  * justifies the way the change solves the problem, iow, why
+	the result with the change is better.
+  * alternate solutions considered but discarded, if any.
+- describe changes in imperative mood, e.g. "make xyzzy do frotz"
+  instead of "[This patch] makes xyzzy do frotz" or "[I] changed
+  xyzzy to do frotz", as if you are giving orders to the codebase
+  to change its behaviour.
+- try to make sure your explanation can be understood without
+  external resources. Instead of giving a URL to a mailing list
+  archive, summarize the relevant points of the discussion.
+- add a `Signed-off-by: Your Name <you@example.com>` line to the
+  commit message (or just use the option "-s" when committing)
+  to confirm that you agree to the Developer's Certificate of Origin
+- make sure that you have tests for the bug you are fixing
+- make sure that the test suite passes after your commit
+
+Patch:
+------
+
+- use "git format-patch -M" to create the patch
+- do not PGP sign your patch
+- do not attach your patch, but read in the mail
+  body, unless you cannot teach your mailer to
+  leave the formatting of the patch alone.
+- be careful doing cut & paste into your mailer, not to
+  corrupt whitespaces.
+- provide additional information (which is unsuitable for
+  the commit message) between the "---" and the diffstat
+- if you change, add, or remove a command line option or
+  make some other user interface change, the associated
+  documentation should be updated as well.
+- if your name is not writable in ASCII, make sure that
+  you send off a message in the correct encoding.
+- send the patch to the list (git@vger.kernel.org) and the
+  maintainer (gitster@pobox.com) if (and only if) the patch
+  is ready for inclusion. If you use git-send-email(1),
+  please test it first by sending email to yourself.
+- see below for instructions specific to your mailer
 
 Long version:
+-------------
 
 I started reading over the SubmittingPatches document for Linux
 kernel, primarily because I wanted to have a document similar to
 it for the core GIT to make sure people understand what they are
-doing when they write "Signed-off-by" line.
+doing when they write `Signed-off-by` line.
 
 But the patch submission requirements are a lot more relaxed
 here on the technical/contents front, because the core GIT is
@@ -285,7 +291,7 @@ If you like, you can put extra tags at the end:
 You can also create your own tag or use one that's in common usage
 such as "Thanks-to:", "Based-on-patch-by:", or "Mentored-by:".
 
-------------------------------------------------
+..................................
 An ideal patch flow
 
 Here is an ideal patch flow for this project the current maintainer
@@ -319,8 +325,8 @@ In any time between the (2)-(3) cycle, the maintainer may pick it up
 from the list and queue it to 'pu', in order to make it easier for
 people play with it without having to pick up and apply the patch to
 their trees themselves.
+..................................
 
-------------------------------------------------
 Know the status of your patch after submission
 
 * You can use Git itself to find out when your patch is merged in
@@ -334,8 +340,8 @@ Know the status of your patch after submission
   entitled "What's cooking in git.git" and "What's in git.git" giving
   the status of various proposed changes.
 
-------------------------------------------------
 MUA specific hints
+~~~~~~~~~~~~~~~~~~
 
 Some of patches I receive or pick up from the list share common
 patterns of breakage.  Please make sure your MUA is set up
@@ -356,7 +362,7 @@ commit message.
 
 
 Pine
-----
+^^^^
 
 (Johannes Schindelin)
 
@@ -364,14 +370,14 @@ I don't know how many people still use pine, but for those poor
 souls it may be good to mention that the quell-flowed-text is
 needed for recent versions.
 
-... the "no-strip-whitespace-before-send" option, too. AFAIK it
+And the "no-strip-whitespace-before-send" option, too. AFAIK it
 was introduced in 4.60.
 
 (Linus Torvalds)
 
 And 4.58 needs at least this.
 
----
+...........................
 diff-tree 8326dd8350be64ac7fc805f6563a1d61ad10d32c (from e886a61f76edf5410573e92e38ce22974f9c40f1)
 Author: Linus Torvalds <torvalds@g5.osdl.org>
 Date:   Mon Aug 15 17:23:51 2005 -0700
@@ -381,6 +387,8 @@ Date:   Mon Aug 15 17:23:51 2005 -0700
     There's no excuse for unconditionally removing whitespace from
     the pico buffers on close.
 
+--------------------------------------
+
 diff --git a/pico/pico.c b/pico/pico.c
 --- a/pico/pico.c
 +++ b/pico/pico.c
@@ -393,12 +401,14 @@ diff --git a/pico/pico.c b/pico/pico.c
 +#endif
 		c |= COMP_EXIT;
 		break;
-
+...........................
 
 (Daniel Barkalow)
 
+...........................
 > A patch to SubmittingPatches, MUA specific help section for
 > users of Pine 4.63 would be very much appreciated.
+...........................
 
 Ah, it looks like a recent version changed the default behavior to do the
 right thing, and inverted the sense of the configuration option. (Either
@@ -409,12 +419,12 @@ it.
 
 
 Thunderbird, KMail, GMail
--------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^
 
 See the MUA-SPECIFIC HINTS section of git-format-patch(1).
 
 Gnus
-----
+^^^^
 
 '|' in the *Summary* buffer can be used to pipe the current
 message to an external program, and this is a handy way to drive
-- 
1.8.0.msysgit.0


---
Thomas

^ permalink raw reply related

* Re: [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical
From: Ramkumar Ramachandra @ 2012-12-30 11:52 UTC (permalink / raw)
  To: Thomas Ackermann; +Cc: git
In-Reply-To: <1023165134.213650.1356863340563.JavaMail.ngmail@webmail06.arcor-online.net>

Thomas Ackermann wrote:
> CodingGuidelines and SubmittingPatches are IMHO a little bit hidden in ./Documentation
> and with respect to their content should be better placed in ./Documentation/technical.

I don't think SubmittingPatches and CodingGuidelines belong to
Documentation/technical; that location is mostly reserved for API
documentation.  Also, being prominent documents, they're probably
linked to by many places on the internet.  I wouldn't want to
unnecessarily break those links.

Ram

^ permalink raw reply

* [PATCH] gitweb: fix error in sanitize when highlight is enabled
From: Orgad Shaneh @ 2012-12-30 11:52 UTC (permalink / raw)
  To: git; +Cc: Orgad Shaneh

$1 becomes undef by internal regex, since it has no capture groups.

Match against accpetable control characters using index() instead of a regex.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
---
 gitweb/gitweb.perl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0f207f2..6d5aeab 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1556,7 +1556,7 @@ sub sanitize {
 	return undef unless defined $str;
 
 	$str = to_utf8($str);
-	$str =~ s|([[:cntrl:]])|($1 =~ /[\t\n\r]/ ? $1 : quot_cec($1))|eg;
+	$str =~ s|([[:cntrl:]])|(index("\t\n\r", $1) != -1 ? $1 : quot_cec($1))|eg;
 	return $str;
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-30 12:05 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229052747.GA14928@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:

> > If reachability bitmap is implemented, we'll have per-pack cache
> > infrastructure ready, so less work there for commit cache.
> 
> True. I don't want to dissuade you from doing any commit cache work. I
> only wanted to point out that this alternative may have merit because of
> its simplicity (so we can use it until a caching solution exists, or
> even after, if managing the cache has downsides).

So I was thinking about this, which led to some coding, which led to
some benchmarking.

I want to clean up a few things in the code before I post it, but the
general idea is to have arbitrary per-pack cache files in the
objects/pack directory. Like this:

  $ cd objects/pack && ls
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.commits
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.idx
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.pack
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.parents
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.timestamps
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.trees

Each file describes the objects in the matching pack. If a new pack is
generated, you'd throw away the old cache files along with the old pack,
and generate new ones. Or not. These are totally optional, and an older
version of git will just ignore them. A newer version will use them if
they're available, and otherwise fallback to the existing code (i.e.,
reading the whole object from the pack). So you can generate them at
repack time, later on, or not at all. For now I have a separate command
that generates them based on the pack index; if this turns out to be a
good idea, it would probably get called as part of "repack".

Each file is a set of fixed-length records. The "commits" file contains
the sha1 of every commit in the pack (sorted). A binary search of the
mmap'd file gives the position of a particular commit within the list,
and that position is used to index the parents, timestamps, and trees
files (obviously if it is missing, then the other files are useless, but
we already have to be able to fallback to just reading the objects
anyway).

I split it out into multiple files because you can actually operate with
a subset (though in my initial attempt, I transparently plug in at the
parse_commit layer, which means we need all items to consider the commit
"parsed", whether the caller actually cares or not. But in theory a
reader could only want to ask for one item).  Making a "generation"
cache file is an obvious next step (and because we already have
"commits", it is only 4 bytes per commit on top of it). Reachability
bitmaps would be another one (though due to the compression, I am not
sure they will work with a fixed-size record design, so this may need
some modification).

Anyway, here are the numbers I came up with (appended to my earlier
compression numbers):

git.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |  56.72        | 0.68        | 0.33        |  2.45        |  1.94       
commit |  64.61 (+13%) | 0.50 (-26%) | 0.09 (-74%) |  2.42  (-1%) |  1.69 (-13%)
  tree |  60.68  (+6%) | 0.79 (+16%) | 0.33   (0%) |  2.23  (-8%) |  1.75  (-9%)
  both |  68.54 (+20%) | 0.48 (-29%) | 0.08 (-75%) |  2.24  (-8%) |  1.48 (-23%)
 cache |  59.29  (+4%) | 0.57 (-16%) | 0.05 (-84%) |  2.23  (-8%) |  1.66 (-14%)

linux.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none | 864.61        | 8.66        | 4.07        | 42.76        | 36.32       
commit | 970.46 (+12%) | 8.87  (+2%) | 1.02 (-74%) | 42.94   (0%) | 33.43  (-7%)
  tree | 895.37  (+3%) | 9.08  (+4%) | 4.07   (0%) | 36.01 (-15%) | 29.62 (-18%)
  both |1001.25 (+15%) | 8.90  (+2%) | 1.03 (-74%) | 35.57 (-16%) | 26.25 (-27%)
 cache | 894.78  (+3%) | 4.88 (-43%) | 0.69 (-83%) | 38.80  (-9%) | 32.79  (-9%)

webkit.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |   3.46        | 1.61        | 1.38        | 20.46        | 18.72       
commit |   3.54  (+2%) | 1.42 (-11%) | 0.34 (-75%) | 20.42   (0%) | 17.57  (-6%)
  tree |   3.59  (+3%) | 1.61   (0%) | 1.39   (0%) | 16.01 (-21%) | 14.00 (-25%)
  both |   3.67  (+6%) | 1.45 (-10%) | 0.34 (-75%) | 15.94 (-22%) | 12.91 (-31%)
 cache |   3.47   (0%) | 0.49 (-69%) | 0.14 (-90%) | 19.53  (-4%) | 17.86  (-4%)


So you can see that it performs even better than no-compression on the
warm-revs case. Which makes sense, since we do not even have to touch
the object data at all, and can do the whole traversal straight out of
the cache. So we do not even have to memcpy the bytes around. And it
takes up even less space (3-4% versus 12-13% on the first two repos).
Which makes sense, because even though we are duplicating some
information that is in the packfile, we are leaving all of the commit
message bodies compressed.

The other interesting thing is that the cold cache performance also
improves by a lot. Again, this makes sense; we are doing the traversal
completely out of cache, and our data is even more tightly packed in the
cache than it is in the packfile.

Of course, it does very little for the full --objects listing, where we
spend most of our time inflating trees. We could couple this with
uncompressed trees (which are not all that much bigger, since the sha1s
do not compress anyway). Or we could have an external tree cache, but
I'm not sure exactly what it would look like (this is basically
reinventing bits of packv4, but doing so in a way that is redundant with
the existing packfile, rather than replacing it). Or since the point of
--objects is usually reachability, it may make more sense to pursue the
bitmap, which should be even faster still.

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox