Git development
 help / color / mirror / Atom feed
* Re: [PATCH] Implement fast hash-collision detection
From: Shawn Pearce @ 2011-11-29 15:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, Bill Zaumen, git, gitster, torvalds
In-Reply-To: <CACsJy8BRhoYM+Lb8afp=3rKzYNOEq0JY8uS9mD1ovz3uyJVWOA@mail.gmail.com>

On Tue, Nov 29, 2011 at 05:17, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
>> However, it's harder for trees. Each entry needs to have the new digest
>> added, but there simply isn't room in the format. So we would have to
>> change the tree format, breaking interoperability with older versions of
>> git. And all of your tree sha1's would change as soon as you wrote them
>> with the new format. That's only slightly better than just swapping
>> sha1 out for a better algorithm.
>
> I think we could hide the new digest after NUL at the end of path
> name. C Git at least does not seem to care whatever after NUL.

No, you can't. The next byte after the NUL at the end of a path name
is the SHA-1 of that entry. After those 20 bytes of SHA-1 data is
consumed, the "mode SP name\0" of the next entry is parsed.

There is not room in the tree format for any additional data. Period.
At best you can modify the mode value to be a new octal code that is
not recognized by current Git implementations. But that doesn't get
you a whole lot of data, and its pretty risky change because its
rather incompatible.

^ permalink raw reply

* Re: [PATCH] grep: enable multi-threading for -p and -W
From: Thomas Rast @ 2011-11-29 14:07 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano
In-Reply-To: <4ED4E2D7.9090804@lsrfire.ath.cx>

René Scharfe wrote:
> > * none of the patches:
> >   git grep --cached INITRAMFS_ROOT_UID
> >     2.13user 0.14system 0:02.10elapsed
> >   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
> >     2.23user 0.18system 0:02.21elapsed 
> 
> Broken is a tad too hard a word

Sorry, I just wanted to mark it as: this is unattainable since we're
now doing extra work.

> > * my patch, but not yours:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     3.01user 0.05system 0:03.07elapsed
> > 
> > * my patch + your patch:
> >   git grep --cached -W INITRAMFS_ROOT_UID
> >     4.45user 0.22system 0:02.67elapsed
> > 
> > So while it ends up being faster overall, it also uses way more CPU
> > and would presumably be *slower* on a single-processor system.
> > Apparently those attribute lookups really hurt.
> 
> Hmm, why are they *that* expensive?
> 
> callgrind tells me that userdiff_find_by_path() contributes only 0.18%
> to the total cost with your first patch.  Timings in my virtual machine
> are very volatile, but it seems that here the difference is in the
> system time while user is basically the same for all combinations of
> patches.

Not sure, perhaps callgrind does not model syscalls as expensive
enough.  I also suspect (from my very cursory look at the attributes
machinery) that loading attributes for all paths *in random order*
(i.e., threaded) causes a lot of discards of the existing attributes
data.  (The order is still random with my new patch, but we only load
them for files that have matches.)

> I wonder what caused the slight speedup for the case without -W.  It's
> probably just noise, though.

Yeah, it's very noisy, but I was too lazy for best-of-50 or something ;-)

[...]
> > +#ifndef NO_PTHREADS
> > +static inline void grep_attr_lock(struct grep_opt *opt)
[...]
> We'd need stubs here for the case of NO_PTHREADS being defined.

Right, thanks.  Sorry for not testing with NO_PTHREADS to begin with.

> Perhaps leave the thread stuff in builtin/grep.c and export a function
> from there that does [the userdiff lookup], with locking and all?

That seems like a layering violation to me.  Isn't builtin/grep.c
supposed to call out to grep.c, but not the other way around?

Maybe it would be less messy if we had a global (across all of git)
flag that says whether threads are on.  Then subsystems that are used
from threaded code, but cannot handle it, can learn to lock themselves
around their work.  But it would be pretty much the opposite of what
git-grep now does.


Thanks for the review.  I'll reroll as a proper patch later.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Nguyen Thai Ngoc Duy @ 2011-11-29 14:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Bill Zaumen, git, gitster, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> That brings me to my second concern: how does this alternative message
> digest have any authority?
>
> For example, your patch teaches the git protocol a new extension to pass
> these digests along with the object sha1s. But how do we know the server
> isn't feeding us a bad digest along with the bad object?
>
> The "usual" security model discussed in git is that of verifying a
> signed tag. Linus signs a tag and pushes it to a server. I fetch the
> tag, and can verify the signature on the tag. I want to know that I have
> the exact same objects that Linus had. But I can't assume the server is
> trustworthy; it may have fed me a bad object with a collided sha1.
>
> But what's in the signed part of the tag object? Only the sha1 of the
> commit the tag points to, but not the new digest. So an attacker can
> replace the commit with one that collides, and it can in turn point to
> arbitrary trees and blobs.
>
> You can fix this by including an extra header in the signed part of the
> tag that says "also, the digest of the commit I point to is X". Then you
> know you have the same commit that Linus had. But the commit points to a
> tree by its sha1. So you have to add a similar header in the commit
> object that says "also, the digest of the tree I point to is X". And
> ditto for all of the parent pointers, if you want to care about signing
> history. And then you have the same problem in the tree: each sub-tree
> and blob is referenced by its sha1.
>
> ..

(Security ignorant speaking)

Can we just hash all objects in a pack from bottom up, (replacing
sha-1 in trees/commits/tags with the new digest in memory before
hashing), then attach the new top digest to tag's content? The sender
is required by the receiver to send new digests for all objects in the
pack together with the pack. The receiver can then go through the same
process to produce the top digest and match it with one saved in tag.

I do agree that we should use stronger digests, not weaker, preferably
a combination of them.
-- 
Duy

^ permalink raw reply

* Re: [PATCH] grep: enable multi-threading for -p and -W
From: René Scharfe @ 2011-11-29 13:49 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Junio C Hamano
In-Reply-To: <201111291054.39477.trast@student.ethz.ch>

Am 29.11.2011 10:54, schrieb Thomas Rast:
> René Scharfe wrote:
>> Move attribute reading, which is not thread-safe, into add_work(), under
>> grep_mutex.  That way we can stop turning off multi-threading if -p or -W
>> is given, as the diff attribute for each file is gotten safely now.
>>
>> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
>> ---
>> Goes on top of your patch.  Needs testing..
> 
> On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5):
> 
> * none of the patches:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.13user 0.14system 0:02.10elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
>     2.23user 0.18system 0:02.21elapsed 

Broken is a tad too hard a word; IIUC -W just lacks support for function
patterns of different file types, which is unintended and surprising of
course.  For the default case it works correctly (and threaded).

> * my patch, but not yours:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.21user 0.21system 0:02.27elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     3.01user 0.05system 0:03.07elapsed
> 
> * my patch + your patch:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.22user 0.17system 0:02.22elapsed
>   git grep --cached -W INITRAMFS_ROOT_UID
>     4.45user 0.22system 0:02.67elapsed
> 
> So while it ends up being faster overall, it also uses way more CPU
> and would presumably be *slower* on a single-processor system.
> Apparently those attribute lookups really hurt.

Hmm, why are they *that* expensive?

callgrind tells me that userdiff_find_by_path() contributes only 0.18%
to the total cost with your first patch.  Timings in my virtual machine
are very volatile, but it seems that here the difference is in the
system time while user is basically the same for all combinations of
patches.

> So I had the following idea: if we load attributes only very lazily
> (that is, when match_funcname() is first called), we can ask for them
> much more rarely.  The revised timings:
> 
> * my patch + the new patch at the end:
>   git grep --cached INITRAMFS_ROOT_UID
>     2.15user 0.16system 0:02.15elapsed 107%CPU
>   git grep --cached -W INITRAMFS_ROOT_UID
>     2.20user 0.18system 0:02.24elapsed

Nice.

I wonder what caused the slight speedup for the case without -W.  It's
probably just noise, though.

> However, locking on a specific lock relies on the fact that the
> attributes are only read from the tree, but never from the object
> store.  Perhaps it would be more futureproof to lock on
> read_sha1_mutex instead.  Either way the lazy attributes lookup seems
> a big win.

You could lock read_sha1_mutex after grep_attr_mutex.  With lazy loading
it should have a low impact most of the time.

> ------ 8< ------ 8< ------
> Subject: [PATCH] grep: fine-grained locking around attributes access
> 
> Lazily load the userdiff attributes in match_funcname().  Use a
> separate mutex around this loading to protect the (not thread-safe)
> attributes machinery.  This lets us re-enable threading with -p and
> -W while reducing the overhead caused by looking up attributes.
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 988ea1d..822b32f 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
>  
>  	pthread_mutex_init(&grep_mutex, NULL);
>  	pthread_mutex_init(&read_sha1_mutex, NULL);
> +	pthread_mutex_init(&grep_attr_mutex, NULL);
>  	pthread_cond_init(&cond_add, NULL);
>  	pthread_cond_init(&cond_write, NULL);
>  	pthread_cond_init(&cond_result, NULL);
> @@ -303,6 +304,7 @@ static int wait_all(void)
>  
>  	pthread_mutex_destroy(&grep_mutex);
>  	pthread_mutex_destroy(&read_sha1_mutex);
> +	pthread_mutex_destroy(&grep_attr_mutex);
>  	pthread_cond_destroy(&cond_add);
>  	pthread_cond_destroy(&cond_write);
>  	pthread_cond_destroy(&cond_result);
> @@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		opt.regflags |= REG_ICASE;
>  
>  #ifndef NO_PTHREADS
> -	if (online_cpus() == 1 || !grep_threads_ok(&opt))
> +	if (online_cpus() == 1)
>  		use_threads = 0;
> +#endif
> +
> +	opt.use_threads = use_threads;
>  
> +#ifndef NO_PTHREADS
>  	if (use_threads) {
>  		if (opt.pre_context || opt.post_context || opt.file_break ||
>  		    opt.funcbody)
> diff --git a/grep.c b/grep.c
> index 7a070e9..e9c3df3 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -2,6 +2,7 @@
>  #include "grep.h"
>  #include "userdiff.h"
>  #include "xdiff-interface.h"
> +#include "thread-utils.h"
>  
>  void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
>  {
> @@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
>  	opt->output(opt, "\n", 1);
>  }
>  
> -static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
> +#ifndef NO_PTHREADS
> +pthread_mutex_t grep_attr_mutex;
> +
> +static inline void grep_attr_lock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_lock(&grep_attr_mutex);
> +}
> +
> +static inline void grep_attr_unlock(struct grep_opt *opt)
> +{
> +	if (opt->use_threads)
> +		pthread_mutex_unlock(&grep_attr_mutex);
> +}
> +#endif

We'd need stubs here for the case of NO_PTHREADS being defined.

> +
> +static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
>  {
>  	xdemitconf_t *xecfg = opt->priv;
> -	if (xecfg && xecfg->find_func) {
> +	if (xecfg && !xecfg->find_func) {
> +		grep_attr_lock(opt);
> +		struct userdiff_driver *drv = userdiff_find_by_path(name);
> +		grep_attr_unlock(opt);
> +		if (drv && drv->funcname.pattern) {
> +			const struct userdiff_funcname *pe = &drv->funcname;
> +			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
> +		} else {
> +			xecfg = opt->priv = NULL;
> +		}
> +	}

Perhaps leave the thread stuff in builtin/grep.c and export a function
from there that does the above, with locking and all?

> +
> +	if (xecfg) {
>  		char buf[1];
>  		return xecfg->find_func(bol, eol - bol, buf, 1,
>  					xecfg->find_func_priv) >= 0;
> @@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
>  		if (lno <= opt->last_shown)
>  			break;
>  
> -		if (match_funcname(opt, bol, eol)) {
> +		if (match_funcname(opt, name, bol, eol)) {
>  			show_line(opt, bol, eol, name, lno, '=');
>  			break;
>  		}
> @@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  	unsigned cur = lno, from = 1, funcname_lno = 0;
>  	int funcname_needed = !!opt->funcname;
>  
> -	if (opt->funcbody && !match_funcname(opt, bol, end))
> +	if (opt->funcbody && !match_funcname(opt, name, bol, end))
>  		funcname_needed = 2;
>  
>  	if (opt->pre_context < lno)
> @@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
>  		while (bol > buf && bol[-1] != '\n')
>  			bol--;
>  		cur--;
> -		if (funcname_needed && match_funcname(opt, bol, eol)) {
> +		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
>  			funcname_lno = cur;
>  			funcname_needed = 0;
>  		}
> @@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt,
>  	return 0;
>  }
>  
> -int grep_threads_ok(const struct grep_opt *opt)
> -{
> -	/* If this condition is true, then we may use the attribute
> -	 * machinery in grep_buffer_1. The attribute code is not
> -	 * thread safe, so we disable the use of threads.
> -	 */
> -	if ((opt->funcname || opt->funcbody)
> -	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
> -		return 0;
> -
> -	return 1;
> -}
> -
>  static void std_output(struct grep_opt *opt, const void *buf, size_t size)
>  {
>  	fwrite(buf, size, 1, stdout);
> @@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  	if ((opt->funcname || opt->funcbody)
>  	    && !opt->unmatch_name_only && !opt->status_only &&
>  	    !opt->name_only && !binary_match_only && !collect_hits) {
> -		struct userdiff_driver *drv = userdiff_find_by_path(name);
> -		if (drv && drv->funcname.pattern) {
> -			const struct userdiff_funcname *pe = &drv->funcname;
> -			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
> -			opt->priv = &xecfg;
> -		}
> +		opt->priv = &xecfg;
>  	}

opt->priv can be set unconditionally here now, since we only access it
from within match_funcname() and that function is not called if any of
the short-circuit options is set.

>  	try_lookahead = should_lookahead(opt);
>  
> @@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
>  				show_function = 1;
>  			goto next_line;
>  		}
> -		if (show_function && match_funcname(opt, bol, eol))
> +		if (show_function && match_funcname(opt, name, bol, eol))
>  			show_function = 0;
>  		if (show_function ||
>  		    (last_hit && lno <= last_hit + opt->post_context)) {
> diff --git a/grep.h b/grep.h
> index a652800..15d227c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -115,6 +115,7 @@ struct grep_opt {
>  	int show_hunk_mark;
>  	int file_break;
>  	int heading;
> +	int use_threads;
>  	void *priv;
>  
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
> @@ -131,4 +132,10 @@ struct grep_opt {
>  extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
>  extern int grep_threads_ok(const struct grep_opt *opt);
>  
> +#ifndef NO_PTHREADS
> +/* Mutex used around access to the attributes machinery if
> + * opt->use_threads.  Must be initialized/destroyed by callers! */
> +extern pthread_mutex_t grep_attr_mutex;
> +#endif
> +
>  #endif
> 
> 

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Nguyen Thai Ngoc Duy @ 2011-11-29 13:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Bill Zaumen, git, gitster, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

On Tue, Nov 29, 2011 at 4:07 PM, Jeff King <peff@peff.net> wrote:
> However, it's harder for trees. Each entry needs to have the new digest
> added, but there simply isn't room in the format. So we would have to
> change the tree format, breaking interoperability with older versions of
> git. And all of your tree sha1's would change as soon as you wrote them
> with the new format. That's only slightly better than just swapping
> sha1 out for a better algorithm.

I think we could hide the new digest after NUL at the end of path
name. C Git at least does not seem to care whatever after NUL.
-- 
Duy

^ permalink raw reply

* Re: Copy branch into master
From: Tay Ray Chuan @ 2011-11-29 12:49 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: Ron Eggler, git
In-Reply-To: <20111128183616.GB29503@foghorn.codeweavers.com>

On Tue, Nov 29, 2011 at 2:36 AM, Andrew Eikum <aeikum@codeweavers.com> wrote:
> On Mon, Nov 28, 2011 at 10:25:33AM -0800, Ron Eggler wrote:
>> Some time ago I created a DVT branch in my project and I have almost been
>> exclusively working in it. Now the time for some test deployment came and I
>> didn't have time to merge it all back into the master thus I gave out the
>> DVT branch version. Now I would like to copy exactly what I have in that
>> branch back into my master to have an exact copy in my master of what got
>> deployed with out any changes.
>> How can I do this?
>
> Couple options, depending on what you want:
>
> Rename DVT to master (similar to 'mv DVT master', including
> losing the contents of 'master'):
> $ git checkout --detach HEAD
> $ git branch -M DVT master
> $ git checkout master

It might not be wise to take the strict definition of rename = move
(copy + delete). You will lose *all* your reflog associated with
master. The old master is gone forever.

Resetting the branch is "safer". It's like pseudo-copying the branch.
The reflog for both branches are still around. You can do this with

 # on master
 $ git reset --hard DVT

or

 # not on master
 $ git branch -f master dvt

-- 
Cheers,
Ray Chuan

^ permalink raw reply

* Re: t/t1304: avoid -d option to setfacl (db82657)
From: Jeff King @ 2011-11-29 12:23 UTC (permalink / raw)
  To: Scott Parrish; +Cc: Brandon Casey, wsp, Git Mailing List
In-Reply-To: <D95BFEA0-BAC1-4CBD-8B56-27DB4BDF147C@gmail.com>

On Sat, Nov 19, 2011 at 03:28:49PM -0500, Scott Parrish wrote:

> > The abandoned POSIX draft does actually specify the FreeBSD behavior.
> > 
> > So I think it's kind of a toss-up.  Which option we choose should
> > probably depend on whether we get more test coverage by using the
> > "d[efault]" notation or by using the -d option.  That depends on
> > whether there are more Solaris users compiling git or whether there
> > are more FreeBSD users.  I don't know the answer to that either.  I
> > tend to think there are very few of either.
> 
> Is there a reason conditional logic can't be used (perhaps keying off
> of `uname -s` or the like) so that we have coverage in all cases?

It's nice if we can auto-detect such things based on behavior, and then
there's no need for maintaining a "uname -s" guess. Perhaps we can
check at the beginning of t1304 which one works (or neither), and set a
flag appropriately? It looks like we already check whether setfacl works
at all, so this would be a natural extension.

I started to do a "how about this" patch, but I realized that in order
to see whether "-d" worked as "default" versus "delete", we'll have to
check the output of "getfacl" to see if it reports the default entry.
And then I realized I have no idea what FreeBSD shows in such a case
(surely not "d:*", right?).

-Peff

^ permalink raw reply

* Re: git-bisect working only from toplevel dir
From: Jeff King @ 2011-11-29 12:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Peter Baumann, Adam Borowski, git
In-Reply-To: <7vd3chage9.fsf@alter.siamese.dyndns.org>

On Thu, Nov 24, 2011 at 03:50:22AM -0800, Junio C Hamano wrote:

> > On Wed, Nov 23, 2011 at 12:45:18PM -0800, Junio C Hamano wrote:
> > ...
> >> Also didn't we make bisect workable in a bare repository recently? So the
> >> start-up sequence has to be something more elaborate like...
> >> ...
> >> and then inside bisect_next() you would check if $prefix exists, and go
> >> there to run bisect--helper (or fail to go there and say "cannot test").
> >
> > But is the "cannot test" aka exit(127) the best we can do in this case?
> 
> Yeah, thinking about it a bit more, it may probably be better to make it a
> failure. The user explicitly asked "be in _this_ directory and run make;
> it should succeed for the bisection test to pass". If the bisection test
> criterion the user was interested in was a successful build of the whole
> project (not the subpart of the current directory), the user would have
> gone up to the top-level and "bisect run make" there.

There are more possibilities than that. For example, imagine a project
with two sibling directories, one a library and one a command that is
built on the library. The library has a bug that we want to bisect, but
the command is the only mechanism we have to test the bug. The command's
Makefile points to the library directory (e.g., using recursive
make[1]).  It would be natural for the user to do something like:

  cd cmd
  make && ./test-cmd
  : hmph, it's broken
  git bisect start
  git bisect bad
  : I think v1.1 was OK
  git checkout v1.1
  make && ./test-cmd
  : Yep, let's run.
  git bisect good
  git bisect run 'make && ./test-cmd'

If, somewhere in the middle, the current directory doesn't exist, then
our test harness does not exist. And we can't say good or bad, but only
"don't know".  Not knowing all of the details of what the user's command
does, that seems to me to be the only safe option.

The worst case is that the bisection takes longer to run and says "I
don't know where the bug is, but it's in this range", and the user has
to go back and run it again with a smarter test. But if we return "no,
the test failed" then we are likely going to just produce nonsensical
results, as our search is hitting on two different errors, and the "bug"
will appear to come and go.

It might be tempting to say that this case can't come up. After all, at
the branch tip the bug is there, and in v1.1 it isn't. What is the
chance that the test harness goes away in the middle? In a linear
history, not hight. But if you have history like this:


       D--*--*--*
      /          \
  *--*--A--B--C---*--E

where:

  - A introduces the "cmd" directory
  - B is v1.1 (known good)
  - C is the location of the actual bug
  - D is on a side branch, but does _not_ have "cmd"
  - E is our current tip (known bad)

then we will have to search down the side branch towards D to look for
the bug.

If this seems contrived, well, it is. In 99% of cases, the directory
_won't_ go away, and none of this will matter.  And of course you can
have this exact same problem even without the directory issue. If your
test command is "make && ./test-harness", and the side branch doesn't
have the harness, then it's going to erroneously report the presence of
the bug. But that's your fault for writing a crappy test command that
wasn't careful about verifying the pre-conditions.

So maybe it doesn't matter; there are a lot of ways to shoot yourself in
the foot with a bisection. I just think git should set a good example
and default to being conservative with its claims.

-Peff

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Fredrik Gustafsson @ 2011-11-29 10:41 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Fredrik Gustafsson, Manuel Koller, git, Jens Lehmann, Heiko Voigt
In-Reply-To: <201111291125.41943.trast@student.ethz.ch>

On Tue, Nov 29, 2011 at 11:25:41AM +0100, Thomas Rast wrote:
> So maybe the right questions to ask would be: what's the *official*
> way of removing a submodule completely?  Do we support overwriting
> submodules in the way Manuel wanted to?  Why not? :-)

I suggest that we add a command for that;
git submodule remove <submodule>

That should be pretty easy and straight forward to do. However I would
like to hear what Jens, Heiko and/or Junio has to say before starting to
implement it.

-- 
Med vänliga hälsningar
Fredrik Gustafsson

E-post: iveqy@iveqy.com
Tel. nr.: 0733 60 82 74

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Jeff King @ 2011-11-29 10:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git
In-Reply-To: <CACBZZX59PiO0GGfPg=Gn0h_yWFnAMxmtDtecv9Yd_Pu8Jz0gzg@mail.gmail.com>

On Tue, Nov 29, 2011 at 11:24:18AM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Nov 29, 2011 at 10:07, Jeff King <peff@peff.net> wrote:
> 
> > However, it may be of interest that the Sun is expected to burn out in a
> > mere 10^10 years[1].
> 
> Off topic, but it's a a lot sooner than that. The total age of the sun
> is is around 10^10 (10 billion) years, but we're already ~4.6 billion
> years along that line.

Yeah, I checked Wikipedia, but rounded up for simplicity. I did use 5
billion for my fun fact at the end of the email, which is close to
accurate.

> However the Sun is currently in a stage of gradual heating until it
> turns into a red giant in ~5 billion years. In around 500 million
> years the earth will be uninhabitable as we know it, and in around 1
> billion years the surface will be hot enough to have boiled all the
> oceans. In other words the earth in a billion years will probably look
> similar to how Venus looks now.

Good point. If we want an accidental collision in the next 500 million
years, we'd better step up the pace of development!

-Peff

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Thomas Rast @ 2011-11-29 10:25 UTC (permalink / raw)
  To: Fredrik Gustafsson; +Cc: Manuel Koller, git, Jens Lehmann, Heiko Voigt
In-Reply-To: <20111129101546.GB2829@kolya>

Fredrik Gustafsson wrote:
> > [removing a submodule and adding another under the same name
> > confuses git]
> 
> This is something I did not thought about when writing that patch. The
> reason that this fails is that the part when the first submodule is
> removed is no longer complete (as intended). Before this patch
>      git config --remove-section submodule.sub &&
>      git rm .gitmodules &&
>      rm -rf sub &&
> Did remove all the data associated with the submodule. That's no longer
> the case.

So maybe the right questions to ask would be: what's the *official*
way of removing a submodule completely?  Do we support overwriting
submodules in the way Manuel wanted to?  Why not? :-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Ævar Arnfjörð Bjarmason @ 2011-11-29 10:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Bill Zaumen, git, gitster, pclouds, spearce, torvalds
In-Reply-To: <20111129090733.GA22046@sigill.intra.peff.net>

On Tue, Nov 29, 2011 at 10:07, Jeff King <peff@peff.net> wrote:

> However, it may be of interest that the Sun is expected to burn out in a
> mere 10^10 years[1].

Off topic, but it's a a lot sooner than that. The total age of the sun
is is around 10^10 (10 billion) years, but we're already ~4.6 billion
years along that line.

However the Sun is currently in a stage of gradual heating until it
turns into a red giant in ~5 billion years. In around 500 million
years the earth will be uninhabitable as we know it, and in around 1
billion years the surface will be hot enough to have boiled all the
oceans. In other words the earth in a billion years will probably look
similar to how Venus looks now.

^ permalink raw reply

* Re: Git Submodule Problem - Bug?
From: Fredrik Gustafsson @ 2011-11-29 10:15 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Manuel Koller, git, Jens Lehmann, Heiko Voigt, Fredrik Gustafsson
In-Reply-To: <201111291024.01230.trast@student.ethz.ch>

On Tue, Nov 29, 2011 at 10:24:01AM +0100, Thomas Rast wrote:
> Manuel Koller wrote:
> > 
> > The problem arises when I pull a commit that switches a submodule with another one and then run git submodule update. Say I have a repo "super" that has one submodule "sub1" in the folder "sub" and a clone "super2". Now I remove the submodule in "super2" and add another one ("sub2") again named "sub", commit this and push it. Now after pulling the commit to "super", I need to run git submodule sync and then git submodule update. 
> > 
> > I expect to end up with the submodule "sub2" in sub. But the log clearly shows that the commits from "sub1" are still there (the master branch belongs to "sub1" while origin/master comes from "sub2").  I get the following output:
> > 
> > > ...
> > > commit 77d8d11fed3b07e1d4e47b3df9fc44c278694a39
> > > Author: Manuel Koller <koller@stat.math.ethz.ch>
> > > Date:   Mon Nov 28 17:46:45 2011 +0100
> > > 
> > >     initial commit sub1
> > > commit 346fe6bd9e7957f10c5e833bb1155153379da41c
> > > Author: Manuel Koller <koller@stat.math.ethz.ch>
> > > Date:   Mon Nov 28 17:46:45 2011 +0100
> > > 
> > >     initial commit sub2
> > 
> > I think it should be twice the same, and "sub2". I checked also with Charon, on his machine, the two log messages reported are "sub1" which completely baffles me.
> 
> I used the test script at the end (just a testification of your
> attachment) to bisect this to the following commit:
> 
>     commit 501770e1bb5d132ae4f79aa96715f07f6b84e1f6
>     Author: Fredrik Gustafsson <iveqy@iveqy.com>
>     Date:   Mon Aug 15 23:17:47 2011 +0200
> 
>         Move git-dir for submodules
>         
>         Move git-dir for submodules into $GIT_DIR/modules/[name_of_submodule] of
>         the superproject. This is a step towards being able to delete submodule
>         directories without loosing the information from their .git directory
>         as that is now stored outside the submodules work tree.
>         
>         This is done relying on the already existent .git-file functionality.
>         When adding or updating a submodule whose git directory is found under
>         $GIT_DIR/modules/[name_of_submodule], don't clone it again but simply
>         point the .git-file to it and remove the now stale index file from it.
>         The index will be recreated by the following checkout.
>         
>         This patch will not affect already cloned submodules at all.
>         
>         Tests that rely on .git being a directory have been fixed.
>         
>         Signed-off-by: Fredrik Gustafsson <iveqy@iveqy.com>
>         Mentored-by: Jens Lehmann <Jens.Lehmann@web.de>
>         Mentored-by: Heiko Voigt <hvoigt@hvoigt.net>
>         Signed-off-by: Junio C Hamano <gitster@pobox.com>
> 
> That is, before 501770 I get sub1/sub2 as Manuel said above.  After
> 501770 I get sub1/sub1 (!).  I have not been able to reproduce the
> sub2/sub2 behavior with any version I tried (1.7.0, 1.7.3.4, 1.7.6,
> etc.).  Perhaps there is a configuration setting that changes this?
> 
> In any case, since it's -rc4 time and 501770 is set to go into the
> release, it would be nice if you could check whether this is indeed
> correct/intended.
> 
> 
> 
> ---- 8< ----
> #!/bin/sh
> 
> test_description='submodule change bug'
> . ./test-lib.sh
> 
> ## set current directory as working directory
> wd=`pwd`
> 
> test_expect_success 'set up submodules' '
> ## create repositories to be used as submodules
> mkdir sub1 sub2 remote &&
> (cd sub1 &&
>     git init &&
>     echo "test sub1" >> file &&
>     git add file &&
>     git commit -m "initial commit sub1"
> ) &&
> git clone --bare sub1 remote/sub1.git &&
> (cd sub2 &&
>     git init &&
>     echo "test sub2" >> file &&
>     git add file &&
>     git commit -m "initial commit sub2"
> ) &&
> git clone --bare sub2 remote/sub2.git
> '
> 
> test_expect_success 'set up super-repo' '
> ## create super repository
> git init --bare remote/super.git &&
> git clone remote/super.git super &&
> (cd super &&
>     git submodule add "$wd"/remote/sub1.git sub &&
>     git commit -m "Added submodule sub1 as sub" &&
>     git push -u origin master
> )'
> 
> test_expect_success 'make super-repo with sub1->sub2' '
> ## clone super repository again
> ## and switch submodule sub1 by sub2
> git clone --recursive remote/super.git super2 &&
> (cd super2 &&
>     ## remote submodule sub
>     git config --remove-section submodule.sub &&
>     git rm .gitmodules &&
>     rm -rf sub &&
>     git rm sub &&
>     git commit -m "Removed submodule sub" &&
>     ## add submodule sub2 as sub
>     git submodule add "$wd"/remote/sub2.git sub &&
>     git commit -m "Added submodule sub2 as sub" &&
>     git push
> )
> '
> 
> test_expect_success 'pull from super2' '
> ## now pull super
> (cd super &&
>     git pull &&
>     ## this will fail
>     if ! git submodule update --init; then
>     ## so sync first und update again
>         git submodule sync &&
>         git submodule update --init
>     fi &&
>     ## now sub is corrupt
>     (cd sub &&
>         git log master >log1 && ## this is from sub1
>         echo "# next line should be: initial commit from sub1" &&
>         grep sub1 log1 &&
>         echo "# next line should be: initial commit from sub2" &&
>         git log origin/master >log2 && ## this is from sub2
>         grep sub2 log2
>     )
> )
> '
> 
> test_done

This is something I did not thought about when writing that patch. The
reason that this fails is that the part when the first submodule is
removed is no longer complete (as intended). Before this patch
     git config --remove-section submodule.sub &&
     git rm .gitmodules &&
     rm -rf sub &&
Did remove all the data associated with the submodule. That's no longer
the case. The submodule repository is stored in .git/modules and is
still there. When a new submodule (at the same path) is added, the
repository already associated with that path will be used.

If to the three lines above from the test a line that removes the
repository also is added, the test will work:
     git config --remove-section submodule.sub &&
     git rm .gitmodules &&
     rm -rf sub &&
     rm -rf .git/modules/sub &&
     git rm sub &&

The point beeing that 'rm -rf sub' is not longer sufficient to remove a
submodule and its repository. As 'rm -rf' is not a git command, I'm not
sure how this should be solved (or if it should at all).

* Note: the .git/modules/sub is only valid if .git is a git-dir, if it's
  a gitfile, it's not...

-- 
Med vänliga hälsningar
Fredrik Gustafsson

E-post: iveqy@iveqy.com
Tel. nr.: 0733 60 82 74

^ permalink raw reply

* Re: [PATCH] grep: enable multi-threading for -p and -W
From: Thomas Rast @ 2011-11-29  9:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Junio C Hamano
In-Reply-To: <4ED0D875.5050900@lsrfire.ath.cx>

René Scharfe wrote:
> Move attribute reading, which is not thread-safe, into add_work(), under
> grep_mutex.  That way we can stop turning off multi-threading if -p or -W
> is given, as the diff attribute for each file is gotten safely now.
> 
> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
> ---
> Goes on top of your patch.  Needs testing..

On a random old linux-2.6 clone at v2.6.37-rc2, I'm seeing (best of 5):

* none of the patches:
  git grep --cached INITRAMFS_ROOT_UID
    2.13user 0.14system 0:02.10elapsed
  git grep --cached -W INITRAMFS_ROOT_UID    # note: broken!
    2.23user 0.18system 0:02.21elapsed 

* my patch, but not yours:
  git grep --cached INITRAMFS_ROOT_UID
    2.21user 0.21system 0:02.27elapsed
  git grep --cached -W INITRAMFS_ROOT_UID
    3.01user 0.05system 0:03.07elapsed

* my patch + your patch:
  git grep --cached INITRAMFS_ROOT_UID
    2.22user 0.17system 0:02.22elapsed
  git grep --cached -W INITRAMFS_ROOT_UID
    4.45user 0.22system 0:02.67elapsed

So while it ends up being faster overall, it also uses way more CPU
and would presumably be *slower* on a single-processor system.
Apparently those attribute lookups really hurt.

So I had the following idea: if we load attributes only very lazily
(that is, when match_funcname() is first called), we can ask for them
much more rarely.  The revised timings:

* my patch + the new patch at the end:
  git grep --cached INITRAMFS_ROOT_UID
    2.15user 0.16system 0:02.15elapsed 107%CPU
  git grep --cached -W INITRAMFS_ROOT_UID
    2.20user 0.18system 0:02.24elapsed

However, locking on a specific lock relies on the fact that the
attributes are only read from the tree, but never from the object
store.  Perhaps it would be more futureproof to lock on
read_sha1_mutex instead.  Either way the lazy attributes lookup seems
a big win.

------ 8< ------ 8< ------
Subject: [PATCH] grep: fine-grained locking around attributes access

Lazily load the userdiff attributes in match_funcname().  Use a
separate mutex around this loading to protect the (not thread-safe)
attributes machinery.  This lets us re-enable threading with -p and
-W while reducing the overhead caused by looking up attributes.

diff --git a/builtin/grep.c b/builtin/grep.c
index 988ea1d..822b32f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -256,6 +256,7 @@ static void start_threads(struct grep_opt *opt)
 
 	pthread_mutex_init(&grep_mutex, NULL);
 	pthread_mutex_init(&read_sha1_mutex, NULL);
+	pthread_mutex_init(&grep_attr_mutex, NULL);
 	pthread_cond_init(&cond_add, NULL);
 	pthread_cond_init(&cond_write, NULL);
 	pthread_cond_init(&cond_result, NULL);
@@ -303,6 +304,7 @@ static int wait_all(void)
 
 	pthread_mutex_destroy(&grep_mutex);
 	pthread_mutex_destroy(&read_sha1_mutex);
+	pthread_mutex_destroy(&grep_attr_mutex);
 	pthread_cond_destroy(&cond_add);
 	pthread_cond_destroy(&cond_write);
 	pthread_cond_destroy(&cond_result);
@@ -1002,9 +1004,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		opt.regflags |= REG_ICASE;
 
 #ifndef NO_PTHREADS
-	if (online_cpus() == 1 || !grep_threads_ok(&opt))
+	if (online_cpus() == 1)
 		use_threads = 0;
+#endif
+
+	opt.use_threads = use_threads;
 
+#ifndef NO_PTHREADS
 	if (use_threads) {
 		if (opt.pre_context || opt.post_context || opt.file_break ||
 		    opt.funcbody)
diff --git a/grep.c b/grep.c
index 7a070e9..e9c3df3 100644
--- a/grep.c
+++ b/grep.c
@@ -2,6 +2,7 @@
 #include "grep.h"
 #include "userdiff.h"
 #include "xdiff-interface.h"
+#include "thread-utils.h"
 
 void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
 {
@@ -806,10 +807,38 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol,
 	opt->output(opt, "\n", 1);
 }
 
-static int match_funcname(struct grep_opt *opt, char *bol, char *eol)
+#ifndef NO_PTHREADS
+pthread_mutex_t grep_attr_mutex;
+
+static inline void grep_attr_lock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_lock(&grep_attr_mutex);
+}
+
+static inline void grep_attr_unlock(struct grep_opt *opt)
+{
+	if (opt->use_threads)
+		pthread_mutex_unlock(&grep_attr_mutex);
+}
+#endif
+
+static int match_funcname(struct grep_opt *opt, const char *name, char *bol, char *eol)
 {
 	xdemitconf_t *xecfg = opt->priv;
-	if (xecfg && xecfg->find_func) {
+	if (xecfg && !xecfg->find_func) {
+		grep_attr_lock(opt);
+		struct userdiff_driver *drv = userdiff_find_by_path(name);
+		grep_attr_unlock(opt);
+		if (drv && drv->funcname.pattern) {
+			const struct userdiff_funcname *pe = &drv->funcname;
+			xdiff_set_find_func(xecfg, pe->pattern, pe->cflags);
+		} else {
+			xecfg = opt->priv = NULL;
+		}
+	}
+
+	if (xecfg) {
 		char buf[1];
 		return xecfg->find_func(bol, eol - bol, buf, 1,
 					xecfg->find_func_priv) >= 0;
@@ -835,7 +864,7 @@ static void show_funcname_line(struct grep_opt *opt, const char *name,
 		if (lno <= opt->last_shown)
 			break;
 
-		if (match_funcname(opt, bol, eol)) {
+		if (match_funcname(opt, name, bol, eol)) {
 			show_line(opt, bol, eol, name, lno, '=');
 			break;
 		}
@@ -848,7 +877,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 	unsigned cur = lno, from = 1, funcname_lno = 0;
 	int funcname_needed = !!opt->funcname;
 
-	if (opt->funcbody && !match_funcname(opt, bol, end))
+	if (opt->funcbody && !match_funcname(opt, name, bol, end))
 		funcname_needed = 2;
 
 	if (opt->pre_context < lno)
@@ -864,7 +893,7 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 		while (bol > buf && bol[-1] != '\n')
 			bol--;
 		cur--;
-		if (funcname_needed && match_funcname(opt, bol, eol)) {
+		if (funcname_needed && match_funcname(opt, name, bol, eol)) {
 			funcname_lno = cur;
 			funcname_needed = 0;
 		}
@@ -942,19 +971,6 @@ static int look_ahead(struct grep_opt *opt,
 	return 0;
 }
 
-int grep_threads_ok(const struct grep_opt *opt)
-{
-	/* If this condition is true, then we may use the attribute
-	 * machinery in grep_buffer_1. The attribute code is not
-	 * thread safe, so we disable the use of threads.
-	 */
-	if ((opt->funcname || opt->funcbody)
-	    && !opt->unmatch_name_only && !opt->status_only && !opt->name_only)
-		return 0;
-
-	return 1;
-}
-
 static void std_output(struct grep_opt *opt, const void *buf, size_t size)
 {
 	fwrite(buf, size, 1, stdout);
@@ -1011,12 +1027,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	if ((opt->funcname || opt->funcbody)
 	    && !opt->unmatch_name_only && !opt->status_only &&
 	    !opt->name_only && !binary_match_only && !collect_hits) {
-		struct userdiff_driver *drv = userdiff_find_by_path(name);
-		if (drv && drv->funcname.pattern) {
-			const struct userdiff_funcname *pe = &drv->funcname;
-			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
-			opt->priv = &xecfg;
-		}
+		opt->priv = &xecfg;
 	}
 	try_lookahead = should_lookahead(opt);
 
@@ -1093,7 +1104,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 				show_function = 1;
 			goto next_line;
 		}
-		if (show_function && match_funcname(opt, bol, eol))
+		if (show_function && match_funcname(opt, name, bol, eol))
 			show_function = 0;
 		if (show_function ||
 		    (last_hit && lno <= last_hit + opt->post_context)) {
diff --git a/grep.h b/grep.h
index a652800..15d227c 100644
--- a/grep.h
+++ b/grep.h
@@ -115,6 +115,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	int use_threads;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -131,4 +132,10 @@ struct grep_opt {
 extern struct grep_opt *grep_opt_dup(const struct grep_opt *opt);
 extern int grep_threads_ok(const struct grep_opt *opt);
 
+#ifndef NO_PTHREADS
+/* Mutex used around access to the attributes machinery if
+ * opt->use_threads.  Must be initialized/destroyed by callers! */
+extern pthread_mutex_t grep_attr_mutex;
+#endif
+
 #endif


-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply related

* Re: Git Submodule Problem - Bug?
From: Thomas Rast @ 2011-11-29  9:24 UTC (permalink / raw)
  To: Manuel Koller; +Cc: git, Jens Lehmann, Heiko Voigt, Fredrik Gustafsson
In-Reply-To: <38AE3033-6902-48AA-819B-DB4083F1F8EF@gmail.com>

Manuel Koller wrote:
> 
> The problem arises when I pull a commit that switches a submodule with another one and then run git submodule update. Say I have a repo "super" that has one submodule "sub1" in the folder "sub" and a clone "super2". Now I remove the submodule in "super2" and add another one ("sub2") again named "sub", commit this and push it. Now after pulling the commit to "super", I need to run git submodule sync and then git submodule update. 
> 
> I expect to end up with the submodule "sub2" in sub. But the log clearly shows that the commits from "sub1" are still there (the master branch belongs to "sub1" while origin/master comes from "sub2").  I get the following output:
> 
> > ...
> > commit 77d8d11fed3b07e1d4e47b3df9fc44c278694a39
> > Author: Manuel Koller <koller@stat.math.ethz.ch>
> > Date:   Mon Nov 28 17:46:45 2011 +0100
> > 
> >     initial commit sub1
> > commit 346fe6bd9e7957f10c5e833bb1155153379da41c
> > Author: Manuel Koller <koller@stat.math.ethz.ch>
> > Date:   Mon Nov 28 17:46:45 2011 +0100
> > 
> >     initial commit sub2
> 
> I think it should be twice the same, and "sub2". I checked also with Charon, on his machine, the two log messages reported are "sub1" which completely baffles me.

I used the test script at the end (just a testification of your
attachment) to bisect this to the following commit:

    commit 501770e1bb5d132ae4f79aa96715f07f6b84e1f6
    Author: Fredrik Gustafsson <iveqy@iveqy.com>
    Date:   Mon Aug 15 23:17:47 2011 +0200

        Move git-dir for submodules
        
        Move git-dir for submodules into $GIT_DIR/modules/[name_of_submodule] of
        the superproject. This is a step towards being able to delete submodule
        directories without loosing the information from their .git directory
        as that is now stored outside the submodules work tree.
        
        This is done relying on the already existent .git-file functionality.
        When adding or updating a submodule whose git directory is found under
        $GIT_DIR/modules/[name_of_submodule], don't clone it again but simply
        point the .git-file to it and remove the now stale index file from it.
        The index will be recreated by the following checkout.
        
        This patch will not affect already cloned submodules at all.
        
        Tests that rely on .git being a directory have been fixed.
        
        Signed-off-by: Fredrik Gustafsson <iveqy@iveqy.com>
        Mentored-by: Jens Lehmann <Jens.Lehmann@web.de>
        Mentored-by: Heiko Voigt <hvoigt@hvoigt.net>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

That is, before 501770 I get sub1/sub2 as Manuel said above.  After
501770 I get sub1/sub1 (!).  I have not been able to reproduce the
sub2/sub2 behavior with any version I tried (1.7.0, 1.7.3.4, 1.7.6,
etc.).  Perhaps there is a configuration setting that changes this?

In any case, since it's -rc4 time and 501770 is set to go into the
release, it would be nice if you could check whether this is indeed
correct/intended.



---- 8< ----
#!/bin/sh

test_description='submodule change bug'
. ./test-lib.sh

## set current directory as working directory
wd=`pwd`

test_expect_success 'set up submodules' '
## create repositories to be used as submodules
mkdir sub1 sub2 remote &&
(cd sub1 &&
    git init &&
    echo "test sub1" >> file &&
    git add file &&
    git commit -m "initial commit sub1"
) &&
git clone --bare sub1 remote/sub1.git &&
(cd sub2 &&
    git init &&
    echo "test sub2" >> file &&
    git add file &&
    git commit -m "initial commit sub2"
) &&
git clone --bare sub2 remote/sub2.git
'

test_expect_success 'set up super-repo' '
## create super repository
git init --bare remote/super.git &&
git clone remote/super.git super &&
(cd super &&
    git submodule add "$wd"/remote/sub1.git sub &&
    git commit -m "Added submodule sub1 as sub" &&
    git push -u origin master
)'

test_expect_success 'make super-repo with sub1->sub2' '
## clone super repository again
## and switch submodule sub1 by sub2
git clone --recursive remote/super.git super2 &&
(cd super2 &&
    ## remote submodule sub
    git config --remove-section submodule.sub &&
    git rm .gitmodules &&
    rm -rf sub &&
    git rm sub &&
    git commit -m "Removed submodule sub" &&
    ## add submodule sub2 as sub
    git submodule add "$wd"/remote/sub2.git sub &&
    git commit -m "Added submodule sub2 as sub" &&
    git push
)
'

test_expect_success 'pull from super2' '
## now pull super
(cd super &&
    git pull &&
    ## this will fail
    if ! git submodule update --init; then
    ## so sync first und update again
        git submodule sync &&
        git submodule update --init
    fi &&
    ## now sub is corrupt
    (cd sub &&
        git log master >log1 && ## this is from sub1
        echo "# next line should be: initial commit from sub1" &&
        grep sub1 log1 &&
        echo "# next line should be: initial commit from sub2" &&
        git log origin/master >log2 && ## this is from sub2
        grep sub2 log2
    )
)
'

test_done

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Jeff King @ 2011-11-29  9:07 UTC (permalink / raw)
  To: Bill Zaumen; +Cc: git, gitster, pclouds, spearce, torvalds
In-Reply-To: <1322546563.1719.22.camel@yos>

[Your original message was almost certainly bounced by git@vger because
 it surpassed the 100K limit; I'll try to quote liberally for those who
 missed the original. You may want to split your content and repost.]

On Mon, Nov 28, 2011 at 10:02:43PM -0800, Bill Zaumen wrote:

> Maintains a database of CRCs of Git objects to allow SHA-1 hash
> collisions to be detected with high probability (1 - 1/2^32) and with
> little computational overhead.  The CRCs cover the content of Git
> objects, but not the header.  For loose objects, these are stored in
> subdirectories of GIT_DIRECTORY/objects/crcs, with each subdirectory's
> name consisting of the first two hexadecimal digits of the
> corresponding object's SHA-1 hash.  For each pack file, FILE.pack, the
> CRCs are stored in a FILE.mds, in the same order as the SHA-1 hashes
> that appear in FILE.idx.  Checks for hash collisions are made whenever
> a new loose object is created.

I'm confused. Is this about avoiding accidental collisions, or about
avoiding malicious collision attacks?

Let's assume the former for a minute. The usual way of considering
collision likelihood is not by probabilities, but by the number of items
that must be selected to achieve a >50% probability that there is a
collision. Which is the square root of the number of total items, or
half the bit-space.

So we expect to see a collision in 160-bit SHA-1 after writing about
2^80 objects. The linux-2.6 repository has about 2.2 million objects
after 6.5 years of development. If development continues at this pace,
we would expect a collision in a mere 10^18 years.

Assuming your 32-bit CRC is statistically independent of the SHA-1
value, it adds 32 bits to the hash, or 16 bits to the number of objects
we expect to generate (i.e., 2^96). That bumps us up to 10^23 years.

However, it may be of interest that the Sun is expected to burn out in a
mere 10^10 years[1].

So I'm not sure there is really any point in adding a few bits to the
hash length to detect an accidental collision. It's already
fantastically unlikely. Adding another probability on top does make it
less likely, but in the same ballpark of fantastic.

You can argue about whether linux-2.6 is a representative sample, or
whether the pace of object creation might increase. But the simple
answer is that we're many orders of magnitude away from having to care.

However, in your patch you write:

> +Security-Issue Details
> +----------------------
> +
> +Without hash-collision detection, Git has a higher risk of data
> +corruption due to the obvious hash-collision vulnerability, so the
> +issue is really whether a usable vulnerability exists. Recent research
> +has shown that SHA-1 collisions can be found in 2^63 operations or
> +less.  While one result claimed 2^53 operations, the paper claiming
> +that value was withdrawn from publication due to an error in the
> +estimate. Another result claimed a complexity of between 2^51 and 2^57
> +operations, and still another claimed a complexity of 2^57.5 SHA-1
> +computations. A summary is available at
> +<http://hackipedia.org/Checksums/SHA/html/SHA-1.htm#SHA-1>. Given the
> +number of recent attacks, possibly by governments or large-scale
> +criminal enterprises
> +(<http://www.csmonitor.com/World/terrorism-security/2011/0906/Iranian-government-may-be-behind-hack-of-Dutch-security-firm>,
> +<http://en.wikipedia.org/wiki/Operation_Aurora>,
> +<http://en.wikipedia.org/wiki/Botnet#Historical_list_of_botnets>),
> +which include botnets with an estimated 30 million computers, there is
> +reason for some concern: while generating a SHA-1 collision for
> +purposes of damaging a Git repository is extremely expensive
> +computationally, it is possibly within reach of very well funded
> +organizations. 2^32 operations, even if the operations are as
> +expensive as computing a SHA-1 hash of a modest source-code file, can
> +be performed in a reasonably short period of time on the type of
> +hardware widely used in desktop or laptop computers at present. With
> +sufficient parallelism, 30 million personal computers sufficient for
> +playing the latest video games could perform 2^56 operations in a
> +reasonable time.

...which makes me think that you do care about malicious collisions. All
of what you wrote above seems fairly accurate. Let's leave aside that
those numbers are for a collision attack as opposed to a pre-image
attack (collision attacks are hard to execute, as they require you to
generate two "matched" objects, one good and one evil, and then convince
somebody to first accept your good object, only to later replace it with
the evil one).

I have two concerns with this as a security mechanism:

First, let us assume that the implementation details of your scheme
work, and that git users will effectively be checking not just a SHA-1,
but now a SHA-1 plus your additional digest.

In that case, why use crc32 as the digest? It seems like a terrible
choice. Assuming it's cryptographically secure, then it's adding a mere
16 bits to the numbers you mentioned above. IOW, it's at best a band-aid
that pushes attacks off for a few years. But more importantly, it's
_not_ secure, and can be trivially forged.

But that's a relatively simple problem. crc32 could be replaced in your
scheme with any of the SHA-2 family, or the upcoming SHA-3, or whatever.

That brings me to my second concern: how does this alternative message
digest have any authority?

For example, your patch teaches the git protocol a new extension to pass
these digests along with the object sha1s. But how do we know the server
isn't feeding us a bad digest along with the bad object?

The "usual" security model discussed in git is that of verifying a
signed tag. Linus signs a tag and pushes it to a server. I fetch the
tag, and can verify the signature on the tag. I want to know that I have
the exact same objects that Linus had. But I can't assume the server is
trustworthy; it may have fed me a bad object with a collided sha1.

But what's in the signed part of the tag object? Only the sha1 of the
commit the tag points to, but not the new digest. So an attacker can
replace the commit with one that collides, and it can in turn point to
arbitrary trees and blobs.

You can fix this by including an extra header in the signed part of the
tag that says "also, the digest of the commit I point to is X". Then you
know you have the same commit that Linus had. But the commit points to a
tree by its sha1. So you have to add a similar header in the commit
object that says "also, the digest of the tree I point to is X". And
ditto for all of the parent pointers, if you want to care about signing
history. And then you have the same problem in the tree: each sub-tree
and blob is referenced by its sha1.

In other words, authority flows from the gpg-signed tag portion, and
links in the chain to each object are made by referencing sha1s. Every
time such a link is made, you need to also include the digest of the
object, or you are giving the attacker a place to insert a collided
object.

For tag and commit objects, this actually isn't that hard; they have a
relatively small number of links, and they have room for arbitrary
headers. I.e., add a "tree-md-sha256" header that gives the expected
sha-256 of the tree object referenced by sha-1 in the "tree" header.
Older versions of git will skip over this header (though obviously they
won't be smart enough to do the more-secure verification).

However, it's harder for trees. Each entry needs to have the new digest
added, but there simply isn't room in the format. So we would have to
change the tree format, breaking interoperability with older versions of
git. And all of your tree sha1's would change as soon as you wrote them
with the new format. That's only slightly better than just swapping
sha1 out for a better algorithm.

One trick you could do is to include the digest of each blob in the
commit object itself. This really bloats the size of commit objects,
though. You can store a digest of their digests instead (which your
patch actually calculates, but AFAICT does not actually insert into the
commit object). That is small and relatively cheap (it turns commit from
an O(1) operation into an O(number of files) operation, but the per-file
constant is pretty inexpensive). But it comes at the expense of not being
able to tell which of the constituent blobs was actually attacked.

So I think all of that would work for verification starting at a signed
tag that points to a commit or a blob. For a tag pointing straight to a
tree, it could include the same "digest of digests" that the commit
would.

But it can never handle direct sha1 references outside of git. For
example, a bug tracker or CI system that references a commit in git will
do so by sha1. But that's an insecure link in the chain; you really need
it to refer to both the sha1 _and_ the digest, and then you can verify
that the object you have under that sha1 matches the digest. So you'd
have to teach a whole generation of tools that they can't trust git sha1
ids, and that you need an "extended" id that includes the digest.

At that point, I really wonder if a flag day to switch to a new
repository format is all that bad. You'd have to either rewrite existing
history (which means re-signing tags if you care about them!), or just
decide to leave the new and old histories disconnected.  Whereas a
scheme based around an added-on digest could keep the old history
connected. But at the same time, the old history will have zero value
from a security perspective. If sha1 is broken, then those old
signatures are worthless, anyway. And just switching to a new algorithm
means the implementation remains very simple.

-Peff

[1] Fun fact of the day: if linux development continues at the same
    rate until the Sun burns out, there is a 1/2^60 chance of a
    collision!

^ permalink raw reply

* Re: [PATCH 10/13] credentials: add "cache" helper
From: Jeff King @ 2011-11-29  5:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr50ru5d3.fsf@alter.siamese.dyndns.org>

On Mon, Nov 28, 2011 at 04:42:00PM -0800, Junio C Hamano wrote:

> > +		askpass: Password for '\''https://user1@example.com'\'':
> > +	'
> 
> Missing EOF for here document; I fixed this already in 'pu', but please
> squash it in when rerolling after 1.7.8 final.

Thanks, will do.

-Peff

^ permalink raw reply

* Re: [PATCH 03/13] introduce credentials API
From: Jeff King @ 2011-11-29  5:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4nxnvs1w.fsf@alter.siamese.dyndns.org>

On Mon, Nov 28, 2011 at 01:46:35PM -0800, Junio C Hamano wrote:

> > diff --git a/Documentation/technical/api-credentials.txt b/Documentation/technical/api-credentials.txt
> > new file mode 100644
> > index 0000000..3061077
> > --- /dev/null
> > +++ b/Documentation/technical/api-credentials.txt
> > @@ -0,0 +1,148 @@
> > + ...
> > +`credential_fill`::
> > +
> > +	Attempt to fill the username and password fields of the passed
> > +	credential struct, first consulting storage helpers, then asking
> > +	the user. Guarantees that the username and password fields will
> > +	be filled afterwards (or die() will be called).
> > +
> > +`credential_reject`::
> > +
> > +	Inform the credential subsystem that the provided credentials
> > +	have been rejected. This will notify any storage helpers of the
> > +	rejection (which allows them to, for example, purge the invalid
> > +	credentials from storage), and then clear the username and
> > +	password fields in `struct credential`. It can then be
> > +	`credential_fill`-ed again.
> > +
> > +`credential_approve`::
> > +
> > +	Inform the credential subsystem that the provided credentials
> > +	were successfully used for authentication. This will notify any
> > +	storage helpers of the approval, so that they can store the
> > +	result to be used again.
> 
> It's a bit hard to read and understand which part of the system calls
> these and which other part of the system is responsible for implementing
> them, and how "helper" fits into the picture (perhaps calling some of
> these interfaces will result in "helper" getting called?).

How about following it with a rough example of how they would be used,
like:

  /*
   * Create a credential with some context; we don't yet know the
   * username or password.
   */
  struct credential c = CREDENTIAL_INIT;
  c.protocol = xstrdup("https");
  c.host = xstrdup("example.com");
  c.path = xstrdup("foo.git");

  /*
   * Fill in the username and password fields by contacting helpers
   * and/or asking the user. The function will die if it fails.
   */
  credential_fill(&c);

  /*
   * And then finally make our request, reporting back to the credential
   * system on whether it succeeded or failed.
   */
  if (make_http_request(c.host, c.path, c.username, c.password) < 0)
          credential_reject(&c);
  else
          credential_accept(&c);

Does that make it more clear?

> > +Credential Storage Helpers
> > +--------------------------
> > +
> > +Credential storage helpers are programs executed by git to fetch or save
> > +credentials from and to long-term storage (where "long-term" is simply
> > +longer than a single git process; e.g., credentials may be stored
> > +in-memory for a few minutes, or indefinitely on disk).
> > +
> > +Helper scripts should generally be found in the PATH, and have names of
> > +the form "git-credential-$HELPER".
> 
> Is this normal PATH or can a helper be moved away into $GIT_EXEC_PATH?

They are executed as normal git commands, so $GIT_EXEC_PATH. Actually,
they could even be aliases or builtins, for that matter. I'll update the
documentation to reflect that.

> I briefly wondered if they want to be git-credential--$HELPER; I do not
> deeply care either way, though.

Yeah, I thought about that. In some ways they are implementation details
that you don't care about, and that makes the "--" separator make sense.
But some of them may want to expose other actions to the user. For
example, git-credential-cache has an "exit" action that you can call to
ask the cache daemon to exit prematurely, forgetting all credentials.
Git will never use it; it's purely an end-user interface.

I think it might also make sense to ask helpers to provide a "clear"
operation which will delete any stored items. Git wouldn't use it, but
it would be helpful for testing.

> > When the helper string "$HELPER" is
> > +passed to credential functions, they will run "git-credential-$HELPER"
> > +via the shell. If the first word of $HELPER contains non-alphanumeric
> > +characters, then $HELPER is executed as a shell command. This makes it
> > +possible to specify individual scripts by their full path (e.g.,
> > +`/path/to/helper`) or even shell snippets (`f() { do_whatever; }; f`).
> 
> The definition of "the first word" above is not specified but it seems to
> be "space separated". In other words, 'f() { do_whatever; }; f' would be
> OK but 'f () { do_whatever; }; f' would not be. Am I reading and guessing
> your intention correctly?

Yes, that is my intent, though I am not altogether happy with it, as
there are too many confusing corner cases (e.g., your "f ()" example
above). My thinking was guided by these requirements:

  1. Users who are using one of the included helpers or a drop-in
     third-party helper should be able to just do:

        git config credential.helper cache

     which is extremely readable and intuitive. Thus, I want to prepend
     "git credential-" at least for simple cases.

  2. You should be able to add arguments to a helper, if it takes them.
     For example, with the current code you can do:

       git config credential.helper 'cache --timeout=300'

     or

       git config credential.helper 'store --store=/some/path'

     and that's all interpreted by the shell, and the helpers receive
     your configuration (the "--store" name is a holdover from version 1
     of the series, which had many more options. Probably "--path" or
     "--file" would be better name).

     So that means that the rule for when to prepend "git credential-"
     is not simply "when the helper is a single token". It can be
     arbitrary arguments, and should be interpreted by the shell (to
     handle things like quoting).

  3. You should be able to use a full path, like:

       git config credential.helper '/path/to/git-credential-foo'

     This works with the isalnum thing. But you could also use
     is_absolute_path.

  4. You should be able to insert full shell snippets if you really want
     to. This is handy for testing, and I could even see somebody doing
     something like:

       git config credential.helper '
       f() {
         case "`uname -s`" in
         Linux) git credential-gnome "$@" ;;
         Darwin) git credential-osx "$@" ;;
         esac
       }
       f'

     if they have a multi-platform config file (or they could check
     $DISPLAY, or `hostname`, or any of a zillion other things).  You
     could put it in a shell script, of course, but sometimes it is
     nice to carry little snippets like that in the config file.

     You could do something like "prepend ! to execute via the shell",
     like we do for aliases. But that is perhaps a little confusing, in
     that we are _always_ executing via the shell. It is just that
     sometimes we prepend some magic to the command name. I dunno.

So one possible rule would be:

  1. If it starts with "!", clip off the "!" and hand it to the shell.

  2. Otherwise, if is_absolute_path(), hand it to the shell directly.

  3. Otherwise, prepend "git credential-" and hand it to the shell.

I think that is slightly less confusing than the "first word is alnum"
thing.

> Funnily enough, 'f<TAB>() { do_whatever; }; f' would qualify as the first
> word having a non alphanumeric.

Yeah, arguably a bug in my code. ;)

> > +The details of the credential will be provided on the helper's stdin
> > +stream. The credential is split into a set of named attributes.
> > +Attributes are provided to the helper, one per line. Each attribute is
> > +specified by a key-value pair, separated by an `=` (equals) sign,
> > +followed by a newline. The key may contain any bytes except `=` or
> > +newline. The value may contain any bytes except a newline.  In both
> > +cases, all bytes are treated as-is (i.e., there is no quoting, and one
> > +cannot transmit a value with newline in it).
> 
> Can k or v contain a NUL? The literal reading of the above implies they
> could, but I do not think you meant to.

No, they cannot. I'll make that more explicit in the next version.

How do you feel about the "values cannot contain a newline" requirement?
It feels wrong to me to design a protocol with no escape hatch for
quoting. But I really didn't want helper implementations to have to deal
with unquoting or other parsing headaches. And terminating with NUL
instead of newline makes things annoying in some languages (e.g.,
shell). So I considered it "good enough", but I'd be curious to hear
other opinions.

> > +int credential_read(struct credential *c, FILE *fp)
> > +{
> > ...
> > +			c->host = xstrdup(value);
> > +		}
> > +		else if (!strcmp(key, "path")) {
> > ...
> > +		/* ignore other lines; we don't know what they mean, but
> > +		 * this future-proofs us when later versions of git do
> > +		 * learn new lines, and the helpers are updated to match */
> 
> Two style nits.

I'm supposed to guess? ;P

I know you like blank lines at the top and bottom of multi-line
comments; I often forget that, as they are not part of my usual style (I
find them no easier to read).

I'm guessing the other is an uncuddled else? I can never remember which
we prefer, as we are horribly inconsistent in the current code.

-Peff

^ permalink raw reply

* [PATCH] gitweb: Don't append ';js=(0|1)' to external links
From: Jürgen Kreileder @ 2011-11-29  0:53 UTC (permalink / raw)
  To: git

Signed-off-by: Juergen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl                       |    2 +-
 gitweb/static/js/javascript-detection.js |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4f0c3bd..dfe3407 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3974,7 +3974,7 @@ sub git_footer_html {
 		print qq!<script type="text/javascript">\n!.
 		      qq!window.onload = function () {\n!;
 		if (gitweb_check_feature('javascript-actions')) {
-			print qq!	fixLinks();\n!;
+			print qq!	fixLinks('$my_url');\n!;
 		}
 		if ($jstimezone && $tz_cookie && $datetime_class) {
 			print qq!	var tz_cookie = { name: '$tz_cookie', expires: 14, path:
'/' };\n!. # in days
diff --git a/gitweb/static/js/javascript-detection.js
b/gitweb/static/js/javascript-detection.js
index fa2596f..36964ad 100644
--- a/gitweb/static/js/javascript-detection.js
+++ b/gitweb/static/js/javascript-detection.js
@@ -29,11 +29,11 @@ var jsExceptionsRe = /[;?]js=[01](#.*)?$/;
  *
  * @globals jsExceptionsRe
  */
-function fixLinks() {
+function fixLinks(baseurl) {
 	var allLinks = document.getElementsByTagName("a") || document.links;
 	for (var i = 0, len = allLinks.length; i < len; i++) {
 		var link = allLinks[i];
-		if (!jsExceptionsRe.test(link)) {
+		if (!jsExceptionsRe.test(link) && !link.href.indexOf(baseurl)) {
 			link.href = link.href.replace(/(#|$)/,
 				(link.href.indexOf('?') === -1 ? '?' : ';') + 'js=1$1');
 		}
-- 
1.7.5.4

^ permalink raw reply related

* Re: gitweb: in-page errors don't work with mod_perl
From: Jakub Narebski @ 2011-11-29  0:49 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0Uuw35Kdno=OxqP5SYtaVjqUFZCLL9fSSscN7sq=KmycyxA@mail.gmail.com>

Jürgen Kreileder wrote:
> On Mon, Nov 28, 2011 at 23:32, Jakub Narebski <jnareb@gmail.com> wrote:
>> Jürgen Kreileder wrote:
>>> On Mon, Nov 28, 2011 at 21:13, Jürgen Kreileder <jk@blackdown.de> wrote:
>>>> On Mon, Nov 28, 2011 at 17:54, Jakub Narebski <jnareb@gmail.com> wrote:
>>>>  [...]
>>>>>
>>>>> The configuration is very similar.  Perhaps that is the difference between
>>>>> Apache 2.0.x (mine) and Apache 2.2.x (yours).
>>>>>
>>>>> Does adding `$r->err_headers_out();` before `$r->status(200);` helps?
>>>>> I'm grasping at straws here.  mod_perl documentation is not very helpful.
>>>>
>>>> Doesn't help unfortunately.  It's hard to find any information about
>>>> this on the net (except for your comment on stackoverflow :).
>>>>
>>>> The only way to get mod_perl to return a custom error message with
>>>> correct status code I've found so far is $r->custom_response($status,
>>>> $msg).  Unfortunately mod_perl then ignores headers I set, e.g.
>>>> content-type.
>>>
>>> I guess this explains it:
>>> http://foertsch.name/ModPerl-Tricks/custom-content_type-with-custom_response.shtml
>>> Requires quite some restructuring to gitweb.perl.
>>
>> I'm coming close to declaring that ModPerl::Registry is horribly broken
>> with respect to error pages created by CGI, and say that we don't support
>> it, removing mod_perl configuration examples from gitweb documentation.
> 
> Makes sense.  The benefits of mod_perl are properly small for gitweb anyway.

Anyway you can run gitweb with FastCGI (supposedly - I don't know if it
was tested), which provides the same (or most of the) advantages that
mod_perl gives, without the troubles.  Just rename gitweb.cgi to
gitweb.fcgi and configure web server appropriately (and have FCGI Perl
module installed).

-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: [PATCH 10/13] credentials: add "cache" helper
From: Junio C Hamano @ 2011-11-29  0:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111124110710.GH8417@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> diff --git a/t/lib-credential.sh b/t/lib-credential.sh
> index 54ae1f4..ac54b38 100755
> --- a/t/lib-credential.sh
> +++ b/t/lib-credential.sh
> @@ -21,6 +21,225 @@ read_chunk() {
> ...
> +	test_expect_success "helper ($HELPER) can forget user" '
> +		check reject $HELPER <<-\EOF &&
> +		protocol=https
> +		host=example.com
> +		username=user1
> +		EOF
> +		check fill $HELPER <<-\EOF
> +		protocol=https
> +		host=example.com
> +		username=user1
> +		--
> +		username=user1
> +		password=askpass-password
> +		--
> +		askpass: Password for '\''https://user1@example.com'\'':
> +	'

Missing EOF for here document; I fixed this already in 'pu', but please
squash it in when rerolling after 1.7.8 final.

^ permalink raw reply

* Re: gitweb: in-page errors don't work with mod_perl
From: Jürgen Kreileder @ 2011-11-28 23:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <201111282332.07276.jnareb@gmail.com>

On Mon, Nov 28, 2011 at 23:32, Jakub Narebski <jnareb@gmail.com> wrote:
> Jürgen Kreileder wrote:
>> On Mon, Nov 28, 2011 at 21:13, Jürgen Kreileder <jk@blackdown.de> wrote:
>> > On Mon, Nov 28, 2011 at 17:54, Jakub Narebski <jnareb@gmail.com> wrote:
>> >  [...]
>> > >
>> > > The configuration is very similar.  Perhaps that is the difference between
>> > > Apache 2.0.x (mine) and Apache 2.2.x (yours).
>> > >
>> > > Does adding `$r->err_headers_out();` before `$r->status(200);` helps?
>> > > I'm grasping at straws here.  mod_perl documentation is not very helpful.
>> >
>> > Doesn't help unfortunately.  It's hard to find any information about
>> > this on the net (except for your comment on stackoverflow :).
>> >
>> > The only way to get mod_perl to return a custom error message with
>> > correct status code I've found so far is $r->custom_response($status,
>> > $msg).  Unfortunately mod_perl then ignores headers I set, e.g.
>> > content-type.
>>
>> I guess this explains it:
>> http://foertsch.name/ModPerl-Tricks/custom-content_type-with-custom_response.shtml
>> Requires quite some restructuring to gitweb.perl.
>
> I'm coming close to declaring that ModPerl::Registry is horribly broken
> with respect to error pages created by CGI, and say that we don't support
> it, removing mod_perl configuration examples from gitweb documentation.

Makes sense.  The benefits of mod_perl are properly small for gitweb anyway.


Juergen

-- 
http://blog.blackdown.de/
http://www.flickr.com/photos/jkreileder/

^ permalink raw reply

* [ANNOUNCE] Git 1.7.8-rc4
From: Junio C Hamano @ 2011-11-28 23:37 UTC (permalink / raw)
  To: git; +Cc: Linux Kernel

A release candidate Git 1.7.8-rc4 is available for testing. This hopefully
is the last rc before the final release (we had to make a last-minute UI
tweak to a new feature in multi-step "revert" command to avoid later
regrets). We already have interesting topics queued together with proposed
fixes for older non-regression bugs which would make the next cycle short,
sweet and fun.

The release tarballs are found at:

    http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

528b8099e980877eb7e2268be8afdf12efe13490  git-1.7.8.rc4.tar.gz
f775ddd3ee207816840ced38c28d1a400a823536  git-htmldocs-1.7.8.rc4.tar.gz
b35301ed60712f441a9c3af08446184faf624d51  git-manpages-1.7.8.rc4.tar.gz

Also the following public repositories all have a copy of the v1.7.8.rc4
tag and the master branch that the tag points at:

  url = git://repo.or.cz/alt-git.git
  url = https://code.google.com/p/git-core/
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

----------------------------------------------------------------

Changes since v1.7.8-rc3 are as follows:

Johannes Sixt (1):
      Fix revert --abort on Windows

Jonathan Nieder (11):
      do not let git_path clobber errno when reporting errors
      notes merge: eliminate OUTPUT macro
      Makefile: add missing header file dependencies
      Makefile: add option to disable automatic dependency generation
      revert: rename --reset option to --quit
      revert: rearrange pick_revisions() for clarity
      revert: improve error message for cherry-pick during cherry-pick
      revert: write REVERT_HEAD pseudoref during conflicted revert
      revert: introduce --abort to cancel a failed cherry-pick
      revert: remove --reset compatibility option
      revert --abort: do not leave behind useless sequencer-old directory

Junio C Hamano (5):
      name-rev --all: do not even attempt to describe non-commit object
      Git 1.7.7.4
      Hopefully final update of release notes before 1.7.8 final
      Update 1.7.8 draft release notes in preparation for rc4
      Git 1.7.8-rc4

Marc-André Lureau (1):
      mailmap: xcalloc mailmap_info

Nguyễn Thái Ngọc Duy (1):
      revert: do not pass non-literal string as format to git_path()

Ramkumar Ramachandra (4):
      http: remove unused function hex()
      convert: don't mix enum with int
      sha1_file: don't mix enum with int
      git-compat-util: don't assume value for undefined variable

Ramsay Allan Jones (3):
      t5501-*.sh: Fix url passed to clone in setup test
      config.c: Fix a static buffer overwrite bug by avoiding mkpath()
      convert.c: Fix return type of git_path_check_eol()

Thomas Hochstein (1):
      documentation fix: git difftool uses diff tools, not merge tools.

Vincent van Ravesteijn (2):
      builtin-branch: Fix crash on invalid use of --force
      builtin-reset: Documentation update

^ permalink raw reply

* [PATCH] gitweb: Escape attribute in chop_and_escape_str()
From: Jürgen Kreileder @ 2011-11-28 23:27 UTC (permalink / raw)
  To: git

Fixes the title attribute in <span title="Jürgen Kreileder">Jürgen
Kreileder</span>
for example because to_utf8() is called implicitly now.

(Not sure why the attribute is there at all in the example. From my
point of view
nothing got chopped.)

Signed-off-by: Juergen Kreileder <jk@blackdown.de>
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4f0c3bd..fd76407 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1699,7 +1699,7 @@ sub chop_and_escape_str {
 		return esc_html($chopped);
 	} else {
 		$str =~ s/[[:cntrl:]]/?/g;
-		return $cgi->span({-title=>$str}, esc_html($chopped));
+		return $cgi->span({-title => esc_attr($str)}, esc_html($chopped));
 	}
 }

-- 
1.7.5.4

^ permalink raw reply related

* Re: gitweb: in-page errors don't work with mod_perl
From: Jakub Narebski @ 2011-11-28 22:32 UTC (permalink / raw)
  To: Jürgen Kreileder; +Cc: git
In-Reply-To: <CAKD0UuzsV7A_j8YD4b0Lb95L2NcRcSu5PH8C9aZQmEx3tOuQjQ@mail.gmail.com>

Jürgen Kreileder wrote:
> On Mon, Nov 28, 2011 at 21:13, Jürgen Kreileder <jk@blackdown.de> wrote:
> > On Mon, Nov 28, 2011 at 17:54, Jakub Narebski <jnareb@gmail.com> wrote:
> >  [...]
> > >
> > > The configuration is very similar.  Perhaps that is the difference between
> > > Apache 2.0.x (mine) and Apache 2.2.x (yours).
> > >
> > > Does adding `$r->err_headers_out();` before `$r->status(200);` helps?
> > > I'm grasping at straws here.  mod_perl documentation is not very helpful.
> >
> > Doesn't help unfortunately.  It's hard to find any information about
> > this on the net (except for your comment on stackoverflow :).
> >
> > The only way to get mod_perl to return a custom error message with
> > correct status code I've found so far is $r->custom_response($status,
> > $msg).  Unfortunately mod_perl then ignores headers I set, e.g.
> > content-type.
> 
> I guess this explains it:
> http://foertsch.name/ModPerl-Tricks/custom-content_type-with-custom_response.shtml
> Requires quite some restructuring to gitweb.perl.

I'm coming close to declaring that ModPerl::Registry is horribly broken
with respect to error pages created by CGI, and say that we don't support
it, removing mod_perl configuration examples from gitweb documentation.

WTF 'return Apache2::Const::DONE;' doesn't work with ModPerl::Registry?
It is supposed to work with native mod_perl scripts...

-- 
Jakub Narebski
Poland

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox