Git development
 help / color / mirror / Atom feed
* [RFC PATCH] Why doesn't git rebase --interactive --preserve-merges continue past known conflicts?
From: David D. Kilzer @ 2011-01-02  1:20 UTC (permalink / raw)
  To: git; +Cc: David D. Kilzer
In-Reply-To: <282560.39741.qm@web30004.mail.mud.yahoo.com>

On 2010-12-31, David Kilzer wrote:
> When I run "git rebase --interactive --preserve-merges" on a sequence of 
> commits, edit an earlier commit, then run "git rebase --continue", the rebase 
> operation always stops on a merge commit with a known conflict (in the rr-cache) 
> instead of resolving it and continuing.
> 
> As long as I'm not rearranging commits, I expect git-rebase to resolve the known 
> merge commit conflict and continue.  Why does it always stop?

Here's a very rough patch that fixes my original test case so that an interactive
rebase won't stop when git-rerere knows how to resolve all conflicts during a
merge.

However, if there are any changes to a non-conflicted file during the original
merge commit, they will be lost when rebasing, even with --preserve-merges.
Note that this occurs even without this patch applied.  You must compare the
current commit with original being rebased to make sure they're not lost.

Why doesn't an interactive rebase serialize to disk all of the changes in a merge
commit like it does for non-merge commits?

Dave
---
 git-rebase--interactive.sh                    |   11 ++++-
 t/t3404-rebase-interactive-preserve-merges.sh |   64 +++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 2 deletions(-)
 create mode 100755 t/t3404-rebase-interactive-preserve-merges.sh

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index a5ffd9a..32375bc 100755
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -338,11 +338,18 @@ pick_one_preserving_merges () {
 			msg="$(commit_message $sha1)"
 			# No point in merging the first parent, that's HEAD
 			new_parents=${new_parents# $first_parent}
+			# If rerere is enabled, pass the --rerere-autoupdate flag
+			test "$(git config --bool rerere.enabled)" = "true" &&
+				rerere_autoupdate=--rerere-autoupdate || rerere_autoupdate=
 			if ! do_with_author output \
-				git merge $STRATEGY -m "$msg" $new_parents
+				git merge $STRATEGY $rerere_autoupdate -m "$msg" $new_parents
 			then
 				printf "%s\n" "$msg" > "$GIT_DIR"/MERGE_MSG
-				die_with_patch $sha1 "Error redoing merge $sha1"
+				# Commit the result if rerere resolved all the conflicts.
+				git update-index -q --refresh &&
+					printf "Resolved all conflicts using rerere.\n"  &&
+					do_with_author git commit --no-verify -F "$GIT_DIR"/MERGE_MSG ||
+					die_with_patch $sha1 "Error redoing merge $sha1"
 			fi
 			echo "$sha1 $(git rev-parse HEAD^0)" >> "$REWRITTEN_LIST"
 			;;
diff --git a/t/t3404-rebase-interactive-preserve-merges.sh b/t/t3404-rebase-interactive-preserve-merges.sh
new file mode 100755
index 0000000..3479f38
--- /dev/null
+++ b/t/t3404-rebase-interactive-preserve-merges.sh
@@ -0,0 +1,64 @@
+#!/bin/sh
+#
+# Copyright (c) 2007 Johannes E. Schindelin
+# Copyright (c) 2010 David D. Kilzer
+#
+
+test_description='git rebase --interactive --preserve-matches does not automatically resolve known conflicts in merge commits'
+. ./test-lib.sh
+
+. "$TEST_DIRECTORY"/lib-rebase.sh
+
+set_fake_editor
+
+test_expect_success 'setup' '
+	test_commit A file1 &&
+	test_commit AA file2 &&
+	test_commit B file1 &&
+	git checkout -b topic1 HEAD^ &&
+	test_commit C file1 &&
+	git checkout master
+'
+
+test_expect_success 'rebase --interactive --preserve-merges should use rerere to resolve conflicts' '
+	git config rerere.enabled 1 &&
+	git rerere clear &&
+
+	git checkout -b merge1 master &&
+	test_must_fail git merge topic1 &&
+	printf "B\nC\n" > file1 &&
+	git add file1 &&
+	git commit -m "Merged." &&
+	git branch merge1-baseline &&
+
+	FAKE_LINES="edit 1 2" git rebase -i -p HEAD~2 &&
+	echo BB >> file2 &&
+	git add file2 &&
+	git commit --amend &&
+	git rebase --continue &&
+	git diff --exit-code merge1-baseline..merge1 file1
+'
+
+test_expect_success 'rebase --interactive --preserve-merges should not lose changes in merge commits' '
+	git config rerere.enabled 1 &&
+	git rerere clear &&
+
+	git checkout -b merge2 master &&
+	test_must_fail git merge topic1 &&
+	printf "B\nC\n" > file1 &&
+	git add file1 &&
+	echo BB >> file2 &&
+	git add file2 &&
+	git commit -m "Merged with change to non-conflicted file." &&
+	git branch merge2-baseline &&
+
+	FAKE_LINES="edit 1 2" git rebase -i -p HEAD~2 &&
+	echo AAA > file3 &&
+	git add file3 &&
+	git commit --amend &&
+	git rebase --continue &&
+	git diff --exit-code merge2-baseline..merge2 file1 &&
+	git diff --exit-code merge2-baseline..merge2 file2
+'
+
+test_done
-- 
1.7.2.1.103.g48452

^ permalink raw reply related

* Re: [PATCH] Fix expected values of setup tests on Windows
From: Junio C Hamano @ 2011-01-02  1:31 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jonathan Nieder, Nguyen Thai Ngoc Duy, git
In-Reply-To: <201012312321.31294.j6t@kdbg.org>

Johannes Sixt <j6t@kdbg.org> writes:

> On Freitag, 31. Dezember 2010, Jonathan Nieder wrote:
>> Johannes Sixt wrote:
>> > On Freitag, 31. Dezember 2010, Nguyen Thai Ngoc Duy wrote:
>> >> in your patch does not. Does bash auto convert value in
>> >> TRASH_DIRECTORY="$TE..." line?
>> >
>> > No. When this line is executed:
>> >
>> > TEST_DIRECTORY=$(pwd)
>> >
>> > $(pwd) still has its default behavior to return the POSIX style path. pwd
>> > is redefined to pwd -W only later.
>>
>> Would it make sense to change it to
>>
>>  TEST_DIRECTORY=$PWD
>>
>> for clarity and robustness against code movement, then?
>
> Yes, that would make sense.

It will be very much appreciated to add a few sentences to clarify this to
"Do's and don'ts" section of t/README if you are re-rolling this.  Thanks.

^ permalink raw reply

* Re: [PATCH 0/4 v4] minor gitweb modifications
From: Jonathan Nieder @ 2011-01-01 23:02 UTC (permalink / raw)
  To: Sylvain Rabot; +Cc: git, Jakub Narebski
In-Reply-To: <20110101104121.GA12734@burratino>

Jonathan Nieder wrote:
> Sylvain Rabot wrote:

>>   gitweb: add css class to remote url titles
>
> I had a question (why make the remote url table inconsistent with the
> older projects_list table) and suggested a more generic approach in
> reply to v2[1]:

Sorry, forgot the link before.

[1] http://thread.gmane.org/gmane.comp.version-control.git/164004/focus=164010

^ permalink raw reply

* Re: how to update a submodule?
From: Oliver Kullmann @ 2011-01-01 20:39 UTC (permalink / raw)
  To: Seth Robertson; +Cc: git
In-Reply-To: <201012312342.oBVNg1lx021930@no.baka.org>

On Fri, Dec 31, 2010 at 06:42:01PM -0500, Seth Robertson wrote:
> 
> In message <20101231222438.GA28199@cs-wsok.swansea.ac.uk>, Oliver Kullmann writ
> es:
> 
>     it would be very helpful if somebody could tell me the
>     supposed workflow how to update a submodule (I'm using version
>     1.7.3.2; the man-page doesn't say much about it).
> 
> https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial
>

thanks, that's definitely something.
 
> --------------------------------------------------
> cd submodule_path
> git checkout <branchname>
> git pull
> cd .. # or otherwise get to the superproject
> 
> git add submodule_path
> git commit -m "Updated submodule to latest <branchname>"
> git submodule update
> --------------------------------------------------
> 
> This (untested) code obviously only works for a single repository.  If
> you want to do it for all repositories, you need something more like
> the untested:
> 
> --------------------------------------------------
> git submodule foreach git checkout <branchname>
> git submodule foreach git pull
> git add -a -m "Updated all submodules to latest <branchname>"
> git submodule update
> --------------------------------------------------
> 

As far as I see that, this doesn't concern the problem how to I update
one repository with submodules from another repository with "these" submodules
(as the same paths)?

Actually, even the simplest case of just cloning a repository with submodules
doesn't work:
After cloning, "git submodule status" shows "-", okay so I do "git submodule init",
which already shows a false understanding --- it shows the URL of the old repository,
from which the original submodule originated, which is long gone and no longer relevant.
Then of course "git submodule update" fails, and in the submodule there is just nothing.

A clone should provide a relation to the mother-clone, not transitively to
the grandmother-clone.

The problem seems to be that the information about the place where to update a
submodule is in .gitmodules, which git actually has under version control (different
from other configuration information), and thus copies it verbatimly.
Okay, then apparently after a clone .gitmodules has to be updated (by hand).

So .gitmodules concerns only "git submodule update", not "git pull" from within
the submodules? This would be good to know, to understand the role of the information
in .gitmodules (where the task of "git submodule init" is apparently just to
transport this information to .git/configure ?).

This has the disadvantage, that one has to publish this private
information about the places where by chance one is pulling from?
Perhaps I should then put .gitmodules into .gitignore? Or would that have
bad consequences??

Or is the idea that .gitmodules normally is not under version control, and
thus "git init", which actually hides the information of .gitmodules ?

> If you find yourself doing this continuously, and doing it for all of
> your submodules, you may way to use gitslave instead of submodules
> which keeps the branches checked out all of the time so all you need to
> do is `git pull`.  This may be better, or worse, for you workflow.
> 
> gitslave (http://gitslave.sf.net) is useful when you control and
> develop on the subprojects at more of less the same time as the
> superproject, and furthermore when you typically want to tag, branch,
> push, pull, etc all repositories at the same time.
>

On the one hand, I always want to pull "everything" (inclusive the content of all submodules),
however the submodules are completely independent (they are from Github, or collaborations on
papers, and such things --- I just want to conveniently carry around all of that with me in
one go).
 
> git-submodule is better when you do not control the subprojects or
> more specifically wish to fix the subproject at a specific revision
> even as the subproject changes
>

This seems to be the case here, so it seemed to me that gitslave wouldn't be appropriate
here.

Thanks for your help!

Oliver
 

^ permalink raw reply

* [PATCH v2] msvc: Fix compilation error due to missing mktemp() declaration
From: Ramsay Jones @ 2011-01-01 20:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: GIT Mailing-list, kusmabite


Commit d1b6e6e (win32: use our own dirent.h, 2010-11-23) removed
the compat/vcbuild/include/dirent.h compatibility header file.
This file, among other things, included the <io.h> system header
file which provides the declaration of the mktemp() function.

In order to fix the compilation error, we add an include directive
for <io.h> to the compat/mingw.h header.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
---

Change from v1:
    - add #include to compat/mingw.h rather than compat/vcbuild/include/unistd.h

ATB,
Ramsay Jones

 compat/mingw.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/compat/mingw.h b/compat/mingw.h
index cafc1eb..1c6bc02 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -1,5 +1,6 @@
 #include <winsock2.h>
 #include <ws2tcpip.h>
+#include <io.h>
 
 /*
  * things that are not available in header files
-- 
1.7.3

^ permalink raw reply related

* Re: [PATCH] msvc: Fix compilation error due to missing mktemp() declaration
From: Ramsay Jones @ 2011-01-01 20:26 UTC (permalink / raw)
  To: kusmabite; +Cc: Junio C Hamano, GIT Mailing-list
In-Reply-To: <AANLkTinAAGWeFDomo-NQrVtrdf9ue1xVxufy+bkWz1Fc@mail.gmail.com>

Erik Faye-Lund wrote:
> Actually, I suspect that you know exactly what my concern is (based on
> the above), and that it's not with your patch.

Actually no, which is why I asked! :-D

> I have a patch in my MSVC-tree that includes io.h in mingw.h, because
> mingw.h already depends on facilities from io.h on MSVC. My point was
> simply that this dependency was already present, and as such I think
> mingw.h is the appropriate place to include it.

OK, v2 of patch on the way...

ATB,
Ramsay Jones

^ permalink raw reply

* Re: [PATCH 0/4 v4] minor gitweb modifications
From: Jonathan Nieder @ 2011-01-01 10:41 UTC (permalink / raw)
  To: Sylvain Rabot; +Cc: git, Jakub Narebski
In-Reply-To: <1293744031-17790-1-git-send-email-sylvain@abstraction.fr>

(adding back cc: jakub)

Hi,

Sylvain Rabot wrote:

>   gitweb: add extensions to highlight feature map
>   gitweb: remove unnecessary test when closing file descriptor

I like the above two.

>   gitweb: add css class to remote url titles

I had a question (why make the remote url table inconsistent with the
older projects_list table) and suggested a more generic approach in
reply to v2[1]:

	<table class="projects_list">
	<tr id="metadata_desc">
		<td class="metadata_tag">description</td>
		<td>Unnamed repository; edit this file to name it for gitweb.</td>
	</tr>
	<tr id="metadata_owner">
		<td class="metadata_tag">owner</td>
		<td>UNKNOWN</td>
	</tr>
	...

The idea was that the rows are already labelled for use by css, so to
make this stylable all we need to do is use a class for the first
column.  This way if some site operator wants the first column
*always* be bold then that is easy to do.

Another approach with similar effect would be

	<dl class="projects_list">
	<dt>description</dt>
	<dd id="metadata_desc"
		>Unnamed repository; edit this file to name it for gitweb</dd>
	<dt>owner>
	<dd id="metadata_owner"
		>UNKNOWN</dd>
	...

but that does not degrade as well to browsers not supporting css.  Any
thoughts on this?

>   gitweb: add vim modeline header which describes gitweb coding rule

I don't like this one.  Isn't the tabstop whatever the reader wants it
to be (e.g., 8)?  I don't like modelines as a way of documenting
coding standards because

 (1) they are not clear to humans and editors other than vim
 (2) they require annotating each source file separately.

See [1] for an alternative approach to configuring an editor to hack
on git.

Regards,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/109462/focus=109538

^ permalink raw reply

* Re: [PATCH] Fix expected values of setup tests on Windows
From: Nguyen Thai Ngoc Duy @ 2011-01-01  3:46 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git
In-Reply-To: <201012311711.06989.j6t@kdbg.org>

On Fri, Dec 31, 2010 at 11:11 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> in your patch does not. Does bash auto convert value in
>> TRASH_DIRECTORY="$TE..." line?
>
> No. When this line is executed:
>
> TEST_DIRECTORY=$(pwd)
>
> $(pwd) still has its default behavior to return the POSIX style path. pwd is
> redefined to pwd -W only later.
>
> I'm hesitant to redefine pwd earlier in test-lib.sh, though, because we would
> have to audit all uses of TEST_DIRECTORY for whether POSIX style paths or
> drive-letter paths are needed.

Ah I missed that. Thanks. Ack from me.
-- 
Duy

^ permalink raw reply

* Re: how to update a submodule?
From: Seth Robertson @ 2010-12-31 23:42 UTC (permalink / raw)
  To: Oliver Kullmann; +Cc: git
In-Reply-To: <20101231222438.GA28199@cs-wsok.swansea.ac.uk>


In message <20101231222438.GA28199@cs-wsok.swansea.ac.uk>, Oliver Kullmann writ
es:

    it would be very helpful if somebody could tell me the
    supposed workflow how to update a submodule (I'm using version
    1.7.3.2; the man-page doesn't say much about it).

https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial

--------------------------------------------------
cd submodule_path
git checkout <branchname>
git pull
cd .. # or otherwise get to the superproject

git add submodule_path
git commit -m "Updated submodule to latest <branchname>"
git submodule update
--------------------------------------------------

This (untested) code obviously only works for a single repository.  If
you want to do it for all repositories, you need something more like
the untested:

--------------------------------------------------
git submodule foreach git checkout <branchname>
git submodule foreach git pull
git add -a -m "Updated all submodules to latest <branchname>"
git submodule update
--------------------------------------------------

If you find yourself doing this continuously, and doing it for all of
your submodules, you may way to use gitslave instead of submodules
which keeps the branches checked out all of the time so all you need to
do is `gits pull`.  This may be better, or worse, for you workflow.

gitslave (http://gitslave.sf.net) is useful when you control and
develop on the subprojects at more of less the same time as the
superproject, and furthermore when you typically want to tag, branch,
push, pull, etc all repositories at the same time.

git-submodule is better when you do not control the subprojects or
more specifically wish to fix the subproject at a specific revision
even as the subproject changes

					-Seth Robertson

^ permalink raw reply

* how to update a submodule?
From: Oliver Kullmann @ 2010-12-31 22:24 UTC (permalink / raw)
  To: git

Hello,

it would be very helpful if somebody could tell me the
supposed workflow how to update a submodule (I'm using version
1.7.3.2; the man-page doesn't say much about it).

I have on the same machine two versions of repository A,
say A1, A2, and both contain repository B as submodule, but are otherwise
unrelated.

In A1 we have B up-to-date, but not in A2 (and with up-to-date I mean
the master branch in the same state as in A1).

git submodule update

is of no help, since A1, A2 have no relation to each other, and "update"
assumes some relation; the man-page doesn't speak of options for update,
and since "git submodule update" runs through all submodules, it's hard
to see how that could work.

So I have to go into B in A2, and do some pull (I suppose), but that
doesn't work so easily; the man-page seems to speak of B in A2 is a "normal repository"
but that doesn't seem to be the case in the sense that by default 
we are on "no branch". So it seems one can't do anything there, since 
you can't merge "no branch" into some other branch. The branch "no branch"
is up-to-date after a pull, but this can't be communicated to the other
branches (so seems to be misleading, or a waste of time).

So apparently I have to checkout master in A2:B, and then pull from
master in A1:B ?

Would be good if somebody could shed light on this.

In general I find that "pull" is much under-documented in git.
What I would like to do in A2 is just to say "pull everything from A1",
which seems a logical thing to do, and which should include all corresponding
branches and all corresponding submodules, but such a "pull all" doesn't
seem to exist?

Thanks for your help in any case!

Oliver

P.S. What would be really needed for the Git documentation is to speak
about the fundamental ideas, about the "mental images".

The submodule documentation starts with a blurb-sentence which I would
regard as misleading.
Then comes a paragraph about what it is not.
Then comes some hints regarding the implementation.
Then the final paragraph seems to indicate that above I could just
use "git pull A2" for A1, and would get B in A2 from B in A1, but only not checked out?
This doesn't seem to be the case, since apparently then "git submodule update"
tries to fetch from some foreign repository (not just checking out what was fetched before)?

Adding here examples for a complete workflow (perhaps with various stages, but without
complications(!)), from initialisation to updates from other repositories would be
very helpful.

As you can see, I don't know what to expect, and so I don't know what are errors in
the usage, and what is supposed to be. No mental image about "submodule" is offered,
and I don't know what is its intention. A great thing would be using the computer-science
concept of an abstract data type, so to explain what the interface and its semantics is,
not what is the implementation.

^ permalink raw reply

* Re: [PATCH] Fix expected values of setup tests on Windows
From: Johannes Sixt @ 2010-12-31 22:21 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Nguyen Thai Ngoc Duy, Junio C Hamano, git
In-Reply-To: <20101231203019.GC5898@burratino>

On Freitag, 31. Dezember 2010, Jonathan Nieder wrote:
> Johannes Sixt wrote:
> > On Freitag, 31. Dezember 2010, Nguyen Thai Ngoc Duy wrote:
> >> in your patch does not. Does bash auto convert value in
> >> TRASH_DIRECTORY="$TE..." line?
> >
> > No. When this line is executed:
> >
> > TEST_DIRECTORY=$(pwd)
> >
> > $(pwd) still has its default behavior to return the POSIX style path. pwd
> > is redefined to pwd -W only later.
>
> Would it make sense to change it to
>
>  TEST_DIRECTORY=$PWD
>
> for clarity and robustness against code movement, then?

Yes, that would make sense.

-- Hannes

^ permalink raw reply

* Re: [PATCH] Fix expected values of setup tests on Windows
From: Jonathan Nieder @ 2010-12-31 20:30 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Nguyen Thai Ngoc Duy, Junio C Hamano, git
In-Reply-To: <201012311711.06989.j6t@kdbg.org>

Johannes Sixt wrote:
> On Freitag, 31. Dezember 2010, Nguyen Thai Ngoc Duy wrote:

>> in your patch does not. Does bash auto convert value in
>> TRASH_DIRECTORY="$TE..." line?
>
> No. When this line is executed:
> 
> TEST_DIRECTORY=$(pwd)
> 
> $(pwd) still has its default behavior to return the POSIX style path. pwd is 
> redefined to pwd -W only later.

Would it make sense to change it to

 TEST_DIRECTORY=$PWD

for clarity and robustness against code movement, then?

^ permalink raw reply

* Why doesn't git rebase --interactive --preserve-merges continue past known conflicts?
From: David D. Kilzer @ 2010-12-31 19:30 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 493 bytes --]

When I run "git rebase --interactive --preserve-merges" on a sequence of 
commits, edit an earlier commit, then run "git rebase --continue", the rebase 
operation always stops on a merge commit with a known conflict (in the rr-cache) 
instead of resolving it and continuing.

As long as I'm not rearranging commits, I expect git-rebase to resolve the known 
merge commit conflict and continue.  Why does it always stop?

I've attached a simple test case that demonstrates this behavior.

Dave

[-- Attachment #2: t3404-rebase-interactive-preserve-merges.sh --]
[-- Type: application/x-sh, Size: 736 bytes --]

^ permalink raw reply

* [RFC PATCH v7 10/9] gitweb: Background cache generation and progress indicator
From: Jakub Narebski @ 2010-12-31 18:03 UTC (permalink / raw)
  To: git; +Cc: J.H., John 'Warthog9' Hawley
In-Reply-To: <20101222234843.7998.87068.stgit@localhost.localdomain>

This commit removes asymmetry in serving stale data (if stale data exists)
when regenerating cache in GitwebCache::FileCacheWithLocking.  The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.

Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data).  Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl or from FastCGI: it would leave unreaped children as zombies
(entries in process table).  We don't want to wait for background process,
and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to automatically reap
child processes, because this interferes with using
  open my $fd, '-|', git_cmd(), 'param', ...
    or die_error(...)
  # read from <$fd>
  close $fd
    or die_error(...)
In the above code "close" for magic "-|" open calls waitpid...  and we
would would die with "No child processes".  Removing 'or die' would
possibly remove ability to react to other errors.

This feature can be enabled or disabled on demand via 'background_cache'
cache parameter.  It is turned on by default.


When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.

Note that without generating data in background, process generating
data wouldn't print progress info, because 'generating_info' can exit
(and in the case of gitweb's git_generating_data_html does exit).

We don't need to daemonize background process in this case, where
there is no stale data to serve, but progress info is on.  This is
because we have to wait for the background process to finish
generating data anyway.

Gitweb itself uses "Generating..." page as activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
of the page after the cache is filled (via trick of not closing page
and therefore not closing connection till data is available in cache).
The git_generating_data_html() subroutine, which is used by gitweb to
implement this feature, is highly configurable: you can choose
frequency of writing some data so that connection won't get closed,
and maximum time to wait for data in "Generating..." page (see
comments in %generating_options hash definition), and initial delay
before starting progress indicator page.

The git_generating_data_html() subroutine would return early (not showing
HTML-base progress indicator) if action does not return HTML output, or
if web browser / user agent is a robot / web crawler (or gitweb is run as
standalone script).  In such cases HTML "Generating..." page does not make
much sense.

For this purpose new subroutine browser_is_robot() (which uses
HTTP::BrowserDetect if possible, and fall backs on simple check of
User-Agent string) was added.


The default behavior of cache_output() from GitwebCache::CacheOutput was
changed so that it would cache error pages if they were generated in
detached background process.  This together with default initial delay in
git_generating_data_html progress_info subroutine should ensure that there
are no problems with error pages and progress info interaction.


The t9511 test got updated to test both case with background generation
enabled and case with background generation disabled.  Also adde test for
simple (not exiting) 'generating_info' subroutine, for both case with
background generation disabled and enabled.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
There are few changes included in this commit which should be fixed in
original commits/patches earlier in the series.  Namely:

* fix to gitweb/Makefile to use newer names for gitweb caching
  modules, i.e. GitwebCache/FileCacheWithLocking.pm instead of
  GitwebCache/SimpleFileCache.pm, and GitwebCache/Capture/ToFile.pm
  instead of GitwebCache/Capture/Simple.pm

* 'max_lifetime' was introduced in previous commit, so its use in
  %cache_options (setting default value for gitweb) should also be
  done in previous commit.

* gitweb_enable_caching function in tgitweb-lib.sh should use
  "$TRASH_DIRECTORY/cache" as 'cache_root' from beginning, just in
  case


It is worth mentioning that git_generating_data_html does not need to
end with 'die'; it could as well end with 'goto DONE_REQUEST'.  The
'generating_info' subroutine is outside capture anyway.


The issue with error pages should be solved, even in the case when
they are not cached.  There are three layers of defense:

1. git_generating_data_html has initial delay of 1 second, by default.
   This means that if die_error finishes within this initial delay,
   then redirection (and ending the request) wouldn't take place.  The
   error page would be printed by parent process and not cached.

2. In the case where there is no stale data to serve, but there is
   'generating_info' subroutine and it would exit / end request before
   error page is fully generated, background process would be not
   detached, and it would print error page.  The error page would not
   be cached.

   Though I wonder if exit from git_generating_data_html should be
   trapped, so that we can wait for background process; other solution
   would be to use ripper SIGCHLD signal handler for this process...

   Huh, something still to think about...

3. In the case where there is stale data for what is now an error
   condition (e.g. deleted branch or deleted project), and background
   process would generate data being detached from originating
   project, the error page would be captured and cached.

 gitweb/Makefile                                |    4 +-
 gitweb/gitweb.perl                             |  171 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/CacheOutput.pm          |   10 ++-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  113 +++++++++++++++-
 t/gitweb-lib.sh                                |    4 +-
 t/t9511/test_cache_interface.pl                |  149 ++++++++++++++++++++-
 6 files changed, 439 insertions(+), 12 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index d67c138..7a5c85f 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -115,8 +115,8 @@ GITWEB_FILES += static/git-logo.png static/git-favicon.png
 
 # gitweb output caching
 GITWEB_MODULES += GitwebCache/CacheOutput.pm
-GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
-GITWEB_MODULES += GitwebCache/Capture/Simple.pm
+GITWEB_MODULES += GitwebCache/FileCacheWithLocking.pm
+GITWEB_MODULES += GitwebCache/Capture/ToFile.pm
 
 GITWEB_REPLACE = \
 	-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index eb02b6b..5ef668d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -307,6 +307,38 @@ our %cache_options = (
 	# The (global) expiration time for objects placed in the cache, in seconds.
 	'expires_in' => 20,
 
+	# Maximum cache file life, in seconds.  If cache entry lifetime exceeds
+	# this value, it wouldn't be served as being too stale when waiting for
+	# cache to be regenerated/refreshed, instead of trying to display
+	# existing cache date.
+	#
+	# Set it to -1 to always serve existing data if it exists.
+	# Set it to  0 to turn off serving stale data, i.e. always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
+
+	# This enables/disables background caching.  If it is set to true value,
+	# caching engine would return stale data (if it is not older than
+	# 'max_lifetime' seconds) if it exists, and launch process if regenerating
+	# (refreshing) cache into the background.  If it is set to false value,
+	# the process that fills cache must always wait for data to be generated.
+	# In theory this will make gitweb seem more responsive at the price of
+	# serving possibly stale data.
+	'background_cache' => 1,
+
+	# Subroutine which would be called when gitweb has to wait for data to
+	# be generated (it can't serve stale data because there isn't any,
+	# or if it exists it is older than 'max_lifetime').  The default
+	# is to use git_generating_data_html(), which creates "Generating..."
+	# page, which would then redirect or redraw/rewrite the page when
+	# data is ready.
+	# Set it to `undef' to disable this feature.
+	#
+	# Such subroutine (if invoked from GitwebCache::FileCacheWithLocking)
+	# is passed the following parameters: $cache instance, human-readable
+	# $key to current page, and $sync_coderef subroutine to invoke to wait
+	# (in a blocking way) for data.
+	'generating_info' => \&git_generating_data_html,
+
 	# How to handle runtime errors occurring during cache gets and cache
 	# sets.  Options are:
 	#  * "die" (the default) - call die() with an appropriate message
@@ -323,10 +355,27 @@ our %cache_options = (
 
 	# Extra options passed to GitwebCache::CacheOutput::cache_output subroutine
 	'cache_output' => {
-		# Enable caching of error pages (boolean).  Default is false.
-		'-cache_errors' => 0,
+		# Enable caching of error pages (tristate, with undef meaning that error
+		# pages will be cached if were generated in detached process).
+		# Default is undef.
+		'-cache_errors' => undef,
 	},
 );
+# You define site-wide options for "Generating..." page (if enabled) here
+# (which means that $cache_options{'generating_info'} is set to coderef);
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+	# The delay before displaying "Generating..." page, in seconds.  It is
+	# intended for "Generating..." page to be shown only when really needed.
+	'startup_delay' => 1,
+	# The time between generating new piece of output to prevent from
+	# redirection before data is ready, i.e. time between printing each
+	# dot in activity indicator / progress info, in seconds.
+	'print_interval' => 2,
+	# Maximum time "Generating..." page would be present, waiting for data,
+	# before unconditional redirect, in seconds.
+	'timeout' => $cache_options{'expires_min'},
+);
 # Set to _initialized_ instance of GitwebCache::Capture::ToFile
 # compatibile capturing engine, i.e. one implementing ->new()
 # constructor, and ->capture($code, $file) method.  If unset
@@ -870,6 +919,18 @@ sub evaluate_actions_info {
 	}
 }
 
+sub browser_is_robot {
+	return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+	if (eval { require HTTP::BrowserDetect; }) {
+		my $browser = HTTP::BrowserDetect->new();
+		return $browser->robot();
+	}
+	# fallback on detecting known web browsers
+	return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+	# be conservative; if not sure, assume non-interactive
+	return 1;
+}
+
 # fill %input_params with the CGI parameters. All values except for 'opt'
 # should be single values, but opt can be an array. We should probably
 # build an array of parameters that can be multi-valued, but since for the time
@@ -3660,6 +3721,112 @@ sub get_page_title {
 	return $title;
 }
 
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+	my ($cache, $key, $sync_coderef) = @_;
+
+	# when should gitweb show "Generating..." page
+	if ((defined $actions_info{$action}{'output_format'} &&
+	     $actions_info{$action}{'output_format'} eq 'feed') ||
+	    browser_is_robot()) {
+		return;
+	}
+
+	# Initial delay
+	if ($generating_options{'startup_delay'} > 0) {
+		eval {
+			local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+			alarm $generating_options{'startup_delay'};
+			$sync_coderef->(); # wait for data
+			alarm 0;           # turn off the alarm
+		};
+		if ($@) {
+			# propagate unexpected errors
+			die $@ if $@ !~ /alarm clock restart/;
+		} else {
+			# we got response within 'startup_delay' timeout
+			return;
+		}
+	}
+
+	my $title = "[Generating...] " . get_page_title();
+	# TODO: the following line of code duplicates the one
+	# in git_header_html, and it should probably be refactored.
+	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+	# Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+	# with timeout of 0 seconds would redirect as soon as page is finished.
+	# It assumes that browser would display partially received page.
+	# This "Generating..." redirect page should not be cached (externally).
+	my %no_cache = (
+		# HTTP/1.0
+		-Pragma => 'no-cache',
+		# HTTP/1.1
+		-Cache_Control => join(', ', qw(private no-cache no-store must-revalidate
+		                                max-age=0 pre-check=0 post-check=0)),
+	);
+	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+	                          -status=> '200 OK', -expires => 'now',
+	                          %no_cache);
+	print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	local $| = 1; # autoflush
+	print STDOUT 'Generating...';
+
+	my $total_time = 0;
+	my $interval = $generating_options{'print_interval'} || 1;
+	my $timeout  = $generating_options{'timeout'};
+	my $alarm_handler = sub {
+		local $! = 1;
+		print STDOUT '.';
+		$total_time += $interval;
+		if ($total_time > $timeout) {
+			die "timeout\n";
+		}
+	};
+	eval {
+		local $SIG{ALRM} = $alarm_handler;
+		Time::HiRes::alarm($interval, $interval);
+		my $sync_ok;
+		do {
+			# loop is needed here because SIGALRM (from 'alarm')
+			# can interrupt waiting (process of acquiring lock)
+			$sync_ok = $sync_coderef->(); # blocking wait for data
+		} until ($sync_ok);
+		alarm 0;
+	};
+	# It doesn't really matter if we got lock, or timed-out
+	# but we should re-throw unknown (unexpected) errors
+	die $@ if ($@ and $@ !~ /timeout/);
+
+	print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+	# after refresh web browser would reload page and send new request
+	die { 'status' => 200 }; # to end request
+	#goto DONE_REQUEST;
+	#exit 0;
+	#return;
+}
+
 sub print_feed_meta {
 	if (defined $project) {
 		my %href_params = get_feed_info();
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 792ddb7..188d4ab 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -35,16 +35,24 @@ our %EXPORT_TAGS = (all => [ @EXPORT ]);
 # in ':raw' format (and thus restored in ':raw' from cache)
 #
 # Supported options:
-# * -cache_errors => 0|1  - whether error output should be cached
+# * -cache_errors => undef|0|1  - whether error output should be cached,
+#                                 undef means cache if we are in detached process
 sub cache_output {
 	my ($cache, $capture, $key, $code, %opts) = @_;
 
+
+	my $pid = $$;
 	my ($fh, $filename);
 	my ($capture_fh, $capture_filename);
 	eval { # this `eval` is to catch rethrown error, so we can print captured output
 		($fh, $filename) = $cache->compute_fh($key, sub {
 			($capture_fh, $capture_filename) = @_;
 
+			if (!defined $opts{'-cache_errors'}) {
+				# cache errors if we are in detached process
+				$opts{'-cache_errors'} = ($$ != $pid && getppid() != $pid);
+			}
+
 			# this `eval` is to be able to cache error output (up till 'die')
 			eval { $capture->capture($code, $capture_fh); };
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index ecd0e18..291526e 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -73,6 +73,16 @@ our $EXPIRE_NOW = 0;
 #    If it is greater than 0, and cache entry is expired but not older
 #    than it, serve stale data when waiting for cache entry to be 
 #    regenerated (refreshed).  Non-adaptive.
+#  * 'background_cache' (boolean)
+#    This enables/disables regenerating cache in background process.
+#    Defaults to true.
+#  * 'generating_info'
+#    Subroutine (code) called when process has to wait for cache entry
+#    to be (re)generated (when there is no not-too-stale data to serve
+#    instead), for other process (or bacground process).  It is passed
+#    $cache instance, $key, and $wait_code subroutine (code reference)
+#    to invoke (to call) to wait for cache entry to be ready.
+#    Unset by default (which means no activity indicator).
 #  * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
 #    How to handle runtime errors occurring during cache gets and cache
 #    sets, which may or may not be considered fatal in your application.
@@ -107,6 +117,11 @@ sub new {
 		exists $opts{'max_lifetime'}       ? $opts{'max_lifetime'} :
 		exists $opts{'max_cache_lifetime'} ? $opts{'max_cache_lifetime'} :
 		$NEVER_EXPIRE;
+	$self->{'background_cache'} =
+		exists $opts{'background_cache'} ? $opts{'background_cache'} :
+		1;
+	$self->{'generating_info'} = $opts{'generating_info'}
+		if exists $opts{'generating_info'};
 	$self->{'on_error'} =
 		exists $opts{'on_error'}      ? $opts{'on_error'} :
 		exists $opts{'on_get_error'}  ? $opts{'on_get_error'} :
@@ -127,6 +142,7 @@ sub new {
 
 # creates get_depth() and set_depth($depth) etc. methods
 foreach my $i (qw(depth root namespace expires_in max_lifetime
+                  background_cache generating_info
                   on_error)) {
 	my $field = $i;
 	no strict 'refs';
@@ -140,6 +156,16 @@ foreach my $i (qw(depth root namespace expires_in max_lifetime
 	};
 }
 
+# $cache->generating_info($wait_code);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+	my $self = shift;
+
+	if (defined $self->{'generating_info'}) {
+		$self->{'generating_info'}->($self, @_);
+	}
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -246,6 +272,10 @@ sub _wait_for_data {
 	my ($self, $key, $sync_coderef) = @_;
 	my @result;
 
+	# provide "generating page..." info, if exists
+	$self->generating_info($key, $sync_coderef);
+	# generating info may exit, so we can not get there
+
 	# wait for data to be available
 	$sync_coderef->();
 	# fetch data
@@ -254,6 +284,57 @@ sub _wait_for_data {
 	return @result;
 }
 
+sub _set_maybe_background {
+	my ($self, $key, $code) = @_;
+
+	my ($pid, $detach);
+	my (@result, @stale_result);
+
+	if ($self->{'background_cache'}) {
+		# try to retrieve stale data
+		@stale_result = $self->get_fh($key,
+			'expires_in' => $self->get_max_lifetime());
+
+		# fork if there is stale data, for background process
+		# to regenerate/refresh the cache (generate data),
+		# or if main process would show progress indicator
+		$detach = @stale_result;
+		$pid = fork()
+			if (@stale_result || $self->{'generating_info'});
+	}
+
+	if ($pid) {
+		## forked and are in parent process
+		# reap child, which spawned grandchild process (detaching it)
+		waitpid $pid, 0
+			if $detach;
+
+	} else {
+		## didn't fork, or are in background process
+
+		# daemonize background process, detaching it from parent
+		# see also Proc::Daemonize, Apache2::SubProcess
+		if (defined $pid && $detach) {
+			## in background process
+			POSIX::setsid(); # or setpgrp(0, 0);
+			fork() && CORE::exit(0);
+		}
+
+		@result = $self->set_coderef_fh($key, $code);
+
+		if (defined $pid) { # && !$pid
+			## in background process; parent or grandparent
+			## will serve stale data, or just generated data
+
+			# lockfile will be automatically closed on exit,
+			# and therefore lockfile would be unlocked
+			CORE::exit(0);
+		}
+	}
+
+	return @result > 0 ? @result : @stale_result;
+}
+
 # $self->_handle_error($raw_error)
 #
 # based on _handle_get_error and _dispatch_error_msg from CHI::Driver
@@ -408,14 +489,37 @@ sub compute_fh {
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
 			## acquired writers lock, have to generate data
-			@result = eval { $self->set_coderef_fh($key, $code_fh) };
+			eval { @result = $self->_set_maybe_background($key, $code_fh) };
 			$self->_handle_error($@) if $@;
 
 			# closing lockfile releases writer lock
-			flock($lock_fh, LOCK_UN);
+			#flock($lock_fh, LOCK_UN); # it would unlock here and in background process
 			close $lock_fh
 				or $self->_handle_error("Could't close lockfile '$lockfile': $!");
 
+			if (!@result) {
+				# wait for background process to finish generating data
+				open $lock_fh, '<', $lockfile
+					or $self->_handle_error("Couldn't reopen (for reading) lockfile '$lockfile': $!");
+
+				eval {
+					@result = $self->_wait_for_data($key, sub {
+						flock($lock_fh, LOCK_SH);
+						# or 'waitpid -1, 0;', or 'wait;', as we don't detach now in this situation
+					});
+				};
+				$self->_handle_error($@) if $@;
+
+				# closing lockfile releases readers lock used to wait for data
+				flock($lock_fh, LOCK_UN);
+				close $lock_fh
+					or $self->_handle_error("Could't close reopened lockfile '$lockfile': $!");
+
+				# we didn't detach, so wait for the child to reap it
+				# (it should finish working, according to lock status)
+				wait;
+			}
+
 		} else {
 			## didn't acquire writers lock, get stale data or wait for regeneration
 
@@ -429,12 +533,13 @@ sub compute_fh {
 
 			# wait for regeneration if no stale data to serve,
 			# using shared / readers lock to sync (wait for data)
-			@result = eval {
-				$self->_wait_for_data($key, sub {
+			eval {
+				@result = $self->_wait_for_data($key, sub {
 					flock($lock_fh, LOCK_SH);
 				});
 			};
 			$self->_handle_error($@) if $@;
+
 			# closing lockfile releases readers lock
 			flock($lock_fh, LOCK_UN);
 			close $lock_fh
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 4ce067f..8652c91 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -57,7 +57,9 @@ gitweb_enable_caching () {
 		cat >>gitweb_config.perl <<-\EOF &&
 		$caching_enabled = 1;
 		$cache_options{"expires_in"} = -1;      # never expire cache for tests
-		$cache_options{"cache_root"} = "cache"; # to clear the right thing
+		$cache_options{"cache_root"} = "$TRASH_DIRECTORY/cache"; # to clear the right thing
+		$cache_options{"background_cache"} = 0; # no background processes in test suite
+		$cache_options{"generating_info"} = undef; # tests do not use web browser
 		EOF
 		rm -rf cache/
 	'
diff --git a/t/t9511/test_cache_interface.pl b/t/t9511/test_cache_interface.pl
index a2b006c..1e8feb3 100755
--- a/t/t9511/test_cache_interface.pl
+++ b/t/t9511/test_cache_interface.pl
@@ -23,7 +23,13 @@ BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
 note("Using lib '$INC[0]'");
 note("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [
+	'max_lifetime' => 0, # turn it off
+	'background_cache' => 0,
+]);
+
+# ->compute_fh() can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
 
 # Test that default values are defined
 #
@@ -239,6 +245,7 @@ subtest 'parallel access' => sub {
 my $stale_value = 'Stale Value';
 
 subtest 'serving stale data when regenerating' => sub {
+	$cache->remove($key);
 	cache_set_fh($cache, $key, $stale_value);
 	$cache->set_expires_in(-1);   # never expire, for next check
 	is(cache_get_fh($cache, $key), $stale_value,
@@ -246,7 +253,10 @@ subtest 'serving stale data when regenerating' => sub {
 
 	$call_count = 0;
 	$cache->set_expires_in(0);    # expire now (so there are no fresh data)
-	$cache->set_max_lifetime(-1); # forever (always serve stale data)
+	$cache->set_max_lifetime(-1); # stale data is valid forever
+
+	# without background generation
+	$cache->set_background_cache(0);
 
 	@output = parallel_run {
 		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
@@ -264,6 +274,31 @@ subtest 'serving stale data when regenerating' => sub {
 	   'no background: value got set correctly, even if stale data returned');
 
 
+	# with background generation
+	$cache->set_background_cache(1);
+	$call_count = 0;
+	cache_set_fh($cache, $key, $stale_value);
+	$cache->set_expires_in(0);  # expire now (so there are no fresh data)
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print "$call_count$sep";
+		print $data if defined $data;
+	};
+	# returning stale data works
+	is_deeply(
+		[sort @output],
+		[sort ("0$sep$stale_value", "0$sep$stale_value")],
+		'background: stale data returned by both processes'
+	);
+	$cache->set_expires_in(-1); # never expire for next ->get
+	note("waiting $slow_time sec. for background process to have time to set data");
+	sleep $slow_time; # wait for background process to have chance to set data
+	is(cache_get_fh($cache, $key), $value,
+	   'background: value got set correctly by background process');
+	$cache->set_expires_in(0);  # expire now (so there are no fresh data)
+
+
 	cache_set_fh($cache, $key, $stale_value);
 	$cache->set_expires_in(0);   # expire now
 	$cache->set_max_lifetime(0); # don't serve stale data
@@ -282,6 +317,116 @@ subtest 'serving stale data when regenerating' => sub {
 $cache->set_expires_in(-1);
 
 
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+	local $| = 1;
+	print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+
+subtest 'generating progress info' => sub {
+	my @progress;
+
+	# without background generation, and without stale value
+	$cache->set_background_cache(0);
+	$cache->remove($key); # no data and no stale data
+	$call_count = 0;
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print "$sep$call_count$sep";
+		print $data if defined $data;
+	};
+	# split progress and output
+	@progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		[sort @progress],
+		[sort ("${sep}1", "$progress_info${sep}0")],
+		'no background, no stale data: the process waiting for data prints progress info'
+	);
+	is_deeply(
+		\@output,
+		[ ($value) x 2 ],
+		'no background, no stale data: both processes return correct value'
+	);
+
+
+	# without background generation, with stale value
+	cache_set_fh($cache, $key, $stale_value);
+	$cache->set_expires_in(0);    # set value is now expired
+	$cache->set_max_lifetime(-1); # stale data never expire
+	$call_count = 0;
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print "$sep$call_count$sep";
+		print $data if defined $data;
+	};
+	@progress = map { s/^(.*?)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		\@progress,
+		[ ('') x 2],
+		'no background, stale data: neither process prints progress info'
+	);
+	is_deeply(
+		[sort @output],
+		[sort ("1$sep$value", "0$sep$stale_value")],
+		'no background, stale data: generating gets data, other gets stale data'
+	);
+	$cache->set_expires_in(-1);
+
+
+	# with background generation
+	$cache->set_background_cache(1);
+	$cache->remove($key); # no data and no stale value
+	$call_count = 0;
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print $sep;
+		print $data if defined $data;
+	};
+	@progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		\@progress,
+		[ ($progress_info) x 2],
+		'background, no stale data: both process print progress info'
+	);
+	is_deeply(
+		\@output,
+		[ ($value) x 2 ],
+		'background, no stale data: both processes return correct value'
+	);
+
+
+	# with background generation, with stale value
+	cache_set_fh($cache, $key, $stale_value);
+	$cache->set_expires_in(0);    # set value is now expired
+	$cache->set_max_lifetime(-1); # stale data never expire
+	$call_count = 0;
+
+	@output = parallel_run {
+		my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+		print $sep;
+		print $data if defined $data;
+	};
+	@progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+	is_deeply(
+		\@progress,
+		[ ('') x 2],
+		'background, stale data: neither process prints progress info'
+	);
+	note("waiting $slow_time sec. for background process to have time to set data");
+	sleep $slow_time; # wait for background process to have chance to set data
+
+
+	done_testing();
+};
+$cache->set_expires_in(-1);
+
 done_testing();
 
 
-- 
1.7.3

^ permalink raw reply related

* Re: [RFC PATCH 3/3] filter-branch: support --submodule-filter
From: Johannes Sixt @ 2010-12-31 17:31 UTC (permalink / raw)
  To: Thomas Rast; +Cc: jeffrey.freeman, git
In-Reply-To: <44e6104ba28c80a6befe0f39fa4e2d6eeec56aa9.1293809100.git.trast@student.ethz.ch>

On Freitag, 31. Dezember 2010, Thomas Rast wrote:
> This new filter gets each submodule's current sha1 and path on stdin,
> separated by a tab.  It can then emit a new submodule sha1 and/or
> path, and filter-branch will:
>
> * if the path differs, remove the submodule at the old path;
>
> * add/replace the new sha1 at the new path.
>
> Additionally, returning an empty new sha1 deletes the submodule.
>
> You can tie this together with the last two commits to filter the
> super-project after a subproject filtering as follows:
>
>   ( cd sub1 && git filter-branch --dump-map map <filters|args> )
>   ( cd sub2 && git filter-branch --dump-map map <filters|args> )
>   cat sub1/map sub2/map > map
>   git filter-branch --load-map map \
>   	--submodule-filter "map $(cut -f1)" \
> 	<args>
>
> Other use-cases should also be covered since we also pass through the
> path.

As I said, I'm not particularly fond of a new --submodule-filter because it is 
just a special --index-filter.

Your implementation is highly problematic:

> +	if [ "$filter_submodule" ]; then
> +		git ls-files -s |
> +		grep '^160000' |
> +		while read mode sha1 stage path; do
> +			printf "$sha1\t$path\n" |
> +			{ eval "$filter_submodule" ||
> +				die "submodule filter failed: $filter_submodule"; } |

This 'die' will not do anything useful except to write an error message.

> +			read newsha1 newpath

This 'read' is part of a pipe, and as such many shells run it in a sub-shell; 
the values that it reads do not survive the pipe, hence, the subsequent code 
does not do what you intend it.

In this case, you can put everything from 'read' to the last 'fi' below inside 
a block { } because there are no process states that need to survive the 
pipe.

> +			if [ -z "$newsha1" ] || [ "$path" != "$newpath" ]; then
> +				git update-index --remove "$path" ||
> +					die "failed to remove submodule $path"
> +			fi
> +			if [ -n "$newsha1" ] && [ "$sha1" != "$newsha1" ]; then
> +				git update-index --add --replace --cacheinfo \
> +					160000 "$newsha1" "$newpath" ||
> +					die "failed to add submodule $newpath"
> +			fi
> +		done

The whole if-branch is a pipe, and it's parts are run in a sub-shell (although 
shells are allowed to optimize this). This means that the 'die' calls will 
only exit the pipe, but not terminate filter-branch. You at least need to 
have '|| exit' behind the last 'fi' and &&-join the if-statements.

> +	fi
> +
>  	parentstr=
>  	for parent in $parents; do
>  		for reparent in $(map "$parent"); do

-- Hannes

^ permalink raw reply

* Re: [RFC PATCH 0/3] Submodule filtering for filter-branch
From: Johannes Sixt @ 2010-12-31 17:20 UTC (permalink / raw)
  To: Thomas Rast; +Cc: jeffrey.freeman, git
In-Reply-To: <cover.1293809100.git.trast@student.ethz.ch>

On Freitag, 31. Dezember 2010, Thomas Rast wrote:
> Jeffrey Phillips Freeman aka tty1 asked on IRC about a way to fix the
> super-project after rewriting submodules.  This quick series provides
> some extra tricks in the filter-branch toolbox that can be plumbed
> together to achieve this (see the last commit's message), and should
> hopefully be general enough to be of other use too.

I like the --dump-map and --load-map options. But I think that 
a --submodule-filter is not necessary; it is just a 
particular --index-filter, and IMO should be part of the documentation in the 
form of an example that illustrates how to use the new options.

-- Hannes

^ permalink raw reply

* Re: [RFC PATCH 2/3] filter-branch: optionally load existing mappings prior to filtering
From: Johannes Sixt @ 2010-12-31 17:10 UTC (permalink / raw)
  To: Thomas Rast; +Cc: jeffrey.freeman, git
In-Reply-To: <f89203efcf1b2f49fb33097b3e7fc27e070626fb.1293809100.git.trast@student.ethz.ch>

On Freitag, 31. Dezember 2010, Thomas Rast wrote:
> +	--load-map)
> +		case "$OPTARG" in
> +		/*)

Please use is_absolute_path here as well.

-- Hannes

^ permalink raw reply

* Re: [RFC PATCH 1/3] filter-branch: optionally dump all mappings at the end
From: Johannes Sixt @ 2010-12-31 17:09 UTC (permalink / raw)
  To: Thomas Rast; +Cc: jeffrey.freeman, git
In-Reply-To: <57609f1df897f19becc77d5c43c3c3608725160b.1293809100.git.trast@student.ethz.ch>

On Freitag, 31. Dezember 2010, Thomas Rast wrote:
> In some cases, like the sub/super-project filtering attempted in this
> series, it may be necessary to carry over the mappings from one or
> more filter-branch run to another.
>
> As the first part to this, make a --dump-map option that dumps a flat
> text file with lines of the form
>
>   $sha1 $(map $sha1)

Despite the alternative implementation that I suggest below, the commit 
message (and also the documentation to be written) should state the output 
format in this way.

> @@ -194,6 +195,16 @@ do
>  	--original)
>  		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
>  		;;
> +	--dump-map)
> +		case "$OPTARG" in
> +		/*)

Please use

	if is_absolute_path "$OPTARG"
	then
	...

> +# At this point the mappings are stable so we can dump them if
> +# requested
> +
> +if test -n "$dump_map"; then
> +	( cd "$workdir"/../map/; ls ) |
> +	while read sha1; do
> +		echo $sha1 $(map $sha1)

There must be a more efficient way than to spawn an additional process for 
each mapped SHA1. Since this excercises only one branch in the map() function 
anyway, how about this:

	( cd "$workdir"/../map/ && ls -1 ) |
	while read sha1; do
		read mapped << $sha1 &&
		echo $sha1 $mapped
	done > "$dump_map"

Though, there is some mechanism that maps one SHA1s to more than one, and my 
approach might not cover that case.

> +	done > "$dump_map"

-- Hannes

^ permalink raw reply

* Re: [PATCH] Fix expected values of setup tests on Windows
From: Johannes Sixt @ 2010-12-31 16:11 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git
In-Reply-To: <AANLkTi=uCCfBFBpC=+V9RpaXRXiiPYp-tZXBxAVNB7+e@mail.gmail.com>

On Freitag, 31. Dezember 2010, Nguyen Thai Ngoc Duy wrote:
> 2010/12/31 Johannes Sixt <j6t@kdbg.org>:
> > On Windows, bash stores absolute path names in shell variables in POSIX
> > format that begins with a slash, rather than in drive-letter format; such
> > a value is converted to the latter format when it is passed to a non-MSYS
> > program such as git.
>
> Hmm.. from test-lib.sh:
>
> TEST_DIRECTORY=$(pwd)
> test="trash directory.$(basename "$0" .sh)"
> TRASH_DIRECTORY="$TEST_DIRECTORY/$test"
>
> I'm just curious how these lines make $TRASH_DIRECTORY in POSIX format,
> while
>
> here=$(pwd)
>
> in your patch does not. Does bash auto convert value in
> TRASH_DIRECTORY="$TE..." line?

No. When this line is executed:

TEST_DIRECTORY=$(pwd)

$(pwd) still has its default behavior to return the POSIX style path. pwd is 
redefined to pwd -W only later.

I'm hesitant to redefine pwd earlier in test-lib.sh, though, because we would 
have to audit all uses of TEST_DIRECTORY for whether POSIX style paths or 
drive-letter paths are needed.

-- Hannes

^ permalink raw reply

* [RFC PATCH 3/3] filter-branch: support --submodule-filter
From: Thomas Rast @ 2010-12-31 15:29 UTC (permalink / raw)
  To: jeffrey.freeman; +Cc: git
In-Reply-To: <cover.1293809100.git.trast@student.ethz.ch>

This new filter gets each submodule's current sha1 and path on stdin,
separated by a tab.  It can then emit a new submodule sha1 and/or
path, and filter-branch will:

* if the path differs, remove the submodule at the old path;

* add/replace the new sha1 at the new path.

Additionally, returning an empty new sha1 deletes the submodule.

You can tie this together with the last two commits to filter the
super-project after a subproject filtering as follows:

  ( cd sub1 && git filter-branch --dump-map map <filters|args> )
  ( cd sub2 && git filter-branch --dump-map map <filters|args> )
  cat sub1/map sub2/map > map
  git filter-branch --load-map map \
  	--submodule-filter "map $(cut -f1)" \
	<args>

Other use-cases should also be covered since we also pass through the
path.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 9feeb26..4a321c4 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -122,6 +122,7 @@ filter_msg=cat
 filter_commit=
 filter_tag_name=
 filter_subdir=
+filter_submodule=
 orig_namespace=refs/original/
 force=
 prune_empty=
@@ -193,6 +194,9 @@ do
 		filter_subdir="$OPTARG"
 		remap_to_ancestor=t
 		;;
+	--submodule-filter)
+		filter_submodule="$OPTARG"
+		;;
 	--original)
 		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
 		;;
@@ -379,6 +383,26 @@ while read commit parents; do
 	eval "$filter_index" < /dev/null ||
 		die "index filter failed: $filter_index"
 
+	if [ "$filter_submodule" ]; then
+		git ls-files -s |
+		grep '^160000' |
+		while read mode sha1 stage path; do
+			printf "$sha1\t$path\n" |
+			{ eval "$filter_submodule" ||
+				die "submodule filter failed: $filter_submodule"; } |
+			read newsha1 newpath
+			if [ -z "$newsha1" ] || [ "$path" != "$newpath" ]; then
+				git update-index --remove "$path" ||
+					die "failed to remove submodule $path"
+			fi
+			if [ -n "$newsha1" ] && [ "$sha1" != "$newsha1" ]; then
+				git update-index --add --replace --cacheinfo \
+					160000 "$newsha1" "$newpath" ||
+					die "failed to add submodule $newpath"
+			fi
+		done
+	fi
+
 	parentstr=
 	for parent in $parents; do
 		for reparent in $(map "$parent"); do
-- 
1.7.4.rc0.240.g44e61

^ permalink raw reply related

* [RFC PATCH 2/3] filter-branch: optionally load existing mappings prior to filtering
From: Thomas Rast @ 2010-12-31 15:29 UTC (permalink / raw)
  To: jeffrey.freeman; +Cc: git
In-Reply-To: <cover.1293809100.git.trast@student.ethz.ch>

In the previous commit filter-branch learned to dump its mappings.
Introduce a --load-map option that lets it load map files in the same
format.

This option does not accumulate; only the last one counts.  Letting it
accumulate would require two argument-parsing passes since we need to
know the $tempdir before we can establish any maps.

Also, the maps add to all filtered maps.  This is a bit of a tradeoff.
As it stands, it could be useful to remap refs in another repository
using the map file from elsewere.  On the other hand, it does mean
that it may rewrite more refs than the user intended.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 020b076..9feeb26 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -127,6 +127,7 @@ force=
 prune_empty=
 remap_to_ancestor=
 dump_map=
+load_map=
 while :
 do
 	case "$1" in
@@ -205,6 +206,16 @@ do
 			;;
 		esac
 		;;
+	--load-map)
+		case "$OPTARG" in
+		/*)
+			load_map="$OPTARG"
+			;;
+		*)
+			load_map="$(pwd)/$OPTARG"
+			;;
+		esac
+		;;
 	*)
 		usage
 		;;
@@ -275,6 +286,14 @@ export GIT_INDEX_FILE
 # map old->new commit ids for rewriting parents
 mkdir ../map || die "Could not create map/ directory"
 
+if test -n "$load_map"
+then
+	while read pre post
+	do
+		echo $post > ../map/$pre
+	done < "$load_map"
+fi
+
 # we need "--" only if there are no path arguments in $@
 nonrevs=$(git rev-parse --no-revs "$@") || exit
 if test -z "$nonrevs"
-- 
1.7.4.rc0.240.g44e61

^ permalink raw reply related

* [RFC PATCH 0/3] Submodule filtering for filter-branch
From: Thomas Rast @ 2010-12-31 15:29 UTC (permalink / raw)
  To: jeffrey.freeman; +Cc: git

Jeffrey Phillips Freeman aka tty1 asked on IRC about a way to fix the
super-project after rewriting submodules.  This quick series provides
some extra tricks in the filter-branch toolbox that can be plumbed
together to achieve this (see the last commit's message), and should
hopefully be general enough to be of other use too.

Comes without docs and tests at this point, hence RFC.  It's also only
lightly tested, but since Jeffrey has a use-case, I'll leave it up to
him to bend it a bit and see if anything breaks.


Thomas Rast (3):
  filter-branch: optionally dump all mappings at the end
  filter-branch: optionally load existing mappings prior to filtering
  filter-branch: support --submodule-filter

 git-filter-branch.sh |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)

-- 
1.7.4.rc0.240.g44e61

^ permalink raw reply

* [RFC PATCH 1/3] filter-branch: optionally dump all mappings at the end
From: Thomas Rast @ 2010-12-31 15:29 UTC (permalink / raw)
  To: jeffrey.freeman; +Cc: git
In-Reply-To: <cover.1293809100.git.trast@student.ethz.ch>

In some cases, like the sub/super-project filtering attempted in this
series, it may be necessary to carry over the mappings from one or
more filter-branch run to another.

As the first part to this, make a --dump-map option that dumps a flat
text file with lines of the form

  $sha1 $(map $sha1)

after it has established all mappings.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 git-filter-branch.sh |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 962a93b..020b076 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -126,6 +126,7 @@ orig_namespace=refs/original/
 force=
 prune_empty=
 remap_to_ancestor=
+dump_map=
 while :
 do
 	case "$1" in
@@ -194,6 +195,16 @@ do
 	--original)
 		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
 		;;
+	--dump-map)
+		case "$OPTARG" in
+		/*)
+			dump_map="$OPTARG"
+			;;
+		*)
+			dump_map="$(pwd)/$OPTARG"
+			;;
+		esac
+		;;
 	*)
 		usage
 		;;
@@ -385,6 +396,16 @@ then
 	done < "$tempdir"/heads
 fi
 
+# At this point the mappings are stable so we can dump them if
+# requested
+
+if test -n "$dump_map"; then
+	( cd "$workdir"/../map/; ls ) |
+	while read sha1; do
+		echo $sha1 $(map $sha1)
+	done > "$dump_map"
+fi
+
 # Finally update the refs
 
 _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
-- 
1.7.4.rc0.240.g44e61

^ permalink raw reply related

* Re: Rebasing multiple branches
From: Thomas Rast @ 2010-12-31 14:55 UTC (permalink / raw)
  To: weigelt; +Cc: git, leonidp.lists, Johannes Sixt
In-Reply-To: <20101230053530.GA10511@nibiru.local>

Please don't cull the Cc lists.  Unless he's subscribed, Leonid never
got your reply!

Enrico Weigelt wrote:
> * Leonid Podolny <leonidp.lists@gmail.com> wrote:
> 
> > Ah, nice. I didn't notice the -p option. However, the man page advises 
> > against using -p and -i together.
> 
> Last time I checked, -i required -p ...

-p internally implies -i, but the user doesn't have to know that ;-)

The problem is that the todo file language is not expressive enough
for what -p needs to do.  Running a rebase -p without changing the
todo file should behave reasonably.  On the other hand, if you
rearrange or extend the todo file in many cases that gives unexpected
results.

Hence the recommendation to not use it with -i.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply

* Re: Intermittent failures in t9119
From: David D. Kilzer @ 2010-12-31 14:13 UTC (permalink / raw)
  To: Eric Wong, Junio C Hamano; +Cc: git
In-Reply-To: <20101209175503.GA16478@dcvr.yhbt.net>

Eric Wong <normalperson@yhbt.net> wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
> > Eric  Wong <normalperson@yhbt.net>  writes:
> > > Junio C Hamano <gitster@pobox.com> wrote:
> >  >>  (2) Nobody uses the value from "Text Last Updated" field in  practice, 
>so
> > >>      that bug has been unnoticed so  far;
> > >> 
> > >>  (3) And it is not worth fixing it  ;-)
> > >> 
> > >> For now, I would suggest fixing the  failing test to ignore the "Text 
Last
> > >> Updated" field while  comparing, and if somebody is inclined to, we would
> > >> update the  code to match what "svn info" does.
> > >
> > > Agreed on both  points.  I consider "git svn log" and "git svn info" to
> > > be  reasonable approximations of svn behavior, not exact replicas.
> > >  Exactly matching would be extremely difficult given variations between
> >  > different svn versions, and also svn requiring network access while
> >  > git svn does not.
> > 
> > Ok, here is a minimum patch to do  that.
> 
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> 
> Thanks,  Acked-by: Eric Wong <normalperson@yhbt.net>


Acked-by: David Kilzer <ddkilzer@kilzer.net>

Thanks!

Dave

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox