Git development
* Re: How to check new commit availability without full fetch?
From: Leo Razoumov @ 2010-01-11 17:35 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.2.00.1001111149150.10143@xanadu.home>

On 2010-01-11, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Mon, 11 Jan 2010, Leo Razoumov wrote:
>
>  > On 2010-01-10, Nicolas Pitre <nico@fluxnic.net> wrote:
>  > >
>  > > You still don't answer my question though.  Again, _why_ do you need to
>  > >  know about remote commit availability without fetching them?
>  > >
>  >
>  > I use git to track almost all my data (code and otherwise) and spread
>  > it between several computers. I end up with several local repos having
>  > the same local branches. It happens once in a while that I fetch into
>  > a given remote/foo from several local foo branches from different
>  > machines and the operation fails. It happens because the commits have
>  > not been yet consistently distributed among the repos. To do the
>  > forensics and figure out who should update whom first I need a quick
>  > and non-destructive way to fetch dry-run.
>
>
> There is probably something awkward about your setup then.
>
>  Normally you should have a remote description for any of the remote
>  repositories you fetch from.  So if you have, say, remote machine_a with
>  repo foo, machine_b with repo bar, and machine_c with repo baz, then
>  fetching any of those will _only_ mirror locally the state of those
>  remote repositories.  There is no ordering required as there can't be
>  any conflicts in the mere fact of mirroring what the other guys have.
>  That's what remote tracking branches are for: they follow the state of a
>  remote repository and are never altered by local changes.  And you can
>  have as many of those as you wish and they will never conflict with each
>  other as each remote description is independent. And this is true
>  whether or not the remote repository lives on the same machine (that
>  would be a remote directory in that case).
>

The setup may indeed be awkward, but it handles very diverse tasks.
As I said in my earlier emails, different repos fetch into the *same* remote/foo.
So there could be conflicts, and using fetch -f could cause loss of data.

Before switching to git I used mercurial for the same purpose, and it
has commands that are equivalent to fetch --dry-run.

--Leo--

* Re: [PATCH] fast-import: tag may point to any object type
From: Shawn O. Pearce @ 2010-01-11 17:14 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git, Junio C Hamano
In-Reply-To: <1263186165-23920-1-git-send-email-dpotapov@gmail.com>

Dmitry Potapov <dpotapov@gmail.com> wrote:
> If you tried to export the official git repository and then import it
> back, git-fast-import would die complaining that "Mark :1 not a commit".
> 
> According to the generated crash file, Mark 1 is not a commit but a blob,
> which is pointed to by the junio-gpg-pub tag. Because git-tag allows creating
> such tags, git-fast-import should import them.
> 
> Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
> ---
>  fast-import.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fast-import.c b/fast-import.c
> index cd87049..e99990d 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -2305,6 +2305,7 @@ static void parse_new_tag(void)
>  	struct tag *t;
>  	uintmax_t from_mark = 0;
>  	unsigned char sha1[20];
> +	enum object_type type = OBJ_COMMIT;

NAK.

Your patch is the right idea.  But you need to make sure all of
the branch arms are handled correctly.

That is, if we do this, the get_sha1() on line 2459 should also
permit non-commit objects, and the lookup_branch() earlier up on
line 2451 should do "type = OBJ_COMMIT".
  
-- 
Shawn.

* Re: How to check new commit availability without full fetch?
From: Nicolas Pitre @ 2010-01-11 17:04 UTC (permalink / raw)
  To: Leo Razoumov; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <ee2a733e1001110822t1b04c1ccg9b6eb5489b69783d@mail.gmail.com>

On Mon, 11 Jan 2010, Leo Razoumov wrote:

> On 2010-01-10, Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > You still don't answer my question though.  Again, _why_ do you need to
> >  know about remote commit availability without fetching them?
> >
> 
> I use git to track almost all my data (code and otherwise) and spread
> it between several computers. I end up with several local repos having
> the same local branches. It happens once in a while that I fetch into
> a given remote/foo from several local foo branches from different
> machines and the operation fails. It happens because the commits have
> not been yet consistently distributed among the repos. To do the
> forensics and figure out who should update whom first I need a quick
> and non-destructive way to fetch dry-run.

There is probably something awkward about your setup then.

Normally you should have a remote description for any of the remote 
repositories you fetch from.  So if you have, say, remote machine_a with 
repo foo, machine_b with repo bar, and machine_c with repo baz, then 
fetching any of those will _only_ mirror locally the state of those 
remote repositories.  There is no ordering required as there can't be 
any conflicts in the mere fact of mirroring what the other guys have.  
That's what remote tracking branches are for: they follow the state of a 
remote repository and are never altered by local changes.  And you can 
have as many of those as you wish and they will never conflict with each 
other as each remote description is independent. And this is true 
whether or not the remote repository lives on the same machine (that 
would be a remote directory in that case).

And even if most, if not all, of those remotes are actually copies of each
other, then the first fetch to occur will transfer the new objects
while fetching the other ones will simply notice that the required 
objects are already available locally and only the ref will be updated.

The ordering comes into play when it is time to _merge_ those remote 
branches into, say, the local master branch.  That's why it is probably 
a good thing to use fetch+merge instead of pull in this case.  But this 
is then a local matter and nothing that depends on the fetch ordering.


Nicolas
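
The point about independent remote descriptions can be sketched with
throwaway repositories. This is only a minimal illustration under stated
assumptions: the temporary directory layout, the 'make_repo' helper and the
identity values are made up for the example, while 'machine_a' and
'machine_b' reuse the remote names from the message above. Each fetch
updates only its own refs/remotes/<name>/ namespace, and 'git ls-remote'
reports a remote's tip without updating any local ref.

```shell
# Assumed identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

tmp=$(mktemp -d)

# Hypothetical helper: create a repository with one commit on 'master'.
make_repo () {
	git init -q "$tmp/$1" &&
	git -C "$tmp/$1" symbolic-ref HEAD refs/heads/master &&
	git -C "$tmp/$1" commit -q --allow-empty -m "commit from $1"
}

make_repo machine_a
make_repo machine_b
make_repo local

cd "$tmp/local" || exit 1

# One remote description per repository we fetch from.
git remote add machine_a "$tmp/machine_a"
git remote add machine_b "$tmp/machine_b"

# Ask the remote for its tip; no local ref is touched by this.
tip_a=$(git ls-remote machine_a refs/heads/master | cut -f1)

# Fetch order does not matter: each remote has its own namespace.
git fetch -q machine_b
git fetch -q machine_a

tracked_a=$(git rev-parse refs/remotes/machine_a/master)
tracked_b=$(git rev-parse refs/remotes/machine_b/master)

echo "machine_a tip (ls-remote): $tip_a"
echo "machine_a tracking ref:    $tracked_a"
echo "machine_b tracking ref:    $tracked_b"
```

Comparing the 'ls-remote' output with the tracking ref before fetching is
one way to see whether a remote has moved, which is close to what the
original question asks for.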

* [PATCH/RFC] filter-branch: Fix to allow replacing submodules with another content
From: Michal Sojka @ 2010-01-11 16:33 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michal Sojka

When git filter-branch is used to replace a submodule with other
content, it always fails on the first commit. Consider a repository with
a directory named submodule that contains a submodule. If I want to remove
the submodule and replace it with a file, the following command fails.

git filter-branch --tree-filter 'rm -rf submodule &&
				 git rm -q submodule &&
				 mkdir submodule &&
				 touch submodule/file'

The error message is:
error: submodule: is a directory - add files inside instead

The reason is that git diff-index, which generates part of the list of
files passed to update-index, also emits the removed submodule even if it
was replaced by a real directory.

Adding --ignore-submodules solves the problem for me, and the
tests in t7003-filter-branch.sh pass.

Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
---
 git-filter-branch.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 195b5ef..d4ac7fb 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -331,7 +331,7 @@ while read commit parents; do
 			die "tree filter failed: $filter_tree"
 
 		(
-			git diff-index -r --name-only $commit &&
+			git diff-index -r --name-only --ignore-submodules $commit && 
 			git ls-files --others
 		) > "$tempdir"/tree-state || exit
 		git update-index --add --replace --remove --stdin \
-- 
1.6.6

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 16:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <7vtyusr4r7.fsf@alter.siamese.dyndns.org>



On Mon, 11 Jan 2010, Junio C Hamano wrote:
> 
> An ObviouslyRightThing fix is this two-liner.  We shouldn't lookahead if
> we want to do something more than just skipping when we see an unmatch for
> the line we are currently looking at.

Ack. Works for me. And with that, I'd love for it to go in, and get rid of 
the external grep. Performance is now a non-issue (it goes both ways), and 
the internal grep doesn't have the bug with separators between multi-line 
greps.

And dropping the external one gets rid of all the issues with PATHs, crap 
'grep' implementations, and removes actual code. Goodie.

		Linus

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Junio C Hamano @ 2010-01-11 16:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <7vtyusr4r7.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Meh.  I checked pre-context codepath before sending the patch and was very
> satisfied that René did the right thing in 49de321 (grep: handle pre
> context lines on demand, 2009-07-02), but somehow forgot about the post
> context codepath.

Just to clarify, it was *I* who forgot to check the post context codepath
while adding the lookahead; I didn't mean René forgot anything in 49de321.

* [PATCH 4/4] Documentation: add new git-svn branch/tag options --username and --commit-url
From: Igor Mironov @ 2010-01-11 16:22 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

Signed-off-by: Igor Mironov <igor.a.mironov@gmail.com>
---
 Documentation/git-svn.txt |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 4cdca0d..8dbf9d1 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -239,6 +239,19 @@ discouraged.
 where <name> is the name of the SVN repository as specified by the -R option to
 'init' (or "svn" by default).
 
+--username;;
+	Specify the SVN username to perform the commit as.  This option overrides
+	configuration property 'username'.
+
+--commit-url;;
+	Use the specified URL to connect to the destination Subversion
+	repository.  This is useful in cases where the source SVN
+	repository is read-only.  This option overrides configuration
+	property 'commiturl'.
++
+	git config --get-all svn-remote.<name>.commiturl
++
+
 'tag'::
 	Create a tag in the SVN repository. This is a shorthand for
 	'branch -t'.
-- 
1.6.6.106.ge2de8

* Re: How to check new commit availability without full fetch?
From: Leo Razoumov @ 2010-01-11 16:22 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.2.00.1001102055070.10143@xanadu.home>

On 2010-01-10, Nicolas Pitre <nico@fluxnic.net> wrote:
>
> You still don't answer my question though.  Again, _why_ do you need to
>  know about remote commit availability without fetching them?
>

I use git to track almost all my data (code and otherwise) and spread
it between several computers. I end up with several local repos having
the same local branches. It happens once in a while that I fetch into
a given remote/foo from several local foo branches from different
machines and the operation fails. It happens because the commits have
not been yet consistently distributed among the repos. To do the
forensics and figure out who should update whom first I need a quick
and non-destructive way to fetch dry-run.

--Leo--

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Junio C Hamano @ 2010-01-11 16:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001110748560.13040@localhost.localdomain>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The bad news is that you broke multi-line greps:
>
> 	git grep --no-ext-grep -2 qwerty.*as
>
> results in:
>
> 	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
> 	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
> 	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

Meh.  I checked pre-context codepath before sending the patch and was very
satisfied that René did the right thing in 49de321 (grep: handle pre
context lines on demand, 2009-07-02), but somehow forgot about the post
context codepath.

An ObviouslyRightThing fix is this two-liner.  We shouldn't lookahead if
we want to do something more than just skipping when we see an unmatch for
the line we are currently looking at.

 grep.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/grep.c b/grep.c
index 940e200..ac0ce0b 100644
--- a/grep.c
+++ b/grep.c
@@ -719,6 +719,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 		int hit;
 
 		if (try_lookahead
+		    && !(last_hit
+			 && lno <= last_hit + opt->post_context)
 		    && look_ahead(opt, &left, &lno, &bol))
 			break;
 		eol = end_of_line(bol, &left);

* [PATCH 3/4] git-svn: support options --username and --commit-url in branch/tag
From: Igor Mironov @ 2010-01-11 16:21 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

Add the ability to specify, on the command line, the username to
perform the operation as and the writable URL of the repository to
perform it on.

Signed-off-by: Igor Mironov <igor.a.mironov@gmail.com>
---
 git-svn.perl |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 3bdd8d3..0da6c67 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -155,12 +155,16 @@ my %cmd = (
 	            { 'message|m=s' => \$_message,
 	              'destination|d=s' => \$_branch_dest,
 	              'dry-run|n' => \$_dry_run,
-		      'tag|t' => \$_tag } ],
+	              'tag|t' => \$_tag,
+	              'username=s' => \$Git::SVN::Prompt::_username,
+	              'commit-url=s' => \$_commit_url } ],
 	tag => [ sub { $_tag = 1; cmd_branch(@_) },
 	         'Create a tag in the SVN repository',
 	         { 'message|m=s' => \$_message,
 	           'destination|d=s' => \$_branch_dest,
-	           'dry-run|n' => \$_dry_run } ],
+	           'dry-run|n' => \$_dry_run,
+	           'username=s' => \$Git::SVN::Prompt::_username,
+	           'commit-url=s' => \$_commit_url } ],
 	'set-tree' => [ \&cmd_set_tree,
 	                "Set an SVN repository to a git tree-ish",
 			{ 'stdin' => \$_stdin, %cmt_opts, %fc_opts, } ],
-- 
1.6.6.106.ge2de8

* [PATCH 2/4] git-svn: use commiturl in preference to url when constructing dst for branch/tag
From: Igor Mironov @ 2010-01-11 16:21 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

When constructing a destination URL, use the property 'commiturl' if it
is specified in the configuration file; otherwise take 'url' as usual.
This accommodates the scenario where a user only wants to involve the
writable repository in operations performing a commit and defaults
everything else to a read-only URL.

Signed-off-by: Igor Mironov <igor.a.mironov@gmail.com>
---
 git-svn.perl |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 3f7ccc1..3bdd8d3 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -708,7 +708,17 @@ sub cmd_branch {
 		}
 	}
 	my ($lft, $rgt) = @{ $glob->{path} }{qw/left right/};
-	my $dst = join '/', $remote->{url}, $lft, $branch_name, ($rgt || ());
+	my $url;
+	if (defined $_commit_url) {
+		$url = $_commit_url;
+	} else {
+		$url = eval { command_oneline('config', '--get',
+			"svn-remote.$gs->{repo_id}.commiturl") };
+		if (!$url) {
+			$url = $remote->{url};
+		}
+	}
+	my $dst = join '/', $url, $lft, $branch_name, ($rgt || ());
 
 	if ($dst=~"^https:" && $src=~"^http:") {
 		$src=~s/^http:/https:/;
-- 
1.6.6.106.ge2de8
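
The lookup order used for the destination URL in this series (command-line
option first, then the 'commiturl' configuration property, then the plain
'url') can be sketched in a few lines of shell. This is only an
illustration: 'pick_commit_url' is a hypothetical helper, not part of
git-svn, and it treats an empty string as "not set", which is slightly
looser than the Perl 'defined' check in the patch.

```shell
# Hypothetical helper.
# Usage: pick_commit_url CLI_URL CONFIG_COMMITURL REMOTE_URL
pick_commit_url () {
	if [ -n "$1" ]; then
		# --commit-url given on the command line wins.
		printf '%s\n' "$1"
	elif [ -n "$2" ]; then
		# Otherwise fall back to svn-remote.<name>.commiturl.
		printf '%s\n' "$2"
	else
		# Otherwise use the ordinary (possibly read-only) url.
		printf '%s\n' "$3"
	fi
}

url_default=$(pick_commit_url '' '' 'http://myproj.domain.com/svn')
url_config=$(pick_commit_url '' 'https://myproj.domain.com/svn' 'http://myproj.domain.com/svn')
url_cli=$(pick_commit_url 'https://cli.example.com/svn' 'https://myproj.domain.com/svn' 'http://myproj.domain.com/svn')

echo "no override:     $url_default"
echo "config override: $url_config"
echo "cli override:    $url_cli"
```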

* [PATCH 1/4] git-svn: fix the trivial case of 'src and dst not in the same repo' during branch/tag
From: Igor Mironov @ 2010-01-11 16:20 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

This fixes the following issue:
$ git svn branch -t --username=svnuser --commit-url=https://myproj.domain.com/svn mytag
Copying http://myproj.domain.com/svn/trunk at r26 to https://myproj.domain.com/svn/tags/mytag...
Trying to use an unsupported feature: Source and dest appear not to be in the same repository (src: 'http://myproj.domain.com/svn/trunk'; dst: 'https://myproj.domain.com/svn/tags/mytag') at /usr/lib/git-core/git-svn line 623
Signed-off-by: Igor Mironov <igor.a.mironov@gmail.com>
---
 git-svn.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 650c9e5..3f7ccc1 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -710,6 +710,10 @@ sub cmd_branch {
 	my ($lft, $rgt) = @{ $glob->{path} }{qw/left right/};
 	my $dst = join '/', $remote->{url}, $lft, $branch_name, ($rgt || ());
 
+	if ($dst=~"^https:" && $src=~"^http:") {
+		$src=~s/^http:/https:/;
+	}
+
 	my $ctx = SVN::Client->new(
 		auth    => Git::SVN::Ra::_auth_providers(),
 		log_msg => sub {
-- 
1.6.6.106.ge2de8
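
The scheme fix-up in this patch can be mirrored in shell to show its
effect on the URLs from the error message. This is a sketch only:
'match_scheme' is a hypothetical name, and shell patterns stand in for the
Perl regular expressions used in git-svn.perl.

```shell
# Hypothetical helper: print SRC, upgraded to https when DST uses https
# and SRC uses plain http. Usage: match_scheme SRC DST
match_scheme () {
	src=$1
	dst=$2
	case $dst in
	https:*)
		case $src in
		http:*) src="https:${src#http:}" ;;
		esac
		;;
	esac
	printf '%s\n' "$src"
}

fixed=$(match_scheme 'http://myproj.domain.com/svn/trunk' \
	'https://myproj.domain.com/svn/tags/mytag')
unchanged=$(match_scheme 'http://myproj.domain.com/svn/trunk' \
	'http://myproj.domain.com/svn/tags/mytag')

echo "https destination: $fixed"
echo "http destination:  $unchanged"
```

Only the scheme is rewritten; host and path are left alone, so the check
does not help when source and destination really are different
repositories.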

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 15:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001110739280.13040@localhost.localdomain>



On Mon, 11 Jan 2010, Linus Torvalds wrote:
>
> Without testing it, I can already ACK it. It looks like the 
> ObviouslyRightThing(tm) to do. But I'll run some numbers too.

Ok, some good news, some meh news, and some bad news.

The good news: the trivial numbers look good. It's noticeably faster than 
external grep for me when it does the 'fixmatch()' case, quite probably 
because fixmatch() on at least Linux/x86-64 (which is the only case I 
really care about) uses SSE to do the string ops.

So on my Nehalem:

	[torvalds@nehalem linux]$ time git grep qwerty > /dev/null 

	real	0m0.418s
	user	0m0.204s
	sys	0m0.136s

	[torvalds@nehalem linux]$ time git grep --no-ext-grep qwerty > /dev/null 

	real	0m0.309s
	user	0m0.168s
	sys	0m0.136s

and since that simple fixmatch case is the common one for me, I'm happy.

The meh news: this shows how grep is faster than regexec() due to being a 
smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the 
numbers are

 - built-in:
	real	0m0.548s
	user	0m0.384s
	sys	0m0.152s

 - external:
	real	0m0.415s
	user	0m0.176s
	sys	0m0.160s

so it really is just 'strstr()' that is faster. But this is a 'meh',
because I don't really care, and the new code is still way faster than the 
old one. And I'd be personally willing to just drop the external grep if 
this is the worst problem.

[ I worry a bit that some libc implementations of 'strstr' may suck, but I 
  wouldn't lose sleep over it. ]

The bad news is that you broke multi-line greps:

	git grep --no-ext-grep -2 qwerty.*as

results in:

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

when the _correct_ result is 

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */

i.e. it didn't do the "two lines after" thing.

That said, the external grep also gets this wrong (a different way), 
because it gets all the extra noise due to unnecessary separation lines, 
so for the external grep I actually get

	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--

but that's a long-standing problem, and is more "ugly" than "wrong grep 
results".

			Linus

* [PATCH] rebase--interactive: Ignore comments and blank lines in peek_next_command
From: Michael Haggerty @ 2010-01-11 15:56 UTC (permalink / raw)
  To: git; +Cc: gitster, Johannes.Schindelin, Michael Haggerty

Previously, blank lines and/or comments within a series of
squash/fixup commands would confuse "git rebase -i" into thinking that
the series was finished.  It would therefore require the user to edit
the commit message for the squash/fixup commits seen so far.  Then,
after continuing, it would ask the user to edit the commit message
again.

Ignore comments and blank lines within a group of squash/fixup
commands, allowing them to be processed in one go.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
This patch applies to master.  It does not conflict with either
mh/rebase-fixup or ns/rebase-auto-squash.

 git-rebase--interactive.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index d529328..6ed57e2 100755
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -322,7 +322,7 @@ make_squash_message () {
 }
 
 peek_next_command () {
-	sed -n "1s/ .*$//p" < "$TODO"
+	sed -n -e "/^#/d" -e "/^$/d" -e "s/ .*//p" -e "q" < "$TODO"
 }
 
 do_next () {
-- 
1.6.6.137.g5b1417
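
The difference between the old and the new 'peek_next_command' can be
reproduced outside of git with a small todo file. This is a sketch under
assumptions: the file contents and the 'old_peek'/'new_peek' names are
invented for the comparison; the two sed invocations are copied from the
diff above.

```shell
# Hypothetical todo file: a comment and a blank line precede the next command.
TODO=$(mktemp)
printf '%s\n' \
	'# squash the follow-ups below' \
	'' \
	'squash 1234abc fix typo' \
	'pick 5678def next change' > "$TODO"

# Old version: only ever looks at line 1 of the file.
old_peek () {
	sed -n "1s/ .*$//p" < "$TODO"
}

# New version: drop comments and blank lines, print the first word of the
# first remaining line, then quit.
new_peek () {
	sed -n -e "/^#/d" -e "/^$/d" -e "s/ .*//p" -e "q" < "$TODO"
}

old_result=$(old_peek)
new_result=$(new_peek)

echo "old: $old_result"   # prints "old: #"
echo "new: $new_result"   # prints "new: squash"

rm -f "$TODO"
```

With the old version the caller sees "#" rather than "squash", so it
concludes that the squash/fixup series has ended.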

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 15:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <7vvdf9402f.fsf@alter.siamese.dyndns.org>



On Sun, 10 Jan 2010, Junio C Hamano wrote:
> 
> Here is an experimental patch; first, some numbers (hot cache best of 5 runs).

Without testing it, I can already ACK it. It looks like the 
ObviouslyRightThing(tm) to do. But I'll run some numbers too.

One thing that worries me - but that is independent of this patch - is 
that I don't think our 'grep' function works correctly (neither the 
'fixmatch()' one nor the 'regexec()' one) when there are NUL characters in
a file. Maybe I shouldn't care, but it worries me a bit.

		Linus

* Re: [PATCH] Threaded grep (was: Re: [PATCH] grep: do not do external  grep on skip-worktree entries)
From: Fredrik Kuivinen @ 2010-01-11 10:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: peff, gitster, miles, pclouds, Git Mailing List
In-Reply-To: <alpine.LFD.2.00.1001080956270.7821@localhost.localdomain>

[I messed up the Cc list when I sent the first mail in this thread, so
it didn't reach git@vger. This time it's fixed for real. Sorry for the
extra copy.]

On Fri, Jan 8, 2010 at 19:04, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Fri, 8 Jan 2010, Fredrik Kuivinen wrote:
>>
>> I only have access to a couple of boxes with more than one core so
>> some more testing would be greatly appreciated. On the boxes I have
>> tested this on the added parallelism roughly cut the time to grep the
>> linux kernel in half (compared to the built-in grep). It also compares
>> favourably to the external GNU grep (these are best of three runs):
>
> On my box (in all cases best-of-five):
>
>  - "NO_THREADS=1 git grep --no-ext-grep qwerty":
>
>        real    0m0.945s
>        user    0m0.808s
>        sys     0m0.128s
>
>  - "git grep --no-ext-grep qwerty":
>
>        real    0m0.402s
>        user    0m1.116s
>        sys     0m0.216s
>
>  - "git grep qwerty":
>
>        real    0m0.408s
>        user    0m0.176s
>        sys     0m0.152s
>
> so it _just_ beat the external grep thanks to using 330% CPU time. An
> improvement, yes, but the CPU wastage is kind of sad. It really would be
> nice to see if we could get rid of the stupid per-line overhead some way.

I agree. The per-line thing seems to be fixed with Junio's recent patch.

> Btw, there does seem to be some unnecessary synchronization there, because
> if I pick a pattern that has no matches at all, my best parallel number
> goes down to 0.316. But the variation in times for the parallel one is so
> big that I don't know how relevant that all is.
>
> I suspect you need more threads than CPU's due to the waiting (so that
> other threads can pick up the slack when one thread ends up waiting to
> output). Or don't wait at all, and queue it up instead.

Yes, you are right, there is some unnecessary synchronization. I am
working on a new patch which queues the output instead.

- Fredrik

* [PATCH] Add missing #include to support TIOCGWINSZ on Solaris
From: Nguyễn Thái Ngọc Duy @ 2010-01-11 10:41 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

On Linux, TIOCGWINSZ is defined somewhere in ioctl.h, which is already
included. On Solaris, we also need to include termios.h. Without this,
term_columns() in help.c will think TIOCGWINSZ is not supported and
always return 80 columns.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Solaris noob here. Somebody should check if it affects other platforms.

 git-compat-util.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index e5e9f39..de3a70e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -90,6 +90,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <termios.h>
 #ifndef NO_SYS_SELECT_H
 #include <sys/select.h>
 #endif
-- 
1.6.4.1.401.gc69a7

* Re: [PATCH 2/6] Documentation: merge: add an overview
From: Junio C Hamano @ 2010-01-11 10:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Thomas Rast, Petr Baudis
In-Reply-To: <20100111083028.GB23806@progeny.tock>

Jonathan Nieder <jrnieder@gmail.com> writes:

> The reader unfamiliar with the concepts of branching and merging
> would have been completely lost.  Try to help him with a diagram.

Good idea.

>  DESCRIPTION
>  -----------
> -This is the top-level interface to the merge machinery
> -which drives multiple merge strategy scripts.
> +Incorporates changes leading up to the named commits into the
> +current branch.

Having "since the histories diverged" somewhere in this first sentence
would clarify the concepts better, I think.

> +Assume the following history exists and the current branch is
> +"`master`":
> +
> +------------
> +          A---B---C topic
> +         /
> +    D---E---F---G master
> +------------
> +
> +Then "`git merge topic`" will apply the changes from `A`, `B`,
> +and `C` to the work tree, and if they do not conflict with any
> +changes from `master`, will store the result in a new commit along
> +with the names of the two parent commits and a log message from the
> +user describing the changes.

 - Don't spell A, B and C out; technically we don't do that, and
   conceptually "changes since E until C" is exactly the same.

 - Don't talk about "what it does" first, but talk about "what it is used
   for" iow "why do you want to use it".

 - "What it is used for" doesn't have to talk about "if it does not
   conflict" yet; it is merely a lower-level detail that the main part of
   the document can teach the users to help the tool achieve its goal
   (i.e. "what it is used for") when it cannot do so automatically.

	`git merge topic` is used to replay the change the topic has made
        since it diverged from master's history (i.e. E) until its current
        commit (i.e. C) on top of master, and store the result in a new
        commit along with....

> +------------
> +          A---B---C topic
> +         /         \
> +    D---E---F---G---H master
> +------------
>  
>  The second syntax (<msg> `HEAD` <remote>) is supported for
>  historical reasons.  Do not use it from the command line or in
> -- 
> 1.6.6
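
The history in the diagrams can be rebuilt in a scratch repository to check
the description. This is a sketch under assumptions: empty commits stand in
for real changes, the 'empty_commit' helper and the identity values are
invented for the example, and the merge commit is given the message "H" to
match the second diagram.

```shell
# Assumed identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: record an empty commit with the given subject.
empty_commit () {
	git commit -q --allow-empty -m "$1"
}

empty_commit D
empty_commit E
git branch topic            # topic diverges from master at E
empty_commit F
empty_commit G

git checkout -q topic
empty_commit A
empty_commit B
empty_commit C

git checkout -q master
git merge -q -m H topic     # records H with parents G and C

# H lists two parents; their merge base is the divergence point E.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
base_subject=$(git log -1 --format=%s "$(git merge-base HEAD^1 HEAD^2)")

echo "parents of H: $parent_count"
echo "merge base:   $base_subject"
```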

* Re: [PATCH 2/2] Use $(git rev-parse --show-toplevel) in cd_to_toplevel()
From: Junio C Hamano @ 2010-01-11  9:58 UTC (permalink / raw)
  To: Steven Drake; +Cc: Junio C Hamano, git
In-Reply-To: <alpine.LNX.2.00.1001112114140.9352@vqena.qenxr.bet.am>

Steven Drake <sdrake@xnet.co.nz> writes:

>>  (4) Sign your patch, before the three-dash line.
>
> Oops, forgot '--signoff'. I've put format.signoff=true in .gitconfig to solve
> that problem.
>
> Perhaps a warning message from format-patch of the form:
> WARNING: You have not added a "Signed-off-by:" line; did you mean to?

1. It is usually a good idea to make it a habit of running "commit -s",
   iow, record your sign-off at _commit time_, when working on a project
   that uses the convention.  You may start contributing by sending a
   pull-request instead of patches later.

2. scripts/checkpatch.pl script in the Linux kernel project is a good tool
   to check your patch before submission; you run it as:

   $ perl checkpatch.pl --no-tree 0001-my-changes.patch
   
>> Please line-break immediately after &&; it makes it easier to read in
>> general, and it would make "cd" stand out in this particular case, as it
>> is the most important part of this particular function.
>
> Good point, although did you mean that as a general shell script coding rule
> or just in this particular case?

"In general" as a general style suggestion, and "show 'cd' at the
beginning on its own line" as one more reason to do so for this specific
case.

> Thanks for the feedback.  Did you want me to resend the signed and cleaned up
> patches direct to you?  

Or to the list.

I could actually have fixed them up and committed them myself, instead of
responding with comments that can be seen as if I were nitpicking.

The reason why I chose not to was because I hoped you would keep
contributing to this project more in the future (I see you already have
another commit in our history, and you seem to be reasonably competent,
judging from the patch description and the way you communicate in the
discussion).  And I wanted to make sure your future patches are easier to
handle for me ;-).

Thanks.
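
The advice in point 1 above, recording the sign-off at commit time, can be
tried in a scratch repository. This is a minimal sketch: the identity
values and the commit subject are assumptions made up for the example.

```shell
# Assumed identity; 'commit -s' builds the trailer from the committer identity.
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# -s appends a Signed-off-by: line to the commit message.
git commit -q --allow-empty -s -m 'Example change'

trailer=$(git log -1 --format=%B | grep '^Signed-off-by:')
echo "$trailer"

# Optional: also have format-patch add the sign-off by default,
# as mentioned in the quoted reply.
git config format.signoff true
```

A sign-off recorded in the commit itself survives whether the change is
later sent as a patch or offered through a pull request.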

* Re: [PATCH] Remove empty directories when checking out a commit with fewer submodules
From: Junio C Hamano @ 2010-01-11  9:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Peter Collingbourne, git
In-Reply-To: <alpine.DEB.1.00.1001110954410.4985@pacific.mpi-cbg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> NAK.  We should not even try to _unlink_ submodule subdirectories; it 
> would be _way_ too easy to lose data that way.  Remember, submodules are a 
> totally different beast from regular files.  They can contain valuable, 
> yet uncommitted data, that is not even meant to be committed.
>
> So you say if the submodule directories are empty, it is safe?  Not so.  
> They will never be empty: there is always .git/...

NACK on NAK.

Don't worry, your data will be safe.  The only case rmdir would actually
remove it is (1) you check out superproject that has submodule A, but you
choose not to "submodule init/update" it, because you don't need a
checkout of that part of the tree for your job, and then (2) you switch to
a different version of the superproject that doesn't anymore (or didn't
back then) have that submodule.  In such a use case, you will have only an
empty directory for A in step (1).  The unnecessary empty directory A will
be left behind, even after switching to a version that shouldn't have the
directory there in step (2), if you do not rmdir it.  So the patch is a
strict bugfix (it attempted to unlink, which is a bug; it really meant
"rmdir" and not "rm -rf" which you seem to be worried about).
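
The difference can be sketched in a scratch directory.  This is only an
illustration: A and B are hypothetical stand-ins for an uninitialized
submodule placeholder and an initialized submodule checkout.

```shell
# Scratch area; nothing outside it is touched.
scratch=$(mktemp -d)

# Uninitialized submodule placeholder: a truly empty directory.
mkdir "$scratch/A"

# Initialized submodule: has at least a .git/ directory inside.
mkdir -p "$scratch/B/.git"

# rmdir removes only empty directories, so A goes away and B survives.
rmdir "$scratch/A" 2>/dev/null && echo "A removed"
rmdir "$scratch/B" 2>/dev/null || echo "B kept"
```

Remove "$scratch" by hand afterwards if you try this.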

It is a separate matter to _enhance_ the codepath to actually either (A)
refuse to overwrite (if the version of the superproject you are switching
to in step (2) has a regular file or a directory that is part of the
superproject there), and/or (B) move it away to somewhere safe (recall the
discussion of ".git/modules/$submodule" hierarchy of the superproject?)
automatically when it will disappear.  Such enhancements will help people
who _do_ "submodule init/update" the submodule in step (1) and switch to a
version of the superproject that lacks it in step (2).

* Re: [PATCH] Remove empty directories when checking out a commit with fewer submodules
From: Johannes Schindelin @ 2010-01-11  9:45 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Peter Collingbourne
In-Reply-To: <201001111032.45637.johan@herland.net>

Hi,

On Mon, 11 Jan 2010, Johan Herland wrote:

> On Monday 11 January 2010, Johannes Schindelin wrote:
> > Hi,
> >
> > On Mon, 11 Jan 2010, Peter Collingbourne wrote:
> > > Change the unlink_entry function to use rmdir to remove submodule
> > > directories.
> >
> > NAK.  We should not even try to _unlink_ submodule subdirectories; it
> > would be _way_ too easy to lose data that way.  Remember, submodules
> > are a totally different beast from regular files.  They can contain
> > valuable, yet uncommitted data, that is not even meant to be
> > committed.
> >
> > So you say if the submodule directories are empty, it is safe?  Not
> > so. They will never be empty: there is always .git/, and _that_ can
> > contain valuable information that you do not want to throw away, too.
> >  Think of unpushed branches, for example.  That would be _fatal_ if
> > you rmdir() that for me.
> >
> > So please, no,
> 
> I believe what Peter is referring to is the _empty_ directories (and 
> that includes no .git/) that are placeholders for submodules that are 
> deliberately not cloned/checked out. This lets you do things like:
> 
> 	git clone url:to/some/project
> 	cd project
> 	git checkout some-other-branch-with-different-submodules
> 	git submodule update --init
> 
> Of course, once you clone/checkout a submodule, there will be contents 
> in that directory (including the .git/), and Git should not try to 
> remove it.

Yes, this might very well have been my confusion.  Peter, could you please 
refer to such submodules as "uninitialized" rather than "empty" in the 
future?  This would help simple minds like mine to understand you better.

Ciao,
Dscho

* Re: [PATCH] Remove empty directories when checking out a commit with fewer submodules
From: Johan Herland @ 2010-01-11  9:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Peter Collingbourne
In-Reply-To: <alpine.DEB.1.00.1001110954410.4985@pacific.mpi-cbg.de>

On Monday 11 January 2010, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 11 Jan 2010, Peter Collingbourne wrote:
> > Change the unlink_entry function to use rmdir to remove submodule
> > directories.
>
> NAK.  We should not even try to _unlink_ submodule subdirectories; it
> would be _way_ too easy to lose data that way.  Remember, submodules
> are a totally different beast from regular files.  They can contain
> valuable, yet uncommitted data, that is not even meant to be
> committed.
>
> So you say if the submodule directories are empty, it is safe?  Not
> so. They will never be empty: there is always .git/, and _that_ can
> contain valuable information that you do not want to throw away, too.
>  Think of unpushed branches, for example.  That would be _fatal_ if
> you rmdir() that for me.
>
> So please, no,

I believe what Peter is referring to is the _empty_ directories (and 
that includes no .git/) that are placeholders for submodules that are 
deliberately not cloned/checked out. This lets you do things like:

	git clone url:to/some/project
	cd project
	git checkout some-other-branch-with-different-submodules
	git submodule update --init

Of course, once you clone/checkout a submodule, there will be contents 
in that directory (including the .git/), and Git should not try to 
remove it.


Have fun! :)

...Johan



-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCH 2/2] Use $(git rev-parse --show-toplevel) in cd_to_toplevel()
From: Steven Drake @ 2010-01-11  9:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7viqb9w0c8.fsf@alter.siamese.dyndns.org>

On Sun, 10 Jan 2010, Junio C Hamano wrote:

>  (3) Please avoid referring to external resource in the commit log message
>      whenever it makes sense; the log should be understandable on its own.
>      Because the first paragraph of your message describes the issue the
>      patch addresses very well already, you don't need "See NetBSD..." and
>      URL.  If you want to have them to help the reviewers, place such
>      reference after the three-dash line, just like you wrote "This is a
>      revision..."  You would help reviewers even more if you added a
>      pointer to your earlier patch after that sentence;

I wondered whether I should have put the extra info after the three-dashes
or not; now I know.

I also made sure my second email had References and In-Reply-To headers to 
my first email.

>  (4) Sign your patch, before the three-dash line.

Oops, I forgot '--signoff'.  I've put format.signoff=true in .gitconfig to
solve that problem.
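
For reference, one way to set that variable from the command line.  This
is a sketch that writes to a hypothetical scratch file ($cfg) instead of
the real ~/.gitconfig; for everyday use, --global would replace the
--file option.

```shell
# Hypothetical scratch config file, so the real ~/.gitconfig is untouched.
cfg=$(mktemp)

# Same effect as writing "signoff = true" under [format] by hand.
git config --file "$cfg" format.signoff true

# Read the value back to confirm it was stored.
git config --file "$cfg" --get format.signoff
```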

Perhaps format-patch could print a warning message of the form:
WARNING: You have not added a "Signed-off-by:" line; did you mean to?

> Please line-break immediately after &&; it makes it easier to read in
> general, and it would make "cd" stand out in this particular case, as it
> is the most important part of this particular function.

Good point, although did you mean that as a general shell script coding
rule or just for this particular case?

Thanks for the feedback.  Did you want me to resend the signed and cleaned up
patches directly to you?

-- 
Steven

* Re: [PATCH] Remove empty directories when checking out a commit with fewer submodules
From: Johannes Schindelin @ 2010-01-11  8:57 UTC (permalink / raw)
  To: Peter Collingbourne; +Cc: git
In-Reply-To: <1263178794-3140-1-git-send-email-peter@pcc.me.uk>

Hi,

On Mon, 11 Jan 2010, Peter Collingbourne wrote:

> Change the unlink_entry function to use rmdir to remove submodule
> directories.

NAK.  We should not even try to _unlink_ submodule subdirectories; it 
would be _way_ too easy to lose data that way.  Remember, submodules are a 
totally different beast from regular files.  They can contain valuable, 
yet uncommitted data, that is not even meant to be committed.

So you say if the submodule directories are empty, it is safe?  Not so.  
They will never be empty: there is always .git/, and _that_ can contain 
valuable information that you do not want to throw away, too.  Think of 
unpushed branches, for example.  That would be _fatal_ if you rmdir() that 
for me.

So please, no,
Dscho

* [PATCH 6/6] Documentation: tweak How Merge Works
From: Jonathan Nieder @ 2010-01-11  8:43 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Petr Baudis, Junio C Hamano
In-Reply-To: <20100111082123.GA23742@progeny.tock>

Change heading to TRUE MERGE.  The whole manual page is about how
merges work.

Start to explain what it means to merge two commits into a single
tree.

Do not assume the commits named on the 'git merge' command line
come from another repository.  For simplicity, still assume they
are branch heads for now, though.

Do not start any list items with `code`; a toolchain bug
makes the resulting nroff look wrong.

Recommend reset --merge for safely cancelling a failed merge.

Cc: Petr Baudis <pasky@suse.cz>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-merge.txt |   56 +++++++++++++++++++-----------------------
 1 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/Documentation/git-merge.txt b/Documentation/git-merge.txt
index ec9c6d3..7ae0f65 100644
--- a/Documentation/git-merge.txt
+++ b/Documentation/git-merge.txt
@@ -96,62 +96,56 @@ merge commit.
 
 This behavior can be suppressed with the `--no-ff` option.
 
-include::merge-strategies.txt[]
-
-
-If you tried a merge which resulted in complex conflicts and
-want to start over, you can recover with 'git-reset'.
-
-HOW MERGE WORKS
----------------
-
-A merge is always between the current `HEAD` and one or more
-commits (usually, branch head or tag).
+TRUE MERGE
+----------
 
 Except in a fast-forward merge (see above), the branches to be
 merged must be tied together by a merge commit that has both of them
 as its parents.
 The rest of this section describes this "True merge" case.
 
-The chosen merge strategy merges the two commits into a single
-new source tree.
 When things merge cleanly, this is what happens:
 
-1. The results are updated both in the index file and in your
-   working tree;
-2. Index file is written out as a tree;
+1. A version reconciling the changes from all branches to be
+   merged is written to the index file and your working tree;
+2. The index file is written out as a tree;
 3. The tree gets committed; and
 4. The `HEAD` pointer gets advanced.
 
 Because of 2., we require that the original state of the index
 file matches exactly the current `HEAD` commit; otherwise we
-will write out your local changes already registered in your
+would write out your local changes already registered in your
 index file along with the merge result, which is not good.
 Because 1. involves only those paths differing between your
-branch and the remote branch you are pulling from during the
-merge (which is typically a fraction of the whole tree), you can
-have local modifications in your working tree as long as they do
-not overlap with what the merge updates.
-
-When there are conflicts, the following happens:
+branch and the other branches (which is typically a fraction of
+the whole tree), you can have local modifications in your
+working tree as long as they do not overlap with what the merge
+updates.
 
-1. `HEAD` stays the same.
+When it is not obvious how to reconcile the changes, the following
+happens:
 
-2. Cleanly merged paths are updated both in the index file and
+1. The `HEAD` pointer stays the same.
+2. The `MERGE_HEAD` ref is set to point to the other branch head.
+3. Paths that merged cleanly are updated both in the index file and
    in your working tree.
-
-3. For conflicting paths, the index file records up to three
+4. For conflicting paths, the index file records up to three
    versions; stage1 stores the version from the common ancestor,
-   stage2 from `HEAD`, and stage3 from the remote branch (you
+   stage2 from `HEAD`, and stage3 from `MERGE_HEAD` (you
    can inspect the stages with `git ls-files -u`).  The working
    tree files contain the result of the "merge" program; i.e. 3-way
-   merge results with familiar conflict markers `<<< === >>>`.
-
-4. No other changes are done.  In particular, the local
+   merge results with familiar conflict markers `<<<` `===` `>>>`.
+5. No other changes are done.  In particular, the local
    modifications you had before you started merge will stay the
    same and the index entries for them stay as they were,
    i.e. matching `HEAD`.
 
+If you tried a merge which resulted in complex conflicts and
+want to start over, you can recover with `git reset --merge`.
+
+include::merge-strategies.txt[]
+
+
 HOW CONFLICTS ARE PRESENTED
 ---------------------------
 
-- 
1.6.6
