Git development
* Re: [PATCH/RFC 0/3] faster inexact rename handling
From: Linus Torvalds @ 2007-10-30 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Andy C, Junio C Hamano
In-Reply-To: <20071030202014.GA22733@coredump.intra.peff.net>



On Tue, 30 Oct 2007, Jeff King wrote:
> 
> Well, the problem is that instead of just "dropping" boilerplate text,
> we fail to count it as a similarity, but it still counts towards the
> file size. It may be that just dropping it totally is the right thing
> (in which case those renames _will_ turn up, because they will be filled
> with identical non-boilerplate goodness).

Yeah, you may well be right, and the normalization of the scores will just 
solve things.

			Linus


* Re: [GIT-GUI PATCH 2/3] po2msg: ignore untranslated messages
From: Christian Stimming @ 2007-10-30 20:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, spearce, git
In-Reply-To: <7vabq0l7wn.fsf@gitster.siamese.dyndns.org>

On Tuesday, 30 October 2007 20:27, Junio C Hamano wrote:
> > Do not generate translations when the translated message is empty.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  po/po2msg.sh |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/po/po2msg.sh b/po/po2msg.sh
> > index 48a2669..91d420b 100644
> > --- a/po/po2msg.sh
> > +++ b/po/po2msg.sh
> > @@ -62,6 +62,9 @@ proc flush_msg {} {
> >  	if {$msgid == ""} {
> >  		set prefix "set ::msgcat::header"
> >  	} else {
> > +		if {$msgstr == ""} {
> > +			return
> > +		}
> >  		set prefix "::msgcat::mcset $lang \"[u2a $msgid]\""
> >  	}
>
> Is this change to fix some real issues?

I don't think so - it just makes the resulting foo.msg file smaller.

> Sometimes it is handy to be able to translate a non-empty string
> into an empty one in one target language.

Err... no, this is not the case. The semantics of msgstr == "" are identical 
to saying "No translation exists for this source string".  Nothing more, 
nothing less. You can't specify a translation that maps a given string 
to an empty string. (If you can construct a case where that would make sense, 
the source string is usually rather weirdly chosen and should be reworded.)
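
A minimal Python sketch of the rule described above, not the po2msg.sh code
itself: entries whose msgstr is empty are treated as "no translation exists"
and produce no output. The function name to_msgcat and the (msgid, msgstr)
pair input are assumptions made for illustration; the sketch also skips the
PO header entry and does no Tcl escaping, both of which the real script has
to handle.

```python
def to_msgcat(lang, entries):
    """Turn (msgid, msgstr) pairs into Tcl msgcat lines.

    Hypothetical helper: only illustrates skipping untranslated entries.
    """
    lines = []
    for msgid, msgstr in entries:
        if msgid == "":
            # The empty msgid carries the PO header, not a translation;
            # this sketch simply ignores it.
            continue
        if msgstr == "":
            # Empty msgstr means "no translation exists": emit nothing,
            # so the source string is used unchanged at run time.
            continue
        lines.append('::msgcat::mcset %s "%s" "%s"' % (lang, msgid, msgstr))
    return lines


if __name__ == "__main__":
    sample = [("", "header"), ("Commit", "Eintragen"), ("Push", "")]
    for line in to_msgcat("de", sample):
        print(line)
```

With this rule the only effect of an untranslated entry is a smaller output
file, which matches the point made above.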

Christian


* [PATCH 7/5] add throughput display to git-push
From: Nicolas Pitre @ 2007-10-30 21:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <alpine.LFD.0.9999.0710301535160.21255@xanadu.home>

This one triggers only when git-pack-objects is called with
--all-progress and --stdout, which is the combination used by
git-push.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
 builtin-pack-objects.c |    2 +-
 csum-file.c            |    8 ++++++++
 csum-file.h            |    4 ++++
 3 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 52a26a2..25ec65d 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -606,7 +606,7 @@ static void write_pack_file(void)
 		char *pack_tmp_name = NULL;
 
 		if (pack_to_stdout) {
-			f = sha1fd(1, "<stdout>");
+			f = sha1fd_throughput(1, "<stdout>", progress_state);
 		} else {
 			char tmpname[PATH_MAX];
 			int fd;
diff --git a/csum-file.c b/csum-file.c
index 9929991..3729e73 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -8,6 +8,7 @@
  * able to verify hasn't been messed with afterwards.
  */
 #include "cache.h"
+#include "progress.h"
 #include "csum-file.h"
 
 static void sha1flush(struct sha1file *f, unsigned int count)
@@ -17,6 +18,7 @@ static void sha1flush(struct sha1file *f, unsigned int count)
 	for (;;) {
 		int ret = xwrite(f->fd, buf, count);
 		if (ret > 0) {
+			display_throughput(f->tp, ret);
 			buf = (char *) buf + ret;
 			count -= ret;
 			if (count)
@@ -80,6 +82,11 @@ int sha1write(struct sha1file *f, void *buf, unsigned int count)
 
 struct sha1file *sha1fd(int fd, const char *name)
 {
+	return sha1fd_throughput(fd, name, NULL);
+}
+
+struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp)
+{
 	struct sha1file *f;
 	unsigned len;
 
@@ -94,6 +101,7 @@ struct sha1file *sha1fd(int fd, const char *name)
 	f->fd = fd;
 	f->error = 0;
 	f->offset = 0;
+	f->tp = tp;
 	f->do_crc = 0;
 	SHA1_Init(&f->ctx);
 	return f;
diff --git a/csum-file.h b/csum-file.h
index c3c792f..4d1b231 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -1,11 +1,14 @@
 #ifndef CSUM_FILE_H
 #define CSUM_FILE_H
 
+struct progress;
+
 /* A SHA1-protected file */
 struct sha1file {
 	int fd, error;
 	unsigned int offset, namelen;
 	SHA_CTX ctx;
+	struct progress *tp;
 	char name[PATH_MAX];
 	int do_crc;
 	uint32_t crc32;
@@ -13,6 +16,7 @@ struct sha1file {
 };
 
 extern struct sha1file *sha1fd(int fd, const char *name);
+extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp);
 extern int sha1close(struct sha1file *, unsigned char *, int);
 extern int sha1write(struct sha1file *, void *, unsigned int);
 extern void crc32_begin(struct sha1file *);
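
For readers who want the shape of the change without reading the C, here is
a rough Python sketch of the same idea: a checksumming writer that reports
every successful write to an optional progress hook, plus a plain constructor
path that passes no hook. ChecksumWriter and on_bytes are hypothetical names
for illustration only; they are not part of git.

```python
import hashlib
import io


class ChecksumWriter:
    """Write-through wrapper that hashes data and reports bytes written."""

    def __init__(self, out, on_bytes=None):
        self.out = out            # underlying binary stream
        self.on_bytes = on_bytes  # optional callback, like the tp pointer
        self.ctx = hashlib.sha1()

    def write(self, data):
        self.ctx.update(data)
        written = self.out.write(data)
        if self.on_bytes is not None:
            # Report how many bytes actually went out on this call.
            self.on_bytes(written)
        return written

    def hexdigest(self):
        return self.ctx.hexdigest()


if __name__ == "__main__":
    seen = []
    writer = ChecksumWriter(io.BytesIO(), seen.append)
    writer.write(b"pack data")
    print(seen, writer.hexdigest())
```

Callers that do not care about throughput construct the writer without a
callback, which mirrors sha1fd() forwarding to sha1fd_throughput() with NULL.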


* Re: Recording merges after repo conversion
From: Peter Karlsson @ 2007-10-30 21:06 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: Benoit SIGOURE, git
In-Reply-To: <8c5c35580710300729t4a7b375dud01253d9b4ef7196@mail.gmail.com>

Lars Hjemli:

> No, the grafts file is purely local.

Hmm, any chance that will change in a future version?

> To achieve your goal, you'd have to 'git filter-branch' before 
> pushing/cloning. But beware: this _will_ rewrite your current branch(es).

Ouch. I'll have to think about whether I want to do that, then...

-- 
\\// Peter - http://www.softwolves.pp.se/


* Re: Problem with git-cvsimport
From: Mike Snitzer @ 2007-10-30 21:15 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Eyvind Bernhardsen, Thomas Pasch, git, Jan Wielemaker,
	Gerald (Jerry) Carter, dev
In-Reply-To: <170fa0d20710301306o6b3798f9k72615eb811d871f2@mail.gmail.com>

On 10/30/07, Mike Snitzer <snitzer@gmail.com> wrote:
> On 10/10/07, Eyvind Bernhardsen <eyvind-git-list@orakel.ntnu.no> wrote:
> ...
> >
> > Thanks for making cvs2svn the best CVS-to-git conversion tool :)  Now
> > if it would only support incremental importing...
>
> Michael,
>
> I second this question: is there any chance incremental importing will
> be implemented in cvs2svn?
>
> I've not used cvs2svn much and when I did it was for svn not git; but
> given that git-cvsimport is known to mess up your git repo (as Eyvind
> pointed out earlier) there doesn't appear to be any reliable tools to
> allow for incrementally importing from cvs to git.
>
> Are others using a tool for reliably importing from cvs to git?

After reading the fairly recent "cvs2svn conversion directly to git
ready for experimentation" thread, it is clear that it's doable but
hasn't been done (seeing as you were looking for volunteers to do it).

Sorry for the noise,
Mike


* Re: Problem with git-cvsimport
From: Robin Rosenberg @ 2007-10-30 21:44 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Michael Haggerty, Eyvind Bernhardsen, Thomas Pasch, git,
	Jan Wielemaker, Gerald (Jerry) Carter, dev
In-Reply-To: <170fa0d20710301306o6b3798f9k72615eb811d871f2@mail.gmail.com>

On Tuesday, 30 October 2007, Mike Snitzer wrote:
> On 10/10/07, Eyvind Bernhardsen <eyvind-git-list@orakel.ntnu.no> wrote:
> ...
> >
> > Thanks for making cvs2svn the best CVS-to-git conversion tool :)  Now
> > if it would only support incremental importing...
> 
> Michael,
> 
> I second this question: is there any chance incremental importing will
> be implemented in cvs2svn?
> 
> I've not used cvs2svn much and when I did it was for svn not git; but
> given that git-cvsimport is known to mess up your git repo (as Eyvind
> pointed out earlier) there doesn't appear to be any reliable tools to
> allow for incrementally importing from cvs to git.
> 
> Are others using a tool for reliably importing from cvs to git?

I use fromcvs, which is *very* fast and quite memory-conservative compared to 
the others, and it has seemed reliable so far (six months). It probably breaks on 
exotic variants of branches, but I don't have those / don't care about 
them.

Do not push into the same repo fromcvs works on. I don't understand why, but I 
pushed once and *poof* the conversion went bad.

Drawbacks: more dependencies, access to the RCS files is required, and tags 
are not converted.

-- robin


* Re: remote#branch
From: Pascal Obry @ 2007-10-30 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthieu Moy, Tom Prince, Theodore Tso, Junio C Hamano, Jan Hudec,
	Johannes Schindelin, Petr Baudis, Paolo Ciarrocchi, git
In-Reply-To: <alpine.LFD.0.999.0710301056070.30120@woody.linux-foundation.org>

Linus Torvalds wrote:
> I keep talking about a web browser, because THE ONLY POINT of following a 
> standard is to interoperate.

Yes, and since URLs are not used by web browsers only, I do not see the
point of concentrating this whole discussion on a single possible usage.

> Why is that so hard to understand?

I'm thinking alike :)

Pascal.

-- 

--|------------------------------------------------------
--| Pascal Obry                           Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--|              http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595


* Re: Recording merges after repo conversion
From: Lars Hjemli @ 2007-10-30 21:46 UTC (permalink / raw)
  To: Peter Karlsson; +Cc: Benoit SIGOURE, git
In-Reply-To: <Pine.LNX.4.62.0710302204590.6976@perkele.intern.softwolves.pp.se>

On Oct 30, 2007 10:06 PM, Peter Karlsson <peter@softwolves.pp.se> wrote:
> Lars Hjemli:
>
> > No, the grafts file is purely local.
>
> Hmm, any chance that will change in a future version?

Not likely

>
> > To achieve your goal, you'd have to 'git filter-branch' before
> > pushing/cloning. But beware: this _will_ rewrite your current branch(es).
>
> Ouch. I'll have to think about whether I want to do that, then...

Well, it isn't dangerous, but if someone has already cloned your repo
_and_ committed local changes they'll need to rebase their work onto
the new branch(es). Basically, you'll want to inform these people that
you're going to rewrite the branches.

-- 
larsh


* Re: remote#branch
From: Jeff King @ 2007-10-30 23:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pascal Obry, Matthieu Moy, Tom Prince, Theodore Tso,
	Junio C Hamano, Jan Hudec, Johannes Schindelin, Petr Baudis,
	Paolo Ciarrocchi, git
In-Reply-To: <alpine.LFD.0.999.0710301232000.30120@woody.linux-foundation.org>

On Tue, Oct 30, 2007 at 12:38:27PM -0700, Linus Torvalds wrote:

> So if you want to follow the RFC, you'd better give a real reason. And no, 
> the existence of an RFC, and the fact that people use the same name for 
> things that superficially _look_ the same is not a reason in itself.
> 
> So hands up, people. Anybody who asked for RFC quoting. Give a damn 
> *reason* already!

I didn't ask for RFC quoting, but a nice side effect of URL syntax is
that they are machine parseable. If you wanted to write a tool to pick
the URLs out of this email and clone them as git repos, then how do you
find the end of:

  http://host/git repo with spaces in the path

compared to:

  http://host/git+repo+with+spaces+in+the+path

I don't know if that's worth changing anything in git (in fact, I'm not
even clear on _what_ people want to change; the point of this discussion
seems to be to argue about terminology). But you did ask for any reason
for quoting URLs.
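
To make the parsing argument concrete, here is a small Python sketch of such
a "pick the URLs out of this text" tool. The regular expression is a
deliberately naive assumption (a URL runs until the next whitespace), and
find_urls is a hypothetical helper name. Note that '+' for a space is the
form-encoding convention; generic percent-encoding would write %20 instead,
and urllib.parse.unquote_plus decodes both.

```python
import re
from urllib.parse import unquote_plus

# Naive assumption: a URL is a scheme followed by any run of non-whitespace.
URL_RE = re.compile(r"https?://\S+")


def find_urls(text):
    """Return every URL-looking token found in free text."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    raw = "please clone http://host/git repo with spaces in the path today"
    enc = "please clone http://host/git+repo+with+spaces+in+the+path today"
    print(find_urls(raw))  # the match stops at the first space
    for url in find_urls(enc):
        print(url, "->", unquote_plus(url))
```

The unencoded form is cut off at the first space, while the encoded form is
recovered whole and can be decoded afterwards.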

-Peff


* 1.5.3.5 will be out tomorrow
From: Junio C Hamano @ 2007-10-31  0:01 UTC (permalink / raw)
  To: git

A fix for a segfaulting bug warrants a new maintenance release,
so 1.5.3.5 will be out tomorrow.

Here is the current shortlog:

Alex Bennee (1):
    Ensure we add directories in the correct order

Alex Riesen (1):
    Fix generation of perl/perl.mak

Andrew Clausen (1):
    helpful error message when send-pack finds no refs in common.

Aurelien Bompard (1):
    honor the http.sslVerify option in shell scripts

Benoit Sigoure (1):
    Fix a small memory leak in builtin-add

Brian Gernhardt (1):
    cvsserver: Use exit 1 instead of die when req_Root fails.

Frank Lichtenheld (1):
    git-config: don't silently ignore options after --list

Gerrit Pape (2):
    git-config: handle --file option with relative pathname properly
    git-config: print error message if the config file cannot be read

Jean-Luc Herren (2):
    git add -i: Fix parsing of abbreviated hunk headers
    git add -i: Remove unused variables

Jeff King (1):
    send-pack: respect '+' on wildcard refspecs

Joakim Tjernlund (1):
    Improve receive-pack error message about funny ref creation

Johannes Schindelin (5):
    clear_commit_marks(): avoid deep recursion
    rebase -i: use diff plumbing instead of porcelain
    Fix setup_git_directory_gently() with relative GIT_DIR &
      GIT_WORK_TREE
    fix filter-branch documentation
    filter-branch: update current branch when rewritten

Julian Phillips (1):
    fast-import: Fix argument order to die in file_change_m

Junio C Hamano (6):
    git-remote: fix "Use of uninitialized value in string ne"
    sha1_file.c: avoid gcc signed overflow warnings
    merge-recursive.c: mrtree in merge() is not used before set
    RelNotes-1.5.3.5: describe recent fixes
    Prevent send-pack from segfaulting (backport from 'master')
    git-merge: document but discourage the historical syntax

Linus Torvalds (6):
    Fix embarrassing "git log --follow" bug
    Clean up "git log" format with DIFF_FORMAT_NO_OUTPUT
    git-blame shouldn't crash if run in an unmerged tree
    Avoid scary errors about tagged trees/blobs during git-fetch
    Fix directory scanner to correctly ignore files without d_type
    Fix diffcore-break total breakage

Mathias Megyei (1):
    Do not remove distributed configure script

Michael W. Olson (1):
    Documentation/git-cvsexportcommit.txt: s/mgs/msg/ in example

Michele Ballabio (2):
    git-reflog: document --verbose
    git-archive: document --exec

Nicolas Pitre (1):
    cherry-pick/revert: more compact user direction message

Patrick Welche (1):
    Define NI_MAXSERV if not defined by operating system

Ralf Wildenhues (1):
    gitk.txt: Fix markup.

Robert Schiele (1):
    fixing output of non-fast-forward output of post-receive-email

Sergei Organov (1):
    core-tutorial: Use new syntax for git-merge.

Shawn O. Pearce (18):
    git-gui: Display message box when we cannot find git in $PATH
    git-gui: Handle starting on mapped shares under Cygwin
    git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
    git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
    git-gui: Don't crash when starting gitk from a browser session
    Whip post 1.5.3.4 maintenance series into shape.
    Correct typos in release notes for 1.5.3.5
    Avoid 'expr index' on Mac OS X as it isn't supported
    Document additional 1.5.3.5 fixes in release notes
    Yet more 1.5.3.5 fixes mentioned in release notes
    Avoid invoking diff drivers during git-stash
    Further 1.5.3.5 fixes described in release notes
    Paper bag fix diff invocation in 'git stash show'
    git-gui: Correctly report failures from git-write-tree
    git-gui: Handle progress bars from newer gits
    git-gui: Don't display CR within console windows
    Merge branch 'maint' of git://repo.or.cz/git-gui into maint
    Describe more 1.5.3.5 fixes in release notes

Simon Sasburg (1):
    git-gui: Avoid using bold text in entire gui for some fonts

Steffen Prohaska (2):
    git-gui: accept versions containing text annotations, like
      1.5.3.mingw.1
    attr: fix segfault in gitattributes parsing code


* Re: [PATCH 2/5] make struct progress an opaque type
From: Junio C Hamano @ 2007-10-31  0:05 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <1193770655-20492-3-git-send-email-nico@cam.org>

The ../linux-2.6/scripts/checkpatch.pl (run with --no-tree)
script found a few instances of:

ERROR: "foo * bar" should be "foo *bar"
#239: FILE: progress.c:91:
+struct progress * start_progress_delay(const char *title, unsigned total,

I'll munge them away before applying.


* Re: git-merge: inconsistent manual page.
From: Junio C Hamano @ 2007-10-31  0:05 UTC (permalink / raw)
  To: Sergei Organov; +Cc: git
In-Reply-To: <87wst44cvb.fsf@osv.gnss.ru>

Sergei Organov <osv@javad.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> Subject: git-merge: document but discourage the historical syntax
>>
>> Historically "git merge" took its command line arguments in a
>> rather strange order.  Document the historical syntax, and also
>> document clearly that it is not encouraged in new scripts.
>>
>> There is no reason to deprecate the historical syntax, as the
>> current code can sanely tell which syntax the caller is using,
>> and existing scripts by people do use the historical syntax.
>
> OK, your patch is better than what I've suggested. The only thing that
> your patch seems to be missing is prepending -m to <msg>:: in the
> OPTIONS section. Yeah, it could be more strict to just describe <msg>,
> ...

Nah, you are absolutely right.  We describe "-s <strategy>"
under the headline with the leading "-s".

Will fix.


* Re: [PATCH/RFC 0/3] faster inexact rename handling
From: Andy C @ 2007-10-31  0:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds, Junio C Hamano
In-Reply-To: <20071030042118.GA14729@sigill.intra.peff.net>

Sorry I have been AWOL... I was going to try to work on this, but I
got abjectly sick (long story).  But it's great to see this out.


On 10/29/07, Jeff King <peff@peff.net> wrote:
> This is my first stab at faster rename handling based on Andy's code.
> The patches are on top of next (to get Linus' recent work on exact
> renames). Most of the interesting stuff is in 2/3.
>
>   1/3: extension of hash interface
>   2/3: similarity detection code
>   3/3: integrate similarity detection into diffcore-rename
>
> The implementation is pretty basic, so I think there is room for
> code optimization (50% of the time is spent in hash lookups, so we might
> be able to micro-optimize that) as well as algorithmic improvements (like the
> sampling Andy mentioned).

For microoptimization, I was thinking that the hash tables could be
implemented without pointers per value (or memory allocation per
value), so everything is in a contiguous block of memory.   In C++ you
can do this trivially by declaring a small struct as the second
template parameter of the container; in C I guess you can simulate it
with a macro or something.

For the inverted indexing step, the values in the hash are going to be
quite small, especially if line_threshold=1.  Then you only need 2
integers for the left side and right side == 4 integers.  The integers
could just be indexes into the lists (like the current code uses).

For the count matrix step, the values are just going to be integers,
so storing it right in the hash table makes sense.
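
The two steps above can be sketched in Python roughly as follows. This is an
illustration of the inverted-index and count-matrix idea only, not the code
from the patches: it keys the index on the line text itself rather than on a
hash, ignores repeated lines within a file, and uses the hypothetical name
count_shared_lines.

```python
from collections import defaultdict


def count_shared_lines(left, right):
    """left, right: dict mapping file name -> list of lines.

    Returns {(left_name, right_name): number of distinct shared lines}.
    """
    # Inverted index: line -> (files on the left, files on the right).
    index = defaultdict(lambda: (set(), set()))
    for name, lines in left.items():
        for line in set(lines):
            index[line][0].add(name)
    for name, lines in right.items():
        for line in set(lines):
            index[line][1].add(name)

    # Count matrix: one increment per (left, right) pair sharing a line.
    counts = defaultdict(int)
    for left_names, right_names in index.values():
        for l_name in left_names:
            for r_name in right_names:
                counts[(l_name, r_name)] += 1
    return dict(counts)


if __name__ == "__main__":
    old = {"a.c": ["x", "y", "z"]}
    new = {"b.c": ["x", "y", "q"], "c.c": ["q"]}
    print(count_shared_lines(old, new))
```

Only pairs that share at least one line ever appear in the result, which is
what keeps the count matrix sparse.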

The sampling should be only necessary for binary files, I think.


> With these patches, I can get my monster binary diff down from about 2
> minutes to 17 seconds. And comparing all of linux-2.4 to all of
> linux-2.6 (similar to Andy's previous demo) takes about 10 seconds.

Hopefully that should be close to just reading the files off disk.
The algorithm should take a fraction of the time that simply reading
the files does, which presumably a git diff has to do.

I was timing that by comparing it to doing a "| xargs wc -l" on the
lists of files.


> There are a few downsides:
>   - the current implementation tends to give lower similarity values
>     compared to the old code (see discussion in 2/3), but this should be
>     tweakable
>   - on large datasets, it's more memory hungry than the old code because
>     the hash grows very large. This can be helped by bumping up the
>     binary chunk size (actually, the 17 seconds quoted above is using
>     256-byte chunks rather than 64-byte -- with 64-byte chunks, it's
>     more like 24 seconds) as well as sampling.
>   - no improvement on smaller datasets. Running "git-whatchanged -M
>     --raw -l0" on the linux-2.6 repo takes about the same time with the
>     old and new code (presumably the algorithmic savings of the new code
>     are lost in a higher constant factor, so when n is small, it is a
>     wash).

I think the old code tries to respect the cache as much as possible,
from what I can tell.  The new code has to use hash tables, whose memory
accesses are unpredictable, of course.  Though for smaller data sets I would
expect the hash table to fit in cache.  What's your definition of small here?
Are you sure the old code isn't triggering one of the limits that was
there?

thanks,
Andy


* Re: remote#branch
From: Jakub Narebski @ 2007-10-31  0:12 UTC (permalink / raw)
  To: git
In-Reply-To: <20071030235823.GA22747@coredump.intra.peff.net>

Jeff King wrote:

> On Tue, Oct 30, 2007 at 12:38:27PM -0700, Linus Torvalds wrote:
> 
>> So if you want to follow the RFC, you'd better give a real reason. And no, 
>> the existence of an RFC, and the fact that people use the same name for 
>> things that superficially _look_ the same is not a reason in itself.
>> 
>> So hands up, people. Anybody who asked for RFC quoting. Give a damn 
>> *reason* already!
> 
> I didn't ask for RFC quoting, but a nice side effect of URL syntax is
> that they are machine parseable. If you wanted to write a tool to pick
> the URLs out of this email and clone them as git repos, then how do you
> find the end of:
> 
>   http://host/git repo with spaces in the path
> 
> compared to:
> 
>   http://host/git+repo+with+spaces+in+the+path
> 
> I don't know if that's worth changing anything in git (in fact, I'm not
> even clear on _what_ people want to change; the point of this discussion
> seems to be to argue about terminology). But you did ask for any reason
> for quoting URLs.

You use

  'http://host/git repo with spaces in the path'

Theoretically, we can follow what other CLI tools dealing with URLs do
(like wget, lynx, ...), i.e. assume that URL is _not_ RFC-escaped if it
is in quotes, and assume that URL is properly escaped if it is not quoted.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: 1.5.3.5 will be out tomorrow
From: Linus Torvalds @ 2007-10-31  0:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfxzsjgos.fsf@gitster.siamese.dyndns.org>



On Tue, 30 Oct 2007, Junio C Hamano wrote:
>
> A fix for a segfaulting bug warrants a new maintenance release,
> so 1.5.3.5 will be out tomorrow.

Is the 

	"Make merge-recursive honor diff.renamelimit"

commit in the maintenance series?

If not, I'd suggest merging it.

The lack of this fix bit us during the kernel x86 merge, where there was 
no way for people using stable git versions to make their merges take 
renames into account, because there were too many of them.

It's commit df3a02f6125f7ac82b6e81e3e32cd7ca3c7905ee by Lars Hjemli.

		Linus


* Re: [PATCH/RFC 0/3] faster inexact rename handling
From: Andy C @ 2007-10-31  0:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, git, Junio C Hamano
In-Reply-To: <20071030202014.GA22733@coredump.intra.peff.net>

On 10/30/07, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 30, 2007 at 08:38:24AM -0700, Linus Torvalds wrote:
>
> > > with the old and new code. Pairs like Documentation/git-add-script.txt
> > > -> Documentation/git-add.txt are not found, because the file is composed
> > > almost entirely of boilerplate.
> >
> > Ok, that does imply to me that we cannot just drop boilerplate text,
> > because the fact is, lots of files contain boilerplate, but people still
> > think they are "similar".
>
> Well, the problem is that instead of just "dropping" boilerplate text,
> we fail to count it as a similarity, but it still counts towards the
> file size. It may be that just dropping it totally is the right thing
> (in which case those renames _will_ turn up, because they will be filled
> with identical non-boilerplate goodness).

Right, in the demo I make an extra pass after the inverted indexing
step to prune the index -- which means eliminating the common lines
*entirely* from the index (so they don't get attributed to a random
file) *and* decrementing all the file sizes by 1.  That way the
similarity scores shouldn't get skewed.

And as you mentioned we could bump the threshold from 1 to some other
small integer.  Intuitively I guess you could say it is common to copy
a file to 2 places or 3 places, and you don't want all the lines to
get thrown out because of that.  But usually you don't copy a file to
10 or 50 places.
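
A rough Python sketch of that pruning pass, under stated assumptions: it
works on one set of files, treats a line as "common" when it occurs in more
than `threshold` files, and uses the hypothetical name prune_common_lines.
It is not the demo code, only an illustration of dropping common lines from
the index while shrinking the file sizes to match.

```python
from collections import defaultdict


def prune_common_lines(files, threshold=1):
    """files: dict mapping file name -> list of lines.

    Returns (index, sizes): the pruned line -> file-names index and the
    per-file count of distinct lines that survived pruning.
    """
    index = defaultdict(set)
    for name, lines in files.items():
        for line in set(lines):
            index[line].add(name)

    sizes = {name: len(set(lines)) for name, lines in files.items()}
    kept = {}
    for line, names in index.items():
        if len(names) > threshold:
            # Common line: drop it entirely and stop counting it towards
            # the size of every file that contained it.
            for name in names:
                sizes[name] -= 1
        else:
            kept[line] = names
    return kept, sizes


if __name__ == "__main__":
    docs = {
        "a.txt": ["GPL header", "alpha"],
        "b.txt": ["GPL header", "beta"],
        "c.txt": ["GPL header", "alpha"],
    }
    print(prune_common_lines(docs, threshold=2))
```

With threshold=2 a line copied into two files still counts as evidence,
while the header shared by all three is ignored and no longer inflates the
sizes used to normalize the scores.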

Andy


* Re: 1.5.3.5 will be out tomorrow
From: Junio C Hamano @ 2007-10-31  0:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0710301712240.30120@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 30 Oct 2007, Junio C Hamano wrote:
>>
>> A fix for a segfaulting bug warrants a new maintenance release,
>> so 1.5.3.5 will be out tomorrow.
>
> Is the 
>
> 	"Make merge-recursive honor diff.renamelimit"
>
> commit in the maintenance series?
>
> If not, I'd suggest merging it.
>
> The lack of this fix bit us during the kernel x86 merge, where there was 
> no way for people using stable git versions to make their merges take 
> renames into account, because there were too many of them..

Ah, and that is especially painful now that the rename limit is quite
low thanks to 0024a54923a12f8c05ce4290f5ebefab0cce4336 (Fix the
rename detection limit checking)?

Will cherry-pick that one -- thanks.


* Re: remote#branch
From: Martin Langhoff @ 2007-10-31  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Junio C Hamano, Jan Hudec, Johannes Schindelin,
	Petr Baudis, Paolo Ciarrocchi, git
In-Reply-To: <alpine.LFD.0.999.0710292150400.30120@woody.linux-foundation.org>

On 10/30/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Quick! WHO THE F*CK CARES?

Ah, damn. In all the discussion & flamefesting to say that people
don't want to use the # character, no one talks about what cogito
used it for.

Having something functionally similar to

  cg-clone git://foo.tld/bar.git#blue

would save a few steps -- and some potential confusion -- for projects
using GIT.

In case it's not clear what it does (not everyone here has used
cogito) it will create and checkout a branch tracking the "blue" head
on the repo when the clone is done. This is _instead of_ creating and
checking out the branch that tracks the configured "HEAD" of the repo.

IMHO it is quite a nice thing to have -- and AFAICS we don't have it in
master or pu. I care about the shed for the bike, not its colour.
cheers,


m


* Re: remote#branch
From: Linus Torvalds @ 2007-10-31  0:59 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Theodore Tso, Junio C Hamano, Jan Hudec, Johannes Schindelin,
	Petr Baudis, Paolo Ciarrocchi, git
In-Reply-To: <46a038f90710301741n67526976vda1cd131270aa7f@mail.gmail.com>



On Wed, 31 Oct 2007, Martin Langhoff wrote:
> 
> Having something functionally similar to
> 
>   cg-clone git://foo.tld/bar.git#blue
> 
> would save a few steps -- and some potential confusion -- for projects
> using GIT.

I do agree with that "functionally similar", I just disagree with the 
syntax.

The thing is, we don't want a single branch name. Not for clone, not for 
fetch, not for pull, and not for push.

Yes, a single branch may be one common case, but it's definitely not the 
only one, and it's fundamentally the wrong thing to use as a definition of 
syntax. 

It's also the wrong thing to do for local stuff.

			Linus


* Re: remote#branch
From: Jeff King @ 2007-10-31  1:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <fg8h9l$b4n$1@ger.gmane.org>

On Wed, Oct 31, 2007 at 01:12:37AM +0100, Jakub Narebski wrote:

> > that they are machine parseable. If you wanted to write a tool to pick
> > the URLs out of this email and clone them as git repos, then how do you
> > find the end of:
> > 
> >   http://host/git repo with spaces in the path
> You use
> 
>   'http://host/git repo with spaces in the path'

...which is a quoting mechanism, and it's not even one commonly used in
emails (i.e., people have written "parse a URL from this text" scripts
for RFC-encoded URLs, but _not_ for shell quoting).

-Peff


* Re: remote#branch
From: Jeff King @ 2007-10-31  1:43 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Linus Torvalds, Theodore Tso, Junio C Hamano, Jan Hudec,
	Johannes Schindelin, Petr Baudis, Paolo Ciarrocchi, git
In-Reply-To: <46a038f90710301741n67526976vda1cd131270aa7f@mail.gmail.com>

On Wed, Oct 31, 2007 at 01:41:12PM +1300, Martin Langhoff wrote:

> Having something functionally similar to
> 
>   cg-clone git://foo.tld/bar.git#blue
> 
> would save a few steps -- and some potential confusion -- for projects
> using GIT.
> 
> In case it's not clear what it does (not everyone here has used
> cogito) it will create and checkout a branch tracking the "blue" head
> on the repo when the clone is done. This is _instead of_ creating and
> checking out the branch that tracks the configured "HEAD" of the repo.

Actually, IIRC it won't fetch any of the non 'blue' refs.

Anyway, to recap (my impression of) the discussion leading up to this:
  - the cogito feature is useful
  - the cogito syntax does not allow for multiple branches to be
    specified
  - one such syntax proposed was git://foo.tld/bar.git#blue,red
  - one problem with that syntax is that comma is a valid character
    in the branch name, and '#' is a valid character in the repo name
  - one proposed solution was that '#' and ',' when used as data should
    be URL-encoded
  - flamefest begin
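
For illustration, the proposed (and disputed) syntax from the recap could be
parsed along these lines in Python. This is only a sketch of the proposal as
summarized above; git does not implement it, and split_remote is a
hypothetical name. It assumes a literal '#' or ',' used as data is
percent-encoded as %23 or %2C.

```python
from urllib.parse import unquote


def split_remote(spec):
    """Split 'repo#branch1,branch2' into (repo, [branches]).

    '#' and ',' act as separators; encoded occurrences are decoded only
    after splitting, so they survive as data.
    """
    repo, _, fragment = spec.partition("#")
    branches = [unquote(b) for b in fragment.split(",")] if fragment else []
    return unquote(repo), branches


if __name__ == "__main__":
    print(split_remote("git://foo.tld/bar.git#blue,red"))
    print(split_remote("git://foo.tld/a%23b.git#x%2Cy,z"))
```

Splitting before decoding is the design point: it is what lets a repo name
containing '#' or a branch name containing ',' pass through unambiguously.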

So I think nobody disagrees that such a feature is useful; there is
disagreement about the syntax.

-Peff


* Re: remote#branch
From: Jakub Narebski @ 2007-10-31  1:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20071031013856.GA23274@coredump.intra.peff.net>

Jeff King wrote:
> On Wed, Oct 31, 2007 at 01:12:37AM +0100, Jakub Narebski wrote:
> 
>>> that they are machine parseable. If you wanted to write a tool to pick
>>> the URLs out of this email and clone them as git repos, then how do you
>>> find the end of:
>>> 
>>>   http://host/git repo with spaces in the path
>>
>> You use
>> 
>>   'http://host/git repo with spaces in the path'
> 
> ...which is a quoting mechanism, and it's not even one commonly used in
> emails (i.e., people have written "parse a URL from this text" scripts
> for RFC-encoded URLs, but _not_ for shell quoting).

I don't think RFC-encoding is a quoting mechanism used in emails, either.

-- 
Jakub Narebski
Poland


* Re: remote#branch
From: Martin Langhoff @ 2007-10-31  1:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20071031014347.GB23274@coredump.intra.peff.net>

On 10/31/07, Jeff King <peff@peff.net> wrote:
> Actually, IIRC it won't fetch any of the non 'blue' refs.

You recall correctly, and that was a cogito misfeature. I don't think
git should follow that part of the spec ;-)

> Anyway, to recap (my impression of) the discussion leading up to this:
>   - the cogito feature is useful
...
>   - flamefest begin

Great summary. I read the first and last stages you describe (with a
trip in the middle distracting me). Heh.

No stress. Let the flames continue!


m


* Re: remote#branch
From: Jeff King @ 2007-10-31  1:57 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200710310249.17233.jnareb@gmail.com>

On Wed, Oct 31, 2007 at 02:49:16AM +0100, Jakub Narebski wrote:

> > ...which is a quoting mechanism, and it's not even one commonly used in
> > emails (i.e., people have written "parse a URL from this text" scripts
> > for RFC-encoded URLs, but _not_ for shell quoting).
> 
> I don't think RFC-encoding is quoting mechanism used in emails, either.

That's funny, because I have hundreds of mails where that is the case,
and none where people used shell-quoting.  Most URLs don't _need_ any
encoding, so we don't notice either way. But are you honestly telling me
that if you needed to communicate a URL with a space via email, you
would write:

  'http://foo.tld/url with a space'

rather than:

  http://foo.tld/url+with+a+space

?

I think the latter is much more common, if only because of the fact that
copy and paste from most browsers' location bars gives the encoded
version.

-Peff


* Re: remote#branch
From: Jeff King @ 2007-10-31  1:59 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90710301849h1c31736an1ec163aa1e274577@mail.gmail.com>

On Wed, Oct 31, 2007 at 02:49:49PM +1300, Martin Langhoff wrote:

> > Actually, IIRC it won't fetch any of the non 'blue' refs.
> You recall correctly, and that was a cogito misfeature. I don't think
> git should follow that part of the spec ;-)

I'm not so sure. Junio keeps unrelated branches in git.git like 'html'
and 'todo'. Is it unreasonable to say "clone git.git, but only the todo
branch" and expect it _not_ to download the entire git history?

-Peff


