Git development

Git development
 help / color / mirror / Atom feed

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Martin Langhoff @ 2006-03-01 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Andreas Ericsson, Eric Wong, git
In-Reply-To: <7virqyf094.fsf@assigned-by-dhcp.cox.net>

On 3/2/06, Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
> > But if somebody does the get_sha1() magic, and Junio agrees, then I think
> > it would be a great thing to do.
>
> I am inclined to agree here.

Aren't we doing a lot of work (changes in core git, and corresponding
changes in the porcelain) when simple changes in porcelain would
suffice? Let's imagine that

 - git-commit refuses to commit to a head that has a corresponding
remote (cg-commit does this already with heads that match something in
'branches')
 - git-$SCMimport scripts generate a semi-bogus remotes/headname entry
 - git-pull/push can spot and ignore the semi-bogus remotes/headname entry
 - this means that `touch remotes/foo` is now a cheap way of making
the head readonly
 - depending on the git-$SCMimport script, the remotes/headname file
can perhaps contain useful configuration data for the import, so
git-$SCMimport headname does the right thing.

cheers,


martin

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Linus Torvalds @ 2006-03-01 21:28 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <200603012126.30797.Josef.Weidendorfer@gmx.de>



On Wed, 1 Mar 2006, Josef Weidendorfer wrote:
> 
> So the get_sha1() magic should map "origin" to "remote/origin/master" (or instead
> hardcoded master the remote branch from the first "Pull:" line) ?

Right.

> The ambiguity here would be that shortcut names of remote repositories should not be
> used as tag or head names...

Well, it's not so much an ambiguity, since we'd always try tags and heads 
first. So it's just a fallback, the same way the short SHA1 hash is a 
fallback.

> I think a big plus of this would be that gitk can show branches tracking remote ones
> with another color.

Yes. And with a meaningful name.

> To be able to say "git log origin.." you need the above magic, too.

It would all come automagically from just extending get_sha1().

(Actually, technically you'd put it at the end of "get_sha1_basic()")

		Linus

^ permalink raw reply

* Re: What's in git.git
From: Nicolas Pitre @ 2006-03-01 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzgagxox.fsf@assigned-by-dhcp.cox.net>

On Wed, 1 Mar 2006, Junio C Hamano wrote:

>   These are waiting for further progress by authors:
> 
>   - delta packer updates for tighter packs (Nicolas Pitre)

Please don't wait to merge the first two patches to diff-delta.c.  They 
are purely cleanups with no functional differences.

Nicolas

^ permalink raw reply

* Re: impure renames / history tracking
From: Paul Jakma @ 2006-03-01 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andreas Ericsson, git list
In-Reply-To: <7v3bi2ey63.fsf@assigned-by-dhcp.cox.net>

Hi Junio,

On Wed, 1 Mar 2006, Junio C Hamano wrote:

> Interestingly enough, there are two levels of "rename tracking" the 
> current git does.  Whey you run "git whatchanged -M", you are 
> looking at renames between each commit in the commit chain, one 
> step at a time.  There as long as the rename+rewrite does not 
> amount to too much rewrite, you would see what should be detected 
> as rename to be detected as renames.

Right.

> I found the current default threshold parameters to be about right, 
> maybe a bit too tight sometimes, though.  If you want to loosen the 
> default, you can specify similiarity index after -M.

That's one option.

I'm wondering though if we couldn't also allow for users to 
additionally encode naming 'hints', to aid this 'similarity' 
detection process.

> The way recursive merge strategy uses the rename detection, unlike 
> what whatchanged shows you, does not use chains of commits down to 
> the common merge base in order to detect renames (my recollection 
> may be wrong here -- it's a while since I looked at the recursive 
> merge the last time).  It just looks at the two heads being merged, 
> and detects similarility between them.  So it does not make _any_ 
> difference with the current implementation of recursive merge if 
> you kept a history full of "honest but disgusting" commits or 
> collapsed them into a history with small number of "cleaned up" 
> commits.

I'm going to have to stare at this paragraph a lot longer and harder 
to understand it :).

> One thing it _could_ do (and you _could_ implement as another merge 
> strategy and call it "pauls-rename" merge) is to follow the commit 
> chain one by one down to the common merge base from both heads 
> being merged, and analyze rename history on the both commit chains.

Right, I was just thinking that while making tea actually. This could 
be part of the 'collapsing' process. (or call it "coalesce 
too-detailed commits" process if that is less offensive to ones sense 
of process ;) ).

Actually, you're sort of suggesting following the chains in parallel, 
right? Ie in wall-clock time order, rather than chain order. And 
doing name resolution across the 'to-be-merged' chains at each step 
of the way? Sort of a lesser subset of how other SCMs maintain state 
for names globally?

It's not so much /resolving/ names I'm worried about in the first 
place. It's there simply being no information in the first place to 
indicate (from one single-parent commit to the next) which names were 
renamed.

> Then, you would get better rename+rewrite detection than what it 
> currently does.

But if I follow the commit chain in order to try extract

> HOWEVER.

> If you have that kind of rename-following merge, a workflow that 
> collapses a useful history into a single huge commit "Ok, this 
> commit is a roll-up patch between version 2.6.14 and 2.6.15" 
> becomes far less attractive than it currently already is.  At that 
> point, you _are_ throwing away useful history.

Yes, I agree. And I am, as part of arguing git's case (several SCMs 
are being evaluated and considered, I'm the git proponent at the 
moment), I'm going to suggest workflow ought to be re-evaluated to 
ensure it is generally reasonable, rather than be kept for the sake 
of it keeping (particularly as it may be tailored to the 
needs/limitations of $TRADITIONAL_SCM).

However, I suspect at least some level of collapsing will be desired 
(just as it is with Linux and git).

The workflow issue is seperate from the 'impure rename' issue though, 
even if the workflow I gave as an example excerbates the issue, 
"rename and rewrite half of it" and hard-to-detect renames can still 
occur in the detailed git/linux workflows, surely?

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
If you really knew C++, you wouldn't even joke about putting it
in the kernel.

 	- Richard Johnson on linux-kernel

^ permalink raw reply

* Re: [PATCH 1/2] Let git-svnimport's author file use same syntax as git-cvsimport's
From: Jon Loeliger @ 2006-03-01 21:19 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: Git Mailing List
In-Reply-To: <20060227230814.12298.63006.stgit@backpacker.hemma.treskal.com>

On Mon, 2006-02-27 at 17:08, Karl Hasselström wrote:

> diff --git a/Documentation/git-svnimport.txt b/Documentation/git-svnimport.txt
> index e0e3a5d..912a808 100644
> --- a/Documentation/git-svnimport.txt
> +++ b/Documentation/git-svnimport.txt
> @@ -75,9 +75,9 @@ When importing incrementally, you might 
>  -A <author_file>::
>  	Read a file with lines on the form
>  
> -	  username User's Full Name <email@addres.org>
> +	  username = User's Full Name <email@addr.es>
>  
> -	and use "User's Full Name <email@addres.org>" as the GIT
> +	and use "User's Full Name <email@addr.es>" as the GIT
>  	author and committer for Subversion commits made by
>  	"username". If encountering a commit made by a user not in the
>  	list, abort.

Actually, I believe that "example.com" was reserved
specifically for instances such as this.

See:
    http://www.faqs.org/rfcs/rfc2606.html

jdl

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Johannes Schindelin @ 2006-03-01 21:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0603010821590.22647@g5.osdl.org>

Hi,

On Wed, 1 Mar 2006, Linus Torvalds wrote:

> On Wed, 1 Mar 2006, Andreas Ericsson wrote:
> > 
> > Personally I'm all for namespace separation. I'm assuming the script 
> > has the tracker-branch hardcoded anyway, so I don't really understand 
> > why it would be necessary to keep other refs in a separate directory 
> > and, if it *is* necessary, why that subdirectory can't be 
> > .git/refs/heads/svn.
> > 
> > Eric mentioned earlier that the tracking-branch can't be committed to 
> > (ever), so the user convenience for searching other directories should 
> > be nearly non-existant.
> 
> The thing about it being .git/refs/heads/svn/xyzzy is that then you can 
> do
> 
> 	git checkout svn/xyzzy
> 
> _not_ a branch and you must _not_ commit to it.
> 
> It's much more like a tag: it's a pointer to the last point of an 
> svn-import.
> 
> So I think it should either _be_ a tag (although Dscho worries about some 
> broken porcelain being confused by tags changing) or it should be in a 
> namespace all it's own. Not under .git/refs/heads/ at any point, because 
> it is _not_ a head of development.

I almost missed that you reference me in the email (often, I just delete 
the email if the Subject is of no interest to me).

I did not worry about broken porcelain. I saw broken porcelain. But that 
is more a broken concept than broken porcelain: in a distributed 
environment, there is no way to have a reliable tag. Think about it: 
whenever you have two different versions of a tag, you cannot know which 
one is the correct one.

But my worries do not matter at all for local tags.

Conceptually, however, the last point of a svnimport should *never* be a 
tag, but *always* a head.

Ciao,
Dscho

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Josef Weidendorfer @ 2006-03-01 20:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7virqyf094.fsf@assigned-by-dhcp.cox.net>

On Wednesday 01 March 2006 20:11, Junio C Hamano wrote:
> The latter at first sounds sane, but it has a subtle issue,
> which was what bitten me previously between heads/ and tags/.
> In that broken version, if you have a head called "dead" and a
> tag with the same name, neither was taken ("they are not unique,
> so do not take either!") and we ended up finding an object whose
> SHA1 name began with those two bytes 0xDE 0xAD.  I do not think
> this has happened in the field, fortunately, but it would have
> been quite hard to diagnose.
> 
> So if we were to do it, I would say do the latter, but be very
> careful to make sure you fail the whole get_sha1() when you bail
> out of the "try possible prefixes" codepath because of
> ambiguity.

Yes.
Any ambiguity is a source of confusion and user error. Better
bail out. If it is not a performance problem, it would be better
to integrate the check for abbreviated object name into the
ambiguity analysis, and not have 2 stages of searching.
It probably would be a good idea to print out the ambigous names
with the error message, so that you can copy&paste the correct
full name afterwards.

If we go for the .git/refs/remotes/... and have an ambiguity becaues
of remote shortcut names, a error message pointing at a "git-rename-remote"
command would be handy, allowing the user to cleanup the namespace.

> There may be other issues involved, but I wouldn't 
> know -- I reverted the "do not take either if they are
> ambiguous between heads/ and tags/" patch primarily because of
> the reason from the above paragraph, but also did not want to
> deal with any other potential issues to keep my sanity ;-).

I think the real problem here is that names like "dead" can be interpreted
as abbreviated object name. When you introduce such a name as head or tag,
you have a potential ambiguity which can get real at any time.
Perhaps it would be good to print out a warning when the user is about to
create a head or tag name which can be interpreted as abbreviated object name?

Josef

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Josef Weidendorfer @ 2006-03-01 20:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0603011023080.22647@g5.osdl.org>

On Wednesday 01 March 2006 19:25, Linus Torvalds wrote:
> > 	git log origin/master..
> > 
> > is really not that bad
> 
> It really is.
> 
> Think like a user. If I pull from "origin", then the name of that thing is 
> "origin", not "origin/master" or "o/master". A user doesn't care what the  
> remote branch name is - the whole _point_ of the .git/remotes/xyzzy file 
> is to give a short description that includes the names of the branches you 
> pull from.

So the get_sha1() magic should map "origin" to "remote/origin/master" (or instead
hardcoded master the remote branch from the first "Pull:" line) ?
The ambiguity here would be that shortcut names of remote repositories should not be
used as tag or head names...

I think a big plus of this would be that gitk can show branches tracking remote ones
with another color.

> The good news is that "get_sha1()" shouldn't be thse at hard to extend on. 
> Just add a case at the end that says "do we have a .git/remotes/%s file, 
> and if so, parse it".

To be able to say "git log origin.." you need the above magic, too.

Josef

^ permalink raw reply

* Re: [PATCH 2/2] git-log (internal): more options.
From: Junio C Hamano @ 2006-03-01 20:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0603010730520.22647@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Most helpers that want a list of commits probably want the printing 
> options too, and the ones that do not probably simply don't care (ie if 
> they silently pass a "--pretty=raw" without it affecting anything, who 
> really cares?)

Perhaps (meaning, agree in general but not 100% convinced and
haven't made up my mind yet).

> I can actually imagine using "--parents" as a way of parsing both the 
> commit log and the history. Of course, any such use is likely in a script, 
> at which point the script probably doesn't actually want "git log", but 
> just a raw "git-rev-list".

Yes, that is exactly why I did not see why "log viewer" wants --parents.

> To me, the question whether a flag would be parsed in the "revision.c" 
> library or in the "rev-list.c" binary was more a question of whether that 
> flag makes sense for other things than just "git log". 

Good to know we are in agreement (iow, I wasn't totally off the
mark) that revision.c should handle things that are common.
That means:

 * --bisect and --parents are for scripted use only and do not
   concern log viewer, so we would leave them in rev-list.

 * --header is good for anything that shows more than one
   record, so it may be worthwhile to have it in generic.

> For example, "git whatchanged" and "git diff" could both use 
> setup_revision(), although "git diff" wouldn't actually _walk_ the 
> revisions (it would just look at the "revs->commits" list to see what was 
> passed in).
>
> "git whatchanged" would obviously take all the same flags "git log" does, 
> and "git diff" could take them and just test the values for sanity (ie 
> error out if min/max_date is not -1, for example).

Perhaps.

> "git show" is like a "git-whatchanged" except it wouldn't walk the diffs 
> (I considered adding a "--nowalk" option to setup_revisions(), which would 
> just suppress the "add_parents_to_list()" entirely)

Umm.  The current "git show -4" walks and I find the behaviour
useful.  They are the same program with different defaults.

^ permalink raw reply

* increase in sexual desire
From: Samantha Sands @ 2006-03-01 20:04 UTC (permalink / raw)
  To: geogirb

Carefully chosen herbal ingredients are the key to peniis enlargement success. 
Not only the precise blend of ingredients but also many other factors have effect
on the overall potency and strength of peniis enlargement formula.

Some of these factors include growing conditions, geographical location where herbs are grown, 
harvest time, the way herbs are stored before processing, the way herbs are processed. 

http://aceldgkbfhijm.planetfaktor.com/?bfhijmxwqowyacelzppdgk

y6s

^ permalink raw reply

* Re: impure renames / history tracking
From: Junio C Hamano @ 2006-03-01 19:56 UTC (permalink / raw)
  To: paul; +Cc: Andreas Ericsson, git list
In-Reply-To: <Pine.LNX.4.64.0603011851430.13612@sheen.jakma.org>

Paul Jakma <paul@clubi.ie> writes:

> For sake of argument assume the workflow corresponds to:
>
>     o-o-o-o---o--o
>    /              \
> --o----------------m->
>
> And collapsing just the 'oops, made a typo' commits so it looks like:
>
>     o-----o------o
>    /              \
> --o----------------m->
>
>
> The /real/ point, other than workflow, is:
>
> - can we track 'rename and rewrite'?

Yes.  Especially the collapsing is 'oops, made a typo' kind.

Interestingly enough, there are two levels of "rename tracking"
the current git does.  Whey you run "git whatchanged -M", you
are looking at renames between each commit in the commit chain,
one step at a time.  There as long as the rename+rewrite does
not amount to too much rewrite, you would see what should be
detected as rename to be detected as renames.  I found the
current default threshold parameters to be about right, maybe a
bit too tight sometimes, though.  If you want to loosen the
default, you can specify similiarity index after -M.

The way recursive merge strategy uses the rename detection,
unlike what whatchanged shows you, does not use chains of
commits down to the common merge base in order to detect renames
(my recollection may be wrong here -- it's a while since I
looked at the recursive merge the last time).  It just looks at
the two heads being merged, and detects similarility between
them.  So it does not make _any_ difference with the current
implementation of recursive merge if you kept a history full of
"honest but disgusting" commits or collapsed them into a history
with small number of "cleaned up" commits.

One thing it _could_ do (and you _could_ implement as another
merge strategy and call it "pauls-rename" merge) is to follow
the commit chain one by one down to the common merge base from
both heads being merged, and analyze rename history on the both
commit chains.  Then, you would get better rename+rewrite
detection than what it currently does.

HOWEVER.

If you have that kind of rename-following merge, a workflow that
collapses a useful history into a single huge commit "Ok, this
commit is a roll-up patch between version 2.6.14 and 2.6.15"
becomes far less attractive than it currently already is.  At
that point, you _are_ throwing away useful history.

^ permalink raw reply

* Re: impure renames / history tracking
From: Paul Jakma @ 2006-03-01 19:13 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Andreas Ericsson, git list
In-Reply-To: <46a038f90603011005m68af7485qfdfffb9f82717427@mail.gmail.com>

On Thu, 2 Mar 2006, Martin Langhoff wrote:

> The moment you 'merge' by using git-diff | patch you lose all the 
> support git gives you, because you are discarding all of git's 
> metadata! git's metadata is about all the commits you are merging, 
> and is good enough that it will help future merges across renames.

> You should really use git-pull/git-merge at that point.

Let's try not get stuck on the workflow.

I probably shouldn't have brought it up. However, just assume it's 
been decided that 'detail' of the project implementation is too much 
clutter for the 'master'. I note that people do this already even in 
the "keep all the details" Linux and Git workflows, where they 
rejiggle commits in order to cut-out 'oops, made a typo' type of 
commits.

So the level of detail that is suitable is for 'merging upstream' 
clearly is arbitrary and subjective, and even with git and Linux that 
knob already is set past 0 (all detail), maybe to 1 - the workflow 
I'm thinking of has it set to (say) 2.

For sake of argument assume the workflow corresponds to:

     o-o-o-o---o--o
    /              \
--o----------------m->

And collapsing just the 'oops, made a typo' commits so it looks like:

     o-----o------o
    /              \
--o----------------m->

The /real/ point, other than workflow, is:

- can we track 'rename and rewrite'?

> And you can modify your practices ever so slightly to match the
> benefits of the old model:

I agree completely on the workflow argument, I intend to make it to 
the project concerned ;).

> And what I've found, managing a project with 13K files, is that in 
> practice git does far better tracking renames than several SCMs 
> that do explicit tracking. Don't be distracted by the 'we don't 
> track renames posturing'. We do, and it's so magic that it just 
> works.

Yep, I know. :).

I just wonder if that magic could use additional hints (*not* Attic/ 
type stuff, ick ye gods no! Agree fully there!). Cause 'rename and 
rewrite' it just does not get right.

Simplest test-case (simulating 'rename and rewrite half the file') 
is:

- create a one-line file
- commit to git
- mv it and add a line

To show:

$ git status
nothing to commit
$ cat test
foo
$ git-mv test toast
$ echo bar >> toast
$ git-update-index toast
$ git status
#
# Updated but not checked in:
#   (will commit)
#
#       deleted:  test
#       new file: toast
#

A year later, someone comes along and looks at the history for 
'toast', they'll never know they can look back further by following 
'test'.

I'd like to fix the above somehow, possibly by adding 'renamed test 
toast' meta-data to index cache and commit objects. Having git-mv / 
git-cp add that meta-data.

Then diffcore using that meta-data as /advisory/ and auxilliary 
information *only* in /helping/ to determining renames, as an 
additional input to its existing heuristics. This meta-data would not 
be intrinsic to the operation git, it would /only/ be to aid humans 
(or their tools rather) in tracking back/forward through history.

Would that be the best way to explore solving the above problem?

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Human resources are human first, and resources second.
 		-- J. Garbers

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Junio C Hamano @ 2006-03-01 19:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0603010935201.22647@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> But if somebody does the get_sha1() magic, and Junio agrees, then I think 
> it would be a great thing to do.

I am inclined to agree here.

Some caveats upfront, though.

Since I was bitten at least once by attempting get_sha1() to
deal with ambiguous names (the issue was between heads and tags
but I think there are similar issues here) I am really reluctant
to have the function look at anywhere other than heads/ and
tags/ without explicit prefix.

Currently get_sha1_basic() says:

	* look in $GIT_DIR with these prefixes in turn and take
          the first match: "", "refs", "refs/tags", "refs/heads".

The extended one _would_ in addition say one of these things:

	* if none of the above prefixes work, try other
          directories under refs/ as prefixes and take the first
          match.

	or

	* if none of the above prefixes work, try other
          directories under refs/ as prefixes and if there is a
          unique match take it.  If there are more than one
          match, do not take either.

In the context of get_sha1(), get_sha1_basic() is used like
this:

	* if get_sha1_basic() finds an answer, use it.
          Otherwise see if it is an abbreviated object name.

The behaviour of a naive implementation of the former would
depend on readdir() and traversal order, which makes (from the
end user's point of view) a hard to understand confusion that is
not reproducible.  Another repository cloned from such would
even give you different answers.

The latter at first sounds sane, but it has a subtle issue,
which was what bitten me previously between heads/ and tags/.
In that broken version, if you have a head called "dead" and a
tag with the same name, neither was taken ("they are not unique,
so do not take either!") and we ended up finding an object whose
SHA1 name began with those two bytes 0xDE 0xAD.  I do not think
this has happened in the field, fortunately, but it would have
been quite hard to diagnose.

So if we were to do it, I would say do the latter, but be very
careful to make sure you fail the whole get_sha1() when you bail
out of the "try possible prefixes" codepath because of
ambiguity.  There may be other issues involved, but I wouldn't
know -- I reverted the "do not take either if they are
ambiguous between heads/ and tags/" patch primarily because of
the reason from the above paragraph, but also did not want to
deal with any other potential issues to keep my sanity ;-).

^ permalink raw reply

* Re: impure renames / history tracking
From: Paul Jakma @ 2006-03-01 18:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, git list
In-Reply-To: <Pine.LNX.4.64.0603010859200.22647@g5.osdl.org>

Hi Linus,

On Wed, 1 Mar 2006, Linus Torvalds wrote:

> The thing is, it does better than anything that _tries_ to be 
> "reliable".
>
> I can pretty much _guarantee_ that you can't do it better.

I'm willing to take that argument to the 'project' concerned, I just 
need to be pretty sure of it.

> Tracking "inodes" - aka file identities - (which is what BK does, 
> and I assume what SVN does) is fundamentally problematic. I 
> particular, it's a horrible problem when two inodes "meet" under 
> the same name. You now have two identities for the same file, and 
> you're fundamentally screwed.

Yes, in that model it is. This interestingly, is not the BK model, I 
suspect (see below).

> It doesn't even need renames to be a problem. JUST THE FACT THAT 
> YOU TRY TO TRACK FILE "IDENTITY" HISTORY IS BROKEN.

If it's "file identity" globally across the lifetime of the project, 
I agree 100% per cent. The 'traditional' SCM concerned does this.

That's not what a solution I'd want to explore either, I'm only 
interested in the identity of files for any one /one/ commit. In 
saying that, I recognise it's pointless to try annotate file-change 
information in multi-parent commits (merges).

> For example, take CVS, which doesn't actually try to do renames, 
> but _does_ try to track the identity of a file, since all the 
> history is tied into that identity: think about what happens in 
> Attic when a file is deleted. Completely broken model.

ACK, {Attic,deleted_files}/ is just horrid.

> And that's really fundamental. CVS doesn't show the problems so 
> much, because CVS actively tries to make it hard to do these 
> things.

ACK.

> With renames-tracking-file-identities, it's _really_ easy to get 
> some major confusion going. What happens when one branch creates a 
> file, and another one renames a file to that same name, and they 
> merge?

Well, the conflict has to be resolved somehow, even today.

> Don't tell me it doesn't happen. It happened under BK. The way BK 
> "solved" it was to keep the two separate identities: one of them 
> got resolved to the new filename, the other one went into the 
> "deleted" directory.

Right. That's what the 'traditional workflow' SCM I'm thinking of 
does - not BK funnily enough, but an SCM predating BK which also 
happens to use SCCS files, and with some of the same high-level 
push/pull constructs as BK (interestingly).

It also tracks name history globally using a deleted_files/ history, 
which is maintained, but I don't think it does this for name merges 
like the above.

In the one I'm thinking of, it does (I /think/, I'm not an expert in 
it) the following:

Given two files, say:

'old:

1.1---1.2---1.3

new:

1.1

- constructs a 'fake' base SCCS revision, empty
- adds the top 'old' version as a branch
- adds the top new version as a new delta

    1.1.1.1
   /
1.1---------1.2

Where in the merged file:

 	1.1: empty
 	1.1.1.1: was 1.3 from 'old'
 	1.2: is 1.1 from 'new'

However, it does /not/ create a deleted_files entry for the 'old' 
file. (AFAICT - I may not have a sufficiently full understanding of 
this SCM)

> Guess what happens when the side that got merged into "deleted" 
> continues to edit the file? That's right - their edits happen on 
> the deleted file, and never show up in the real tree in a 
> subsequent merge ever again.

Indeed - horrid.

> And as far as I can tell, BK really did the best you can do. 
> Following file identities really _is_ fundamentally broken. It 
> sounds like a nice idea, but while you migth solve a few problems, 
> you create a whole raft of much more fundamental problems.

For tracking identity across more than one commit - I fully agree.

That's not what quite I'm thinking of though. Is it worth going on 
with the discussion on a:

 	 'track identities *only* from context of /the/ parent to
           this commit'

> So next time you think about a merge that migt have been improved 
> by tracking renames, please also think about a merge where one of 
> the filenames came from two or more different sources through an 
> earlier merge, and thank your benevolent Gods that they instructed 
> me to make git be based purely on file contents.

Oh, I agree muchely here.

I wouldn't change git. I only wonder if it give its rename-heuristics 
an additional advisory-only hint? (for single-parent commits at least 
- never merges - and only on a per-commit basis).

I probably should first explore how git deals with rename clashes..

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
I'm glad I was not born before tea.
 		-- Sidney Smith (1771-1845)

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Linus Torvalds @ 2006-03-01 18:25 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <200603011906.33433.Josef.Weidendorfer@gmx.de>

On Wed, 1 Mar 2006, Josef Weidendorfer wrote:
>
> On Wednesday 01 March 2006 18:40, Linus Torvalds wrote:
> > But if somebody does the get_sha1() magic, and Junio agrees, then I think 
> > it would be a great thing to do.
> 
> Yes.
> 
> 	git log origin/master..
> 
> is really not that bad

It really is.

Think like a user. If I pull from "origin", then the name of that thing is 
"origin", not "origin/master" or "o/master". A user doesn't care what the 
remote branch name is - the whole _point_ of the .git/remotes/xyzzy file 
is to give a short description that includes the names of the branches you 
pull from.

The good news is that "get_sha1()" shouldn't be that hard to extend on. 
Just add a case at the end that says "do we have a .git/remotes/%s file, 
and if so, parse it".

				Linus

^ permalink raw reply

* [PATCH] git-mv: fixes for path handling
From: Josef Weidendorfer @ 2006-03-01 18:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Moving a directory ending in a slash was not working as the
destination was not calculated correctly.
E.g. in the git repo,

 git-mv t/ Documentation

gave the error

 Error: destination 'Documentation' already exists

To get rid of this problem, strip trailing slashes from all arguments.
The comment in cg-mv made me curious about this issue; Pasky, thanks!
As result, the workaround in cg-mv is not needed any more.

Also, another bug was shown by cg-mv. When moving files outside of
a subdirectory, it typically calls git-mv with something like

 git-mv Documentation/git.txt Documentation/../git-mv.txt

which triggers the following error from git-update-index:

 Ignoring path Documentation/../git-mv.txt

The result is a moved file, removed from git revisioning, but not
added again. To fix this, the paths have to be normalized not have ".."
in the middle. This was already done in git-mv, but only for
a better visual appearance :(

Signed-off-by: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>

---

 git-mv.perl |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

15d94ce0807c1d99d10f6c3ddd32963b1ac0fece
diff --git a/git-mv.perl b/git-mv.perl
index 8cd95c4..9b43dcc 100755
--- a/git-mv.perl
+++ b/git-mv.perl
@@ -31,11 +31,12 @@ chomp($GIT_DIR);
 my (@srcArgs, @dstArgs, @srcs, @dsts);
 my ($src, $dst, $base, $dstDir);
 
+# remove any trailing slash in arguments
+for (@ARGV) { s/\/*$//; }
+
 my $argCount = scalar @ARGV;
 if (-d $ARGV[$argCount-1]) {
 	$dstDir = $ARGV[$argCount-1];
-	# remove any trailing slash
-	$dstDir =~ s/\/$//;
 	@srcArgs = @ARGV[0..$argCount-2];
 	
 	foreach $src (@srcArgs) {
@@ -61,6 +62,16 @@ else {
     $dstDir = "";
 }
 
+# normalize paths, needed to compare against versioned files and update-index
+# also, this is nicer to end-users by doing ".//a/./b/.//./c" ==> "a/b/c"
+for (@srcArgs, @dstArgs) {
+    s|^\./||;
+    s|/\./|/| while (m|/\./|);
+    s|//+|/|g;
+    # Also "a/b/../c" ==> "a/c"
+    1 while (s,(^|/)[^/]+/\.\./,$1,);
+}
+
 my (@allfiles,@srcfiles,@dstfiles);
 my $safesrc;
 my (%overwritten, %srcForDst);
@@ -79,15 +90,6 @@ while(scalar @srcArgs > 0) {
     $dst = shift @dstArgs;
     $bad = "";
 
-    for ($src, $dst) {
-	# Be nicer to end-users by doing ".//a/./b/.//./c" ==> "a/b/c"
-	s|^\./||;
-	s|/\./|/| while (m|/\./|);
-	s|//+|/|g;
-	# Also "a/b/../c" ==> "a/c"
-	1 while (s,(^|/)[^/]+/\.\./,$1,);
-    }
-
     if ($opt_v) {
 	print "Checking rename of '$src' to '$dst'\n";
     }
-- 
1.2.0.g719b

^ permalink raw reply related

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Josef Weidendorfer @ 2006-03-01 18:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0603010935201.22647@g5.osdl.org>

On Wednesday 01 March 2006 18:40, Linus Torvalds wrote:
> But if somebody does the get_sha1() magic, and Junio agrees, then I think 
> it would be a great thing to do.

Yes.

	git log origin/master..

is really not that bad. And if somebody complains about typing, git-clone
could get an option "--remote-name=o" to allow for

	git log o/master..

Josef

^ permalink raw reply

* Re: impure renames / history tracking
From: Martin Langhoff @ 2006-03-01 18:05 UTC (permalink / raw)
  To: paul; +Cc: Andreas Ericsson, git list
In-Reply-To: <Pine.LNX.4.64.0603011558390.13612@sheen.jakma.org>

On 3/2/06, Paul Jakma <paul@clubi.ie> wrote:
> I mean:
>
>         $ git checkout project
>         $ git pull . master
>         $ git checkout -b tmp project
>         $ git diff project..master | <git apply I think>

The moment you 'merge' by using git-diff | patch you lose all the
support git gives you, because you are discarding all of git's
metadata! git's metadata is about all the commits you are merging, and
is good enough that it will help future merges across renames.

You should really use git-pull/git-merge at that point.

My guess is that you do this to achieve what you describe later:

> Presume that 'project' in the workflow is defined as
>
>         "achieve one goal with one commit to the master"
>
> So by definition, it always correct that the project only ever has
> one commit.

What happens if you rephrase that to read: "achieve one goal with one
merge to the master"? Long term, it gives you much better support from
the SCM. If a particular commit broke something, you can use
whatchanged, log, annotate and bisect to figure out in which /small/
commit things went astray.

And you can modify your practices ever so slightly to match the
benefits of the old model:

 - force merge message editing in git-merge, and prepare appropriate
commit messages for your merges
 - write a modified git-log that displays only the merges to master

that way, you get the best of both worlds.

> The trouble is that /sometimes/ projects do indeed 'rename and
> rewrite' a file. At present, chances are git might not notice this,

It will, if you preserve git's metadata.

The thing is that with any scm that tracks metadata of some kind, the
moment you bypass its tools and do diff|patch to discard the
metadata... well, you lose its benefits...

And what I've found, managing a project with 13K files, is that in
practice git does far better tracking renames than several SCMs that
do explicit tracking. Don't be distracted by the 'we don't track
renames posturing'. We do, and it's so magic that it just works.

cheers,

^ permalink raw reply

* Re: [PATCH] Teach git-checkout-index to read filenames from stdin.
From: Shawn Pearce @ 2006-03-01 17:56 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: git
In-Reply-To: <20060301155053.GC1010@trixie.casa.cgf.cx>

Christopher Faylor <me@cgf.cx> wrote:
> AFAIK, the length of the command line for cygwin apps is very large --
> if you're using recent versions of Cygwin.  I believe that it is longer
> than the linux default.  We bypass the Windows mechanism for setting the
> command line when a cygwin program starts a cygwin program.
> 
> For native Windows programs, the command line length is ~32K but I don't
> think that git uses any native Windows programs, does it?

No.  Currently GIT is entirely dependent on Cygwin.  So GIT
wouldn't bump into the ~32K limit due to the cygwin-cygwin feature
you mention.  But thanks for the information.  I had thought I had
read somewhere in the Cygwin documentation that the command line
length was rather limited (even under cygwin-cygwin calls).  Maybe
I was just seeing things.  :-)

But even if we can get a long set of args into git-checkout-index its
probably still better to stream them as you can get both programs
working at the same time (rather than waiting for xargs to build
the argument buffer) and you are saving yourself at least one fork
as you don't need to start xargs just to feed git-checkout-index.
Even on Linux where fork is cheap, that's still soemething saved.

-- 
Shawn.

^ permalink raw reply

* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Catalin Marinas @ 2006-03-01 17:53 UTC (permalink / raw)
  To: cel; +Cc: Karl Hasselström, git
In-Reply-To: <4405DC41.8020700@citi.umich.edu>

On 01/03/06, Chuck Lever <cel@citi.umich.edu> wrote:
> Catalin Marinas wrote:
> > I attached another patch that should work properly. It also pushes
> > empty patches on the stack if they were merged upstream (a 'stg clean'
> > is required to remove them). This is useful for the push --undo
> > command if you are not happy with the result.
>
> if maintainer X takes a patch "a" from developer Y, but modifies patch
> "a" before committing it, then your nifty automated mechanism will still
> have trouble merging developer Y's stack when Y pulls again.
>
> the convention might be that maintainers who accept patches will always
> accept exactly what was sent, and then immediately apply another commit
> that addresses any issues they have with the original commit.

This won't solve the problem since testing whether patch "a" was
merged upstream will fail because its reverse won't apply cleanly onto
the upstream HEAD. Of course, you can try combination of upstream
commits and local patches but it's not really feasible.

As I said, this method doesn't solve all the upstream merge situations
but it is OK for most of them.

--
Catalin

^ permalink raw reply

* Re: impure renames / history tracking
From: Andreas Ericsson @ 2006-03-01 17:43 UTC (permalink / raw)
  To: Paul Jakma; +Cc: git list
In-Reply-To: <Pine.LNX.4.64.0603011558390.13612@sheen.jakma.org>

Paul Jakma wrote:
> On Wed, 1 Mar 2006, Andreas Ericsson wrote:
> 
>>> o: commit
>>> m: merge
>>>
>>>    o---o-m--o-o-o--o----m <- project
>>>   /     /              /
>>> o-o-o-o-o--o-o-o--o-o-o <- main branch
>>>
>>> The project merge back to main in one 'big' combined merge 
>>> (collapsing all of the commits on 'project' into one commit). This 
>>> leads to 'impure renames' being not uncommon. The desired end-result 
>>> of merging back to 'main' being to rebase 'project' as one commit 
>>> against 'main', and merge that single commit back, a la:
>>>
>>>    o---o-m--o-o-o--o----m <- project
>>>   /     /              /
>>> o-o-o-o-o--o-o-o--o-o-o---m <- main branch
>>>                        \ /
>>>                         o <- project_collapsed
>>>
>>> So that 'm' on 'main' is that one commit[1].
> 
> 
>> I think you're misunderstanding the git meaning of rebase here. "git 
>> rebase" moves all commits since "project" forked from "main branch" to 
>> the tip of "main branch".
> 
> 
> Right, I'm referring to 'rebase' generally, as a concept, not to 
> git-rebase specifically. E.g. git diff main..project is another way of 
> rebasing I think.
> 

Yes, but imo a poor one, as you're losing all the history. git *can* do 
what you want, but it was designed to maintain a long history so that 
everyone can see it and improve on the code with many chains of small 
and simultanous changes.

>> Other than that, this is the recommended workflow, and exactly how 
>> Linux and git both are managed (i.e. topic branches eventually merged 
>> into 'master').
> 
> 
> They're not rebased though, generally. They're pulled. Ie, in Linux and 
> git when 'project' is merged, things look like:
> 
>     o---o-m--o-o-o--o----m   <- project
>    /     /              / \
> o-o-o-o-o--o-o-o--o-o-o----m <- main branch
> 
> The rest of the world sees /all/ the individual commits of 'project' 
> right? The traditional process for the case I'm thinking of results in 
> the 'main' tree seeing only /one/ single commit for the project.
> 

Perhpas we have a nomenclature clash here. When you say "one single 
commit", I can't help but thinking "snapshot". It's completely 
impossible to fold *ALL* the history into a single commit, and since you 
want heuristics I would imagine you wouldn't want that either.

>> I'm not sure what you mean by 'project_collapsed' though.
> 
> 
> All the commits on the project branch are 'collapsed' into one single 
> commit/delta, and then that /single/ commit is merged to 'main'. Rest of 
> the world sees:
> 
> o-o-o-o-o--o-o-o--o-o-o---m <- main branch
>                        \ /
>                         o <- project
> 

The only sane way to represent this is by doing a mega-patch and 
applying it with a new commit message. That way renamed files will show 
up as

	renamed from /path/to/foo
	renamed to /path/to/some/where/else

Since you're removing all the history in between one mega-patch and the 
next (as if Linus would have v2.6.12 one day and in the next commit it 
would be v2.6.13... strange thought), the history for that tree can't 
well know about renames that doesn't exist in its history. Again, if you 
wan't to keep "master" (can we please call it that? I can't keep up with 
what you call "project" and "main branch") to a single commit you'll 
have no history in it. In essence, that's a snapshot (or a release, 
which is just a snapshot with a tag).

>> Personally I think metadata is evil.
> 
> 
> Not sure I agree. Silly/redundant meta-data can be evil alright. But I'm 
> talking about meta-data which is not there and potentially not 
> reconstructable.
> 
>> Renames will still be auto-detected anyway,
> 
> 
> Chances are so, yes. Definitely with the git and Linux workflows.
> 
> The traditional workflow for the software project I'm thinking of is 
> different though. One commit may encompass multiple renames and edits of 
> a file (discouraged, but it's possible).
> 
> If my understanding is correct, following back history for such cases 
> would be difficult.
> 

It would be impossible. At best you can get "before mega-patch 64, the 
tree looked like this", "after mega-patch 64, it looked like this, and 
here are the files with 80% of above similarity index".

> There is an argument that that 'traditional' process should be changed. 
> However, leaving aside that argument, I'd like to know if git could 
> accomodate that process.
> 
>> be able to detect a rename is if you rename a file and hack it up so 
>> it doesn't even come close to matching its origin (close in this case 
>> is 80% by default, I think). In those cases it isn't so much a rename 
>> as a rewrite.
> 
> 
> Exactly - this is the case I'm concerned about. Imagine that you'd like 
> to be follow the history back through the rewrite and through to the 
> original file.
> 

I'm confused. First you say you want to have one single mega-patch for 
each commit, then you say you want to be able to follow history back. 
It's like deciding to throw away your wallet and then trying to get 
someone to pick it up and carry it around for you.

>> IMO this is far better than having to tell git "I renamed this file to 
>> that", since it also detects code-copying with modifications, and it's 
>> usually quick enough to find those renames as well.
> 
> 
> I think so too, but that involves arguing that very very long-standing 
> workflows should be changed to accomodate git. I intend to make that 
> argument to the 'project' concerned, however I would also like to be say 
> git could equally well deal with the 'traditional' workflow, modulo 
> having to explicitely use (say) git-mv.
> 

The simple fact is that once you start juggling 12MB patches instead of 
keeping the commits, your history is out the window anyway. Adding 
meta-data to accommodate for the lack of history when you throw it away 
is, to be honest, an approach that leaves "insane" in the dust.

As for convincing others, shove git-bisect under their noses and ask 
them if they'd like a tool to find their bugs for them.

>>
>>     $ git checkout master
>>     $ git pull . project
> 
> 
> Right, but 'pull' isn't what I mean :).
> 
> I mean:
> 
>     $ git checkout project
>     $ git pull . master
>     $ git checkout -b tmp project
>     $ git diff project..master | <git apply I think>
>

This way, 'project' and 'tmp' both would hold all patches since you 
merge 'master' into 'project' before creating the 'tmp' branch at the 
head of 'project'. As such, 'project' is ahead of 'master' (it has its 
own changes, those in master and the merge between 'project' and 
'master'), so the diff will be empty.

If 'master' is where you commit regularly (i.e. not mega-patches), you 
can do these two steps to create the mega-patch branch

	$ git checkout -b mega; # create the mega-patch branch
	$ # rewind the mega-patch branch to the dawn of time
	$ git reset --hard $(git rev-list HEAD | tail -n 1)

And for each mega-patch, do this:

	$ # create and apply mega-patch 1
	$ git diff project..master | git apply
	$ # commit the changes we just applied
	$ git commit -s -a -m "mega-patch 1"
	$ git checkout project; # back to project branch
	$ # Merge with 'master', or the next mega-patch won't apply
	$ git pull . master

>> Then you can apply patch-file to whatever branch you want and make the 
>> commit as if it was a single change-set. I'd recommend against it 
>> unless you're just toying around though. It's a bad idea to lie in a 
>> projects history.
> 
> 
> Presume that 'project' in the workflow is defined as
> 
>     "achieve one goal with one commit to the master"
> 
> So by definition, it always correct that the project only ever has one 
> commit.
> 

But that can't be true either, unless you intend to stop working at the 
project. At "best", you could be able to get a chain of commits in 
'master' where each commit hold several tons of changes.

The topic-branch approach to this would be to
a) Implement all changes required for a certain feature in one go and 
commit all of them. do "git pull . topic-branch" when on master branch. 
This will result in a "fast-forward" (i.e. top of 'master' is the 
merge-base between 'master' and 'topic-branch'), so no merge will happen.

b) Implement all changes required for a certain feature in small steps 
and then apply the diff between 'master..topic-branch' to master. The 
topic-branch has to be thrown away, since it can't ever be merged back 
into master, and master can't be merged into the topic-branch (that's 
ok, topic-branches are made to throw away).

For small changes, or one change and some stupid bugfixes, I'd say b) is 
a viable option. The kind of changes you talk about, with several 
renames of files and sometimes near-complete rewrite of them, would 
certainly warrant a merge (or a fast-forward).

> The trouble is that /sometimes/ projects do indeed 'rename and rewrite' 
> a file. At present, chances are git might not notice this, and ability 
> to follow history through the rename+rewrite would be lost.
> 
> I'm wondering whether:
> 
> - this could be solved?

Not with the mega-patch approach.

> - how? (some additional advisory-only meta-data in the
>   index-cache and commit?)
> 

You could maintain that data yourself in either an external or versioned 
file. I've never heard of anyone employing the workflow you describe so 
I doubt it's very common. I also shudder to think that git will be made 
less efficient for the benefit of throwing history away, when tracking 
history efficiently is what it's all about in the first place.

> If there is consensus on an acceptable way, I'm willing to implement it. 
> (I was thinking of just adding 'rename' headers to the commit objects, 
> then teaching diffcore to consider them in addition to current heuristics).
> 

The code is mightier than the mail. Perhaps if I see an implementation 
of this I could wrap my head around what you really mean. I'm sure I 
must misunderstand you one way or another.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Linus Torvalds @ 2006-03-01 17:40 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <200603011814.43573.Josef.Weidendorfer@gmx.de>



On Wed, 1 Mar 2006, Josef Weidendorfer wrote:
>
> On Wednesday 01 March 2006 17:24, Linus Torvalds wrote:
> > The thing about it being .git/refs/heads/svn/xyzzy is that then you can do
> > 
> > 	git checkout svn/xyzzy
> > 
> > and start modifying it. Which is exactly against the point: the thing is 
> > _not_ a branch and you must _not_ commit to it.
> > 
> > It's much more like a tag: it's a pointer to the last point of an 
> > svn-import.
> 
> Isn't it the same with tracked branches of a remote git repo?
> With this reasoning, all heads that git-clone clones aside from the
> special "master" should not be under .git/refs/heads, but better
> under .git/refs/remotes/<remoteRepoName>/ ?

Yes, I think that would make tons of sense.

> <remoteRepoName> is "origin" in the case of git-clone, so .git/remotes/origin
> would contain
>  URL: http://host/repo.git
>  Pull: master:remotes/origin/master
> 
> Then there would not be the need for the confusing special branch "origin"
> after cloning, as namespaces are separate.

I think that would make things a lot more flexible, and yes, it sounds 
like a good idea.

HOWEVER.

I think it's not only very common, but quite useful, to do what we do now, 
ie

	git log origin..

to see "what is in origin but not in HEAD".

So there's a big usability issue: I don't think it's good to have to say

	git log remotes/origin/master..

to do the same.

So from a usability standpoint, we'd have to teach "get_sha1()" about 
parsing .git/remotes/* files if it cannot find a branch or a tag with that 
name (which it wouldn't be able to, since even if it were to walk the 
directories udner .git/refs/ recursively, it would be named "master" 
there).

But if somebody does the get_sha1() magic, and Junio agrees, then I think 
it would be a great thing to do.

			Linus

^ permalink raw reply

* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Chuck Lever @ 2006-03-01 17:39 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Karl Hasselström, git
In-Reply-To: <b0943d9e0602281445w7160d915y@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1528 bytes --]

Catalin Marinas wrote:
> On 27/02/06, Catalin Marinas <catalin.marinas@gmail.com> wrote:
> 
>>An idea (untested, I don't even know whether it's feasible) would be to
>>check which patches were merged by reverse-applying them starting with
>>the last. In this situation, all the merged patches should just revert
>>their changes. You only need to do a git-diff between the bottom and the
>>top of the patch and git-apply the output (maybe without even modifying
>>the tree). If this operation succeeds, the patch was integrated and you
>>don't even need to push it.
> 
> 
> I attached another patch that should work properly. It also pushes
> empty patches on the stack if they were merged upstream (a 'stg clean'
> is required to remove them). This is useful for the push --undo
> command if you are not happy with the result.
> 
> I'll try this patch for a bit more before pushing into the repository.

i think this is a cool idea.  but it seems still to require a bit of 
convention on the part of the maintainer.

if maintainer X takes a patch "a" from developer Y, but modifies patch 
"a" before committing it, then your nifty automated mechanism will still 
have trouble merging developer Y's stack when Y pulls again.

the convention might be that maintainers who accept patches will always 
accept exactly what was sent, and then immediately apply another commit 
that addresses any issues they have with the original commit.  this is 
also a good idea so that the history contains the exact attribution of 
each change.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 451 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Open Source NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://troy.citi.umich.edu/u/cel/
version:2.1
end:vcard


^ permalink raw reply

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Shawn Pearce @ 2006-03-01 17:28 UTC (permalink / raw)
  To: Josef Weidendorfer
  Cc: Linus Torvalds, Andreas Ericsson, Eric Wong, Martin Langhoff, git
In-Reply-To: <200603011814.43573.Josef.Weidendorfer@gmx.de>

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> On Wednesday 01 March 2006 17:24, Linus Torvalds wrote:
> > The thing about it being .git/refs/heads/svn/xyzzy is that then you can do
> > 
> > 	git checkout svn/xyzzy
> > 
> > and start modifying it. Which is exactly against the point: the thing is 
> > _not_ a branch and you must _not_ commit to it.
> > 
> > It's much more like a tag: it's a pointer to the last point of an 
> > svn-import.
> 
> Isn't it the same with tracked branches of a remote git repo?
> With this reasoning, all heads that git-clone clones aside from the
> special "master" should not be under .git/refs/heads, but better
> under .git/refs/remotes/<remoteRepoName>/ ?
> 
> <remoteRepoName> is "origin" in the case of git-clone, so .git/remotes/origin
> would contain
>  URL: http://host/repo.git
>  Pull: master:remotes/origin/master
> 
> Then there would not be the need for the confusing special branch "origin"
> after cloning, as namespaces are separate.

This is a really good idea.  It certainly would prevent polluting the
heads namespace.  And its a lot easier to explain to someone than the
mapping in the Pull line usually is.

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH] diff-delta: bound hash list length to avoid O(m*n) behavior
From: Nicolas Pitre @ 2006-03-01 17:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzgajvpl.fsf@assigned-by-dhcp.cox.net>

On Wed, 1 Mar 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> >> I tried an experimental patch to cull collided hash buckets
> >> very aggressively.  I haven't applied your last "reuse index"
> >> patch, though -- I think that is orthogonal and I'd like to
> >> leave that to the next round.
> >
> > It is indeed orthogonal and I think you could apply it to the next 
> > branch without the other patches (it should apply with little problems).  
> > This is an obvious and undisputable gain, even more if pack-objects is 
> > reworked to reduce memory usage by keeping only one live index for 
> > multiple consecutive deltaattempts.
> 
> Umm.  The hash-index is rather huge, isn't it?  I did not
> realize it was two-pointer structure for every byte in the
> source material, and we typically delta from larger to smaller,
> so we will keep about 10x the unpacked source.  Until we swap
> the windowing around, that means about 100x the unpacked source
> with the default window size.

That's why I said that the window reversal has to be done as well to be 
effective.  As for the index itself it can be reduced to a single 
pointer since the "ptr" value can be deduced from the offset of the 
index entry.

> Also, I am not sure which one is more costly: hash-index
> building or use of that to search inside target.  I somehow got
> an impression that the former is relatively cheap, and that is
> what is being cached here.

Yes, but caching it saves 10% on CPU time, probably more when the window 
is swapped around due to less memory usage.

> > Let's suppose the reference buffer has:
> >  
> > ***********************************************************************/
> >...
> > One improvement might consist of counting the number of consecutive 
> > identical bytes when starting a compare, and manage to skip as many hash 
> > entries (minus the block size) before looping again with more entries in 
> > the same hash bucket.
> 
> Umm, again.  Consecutive identical bytes (BTW, I think "* * *"
> and "** ** **" patterns have the same collision issues without
> being consecutive bytes, so such an optimization may be trickier
> and cost more),

First, those "** ** **" are less frequent in general. Next, they will be 
spread amongst 3 hash buckets instead of all the same one.  And with 
large binary files with lots of zeroes then scanning over those areas in 
one pass instead of iterating over them from every offset would help 
enormously as well, even without limiting the hash list length.

 when emitted as literals, would compress well,
> wouldn't they?  At the end of the day, I think what matters is
> the size of deflated delta, since going to disk to read it out
> is more expensive than deflating and applying.  I think you made
> a suggestion along the same line, capping the max delta used by
> try_delta() more precisely by taking the deflated size into
> account.

Yes.  But deflating a bunch of characters will never be as dense as a 4 
byte delta sequence that might expand to hundreds.


Nicolas

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox