Re: impure renames / history tracking

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Paul Jakma <paul@clubi.ie>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Andreas Ericsson <ae@op5.se>, git list <git@vger.kernel.org>
Subject: Re: impure renames / history tracking
Date: Wed, 1 Mar 2006 18:50:21 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0603011815150.13612@sheen.jakma.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0603010859200.22647@g5.osdl.org>

Hi Linus,

On Wed, 1 Mar 2006, Linus Torvalds wrote:

> The thing is, it does better than anything that _tries_ to be 
> "reliable".
>
> I can pretty much _guarantee_ that you can't do it better.

I'm willing to take that argument to the 'project' concerned, I just 
need to be pretty sure of it.

> Tracking "inodes" - aka file identities - (which is what BK does, 
> and I assume what SVN does) is fundamentally problematic. I 
> particular, it's a horrible problem when two inodes "meet" under 
> the same name. You now have two identities for the same file, and 
> you're fundamentally screwed.

Yes, in that model it is. This interestingly, is not the BK model, I 
suspect (see below).

> It doesn't even need renames to be a problem. JUST THE FACT THAT 
> YOU TRY TO TRACK FILE "IDENTITY" HISTORY IS BROKEN.

If it's "file identity" globally across the lifetime of the project, 
I agree 100% per cent. The 'traditional' SCM concerned does this.

That's not what a solution I'd want to explore either, I'm only 
interested in the identity of files for any one /one/ commit. In 
saying that, I recognise it's pointless to try annotate file-change 
information in multi-parent commits (merges).

> For example, take CVS, which doesn't actually try to do renames, 
> but _does_ try to track the identity of a file, since all the 
> history is tied into that identity: think about what happens in 
> Attic when a file is deleted. Completely broken model.

ACK, {Attic,deleted_files}/ is just horrid.

> And that's really fundamental. CVS doesn't show the problems so 
> much, because CVS actively tries to make it hard to do these 
> things.

ACK.

> With renames-tracking-file-identities, it's _really_ easy to get 
> some major confusion going. What happens when one branch creates a 
> file, and another one renames a file to that same name, and they 
> merge?

Well, the conflict has to be resolved somehow, even today.

> Don't tell me it doesn't happen. It happened under BK. The way BK 
> "solved" it was to keep the two separate identities: one of them 
> got resolved to the new filename, the other one went into the 
> "deleted" directory.

Right. That's what the 'traditional workflow' SCM I'm thinking of 
does - not BK funnily enough, but an SCM predating BK which also 
happens to use SCCS files, and with some of the same high-level 
push/pull constructs as BK (interestingly).

It also tracks name history globally using a deleted_files/ history, 
which is maintained, but I don't think it does this for name merges 
like the above.

In the one I'm thinking of, it does (I /think/, I'm not an expert in 
it) the following:

Given two files, say:

'old:

1.1---1.2---1.3

new:

1.1

- constructs a 'fake' base SCCS revision, empty
- adds the top 'old' version as a branch
- adds the top new version as a new delta

    1.1.1.1
   /
1.1---------1.2

Where in the merged file:

 	1.1: empty
 	1.1.1.1: was 1.3 from 'old'
 	1.2: is 1.1 from 'new'

However, it does /not/ create a deleted_files entry for the 'old' 
file. (AFAICT - I may not have a sufficiently full understanding of 
this SCM)

> Guess what happens when the side that got merged into "deleted" 
> continues to edit the file? That's right - their edits happen on 
> the deleted file, and never show up in the real tree in a 
> subsequent merge ever again.

Indeed - horrid.

> And as far as I can tell, BK really did the best you can do. 
> Following file identities really _is_ fundamentally broken. It 
> sounds like a nice idea, but while you migth solve a few problems, 
> you create a whole raft of much more fundamental problems.

For tracking identity across more than one commit - I fully agree.

That's not what quite I'm thinking of though. Is it worth going on 
with the discussion on a:

 	 'track identities *only* from context of /the/ parent to
           this commit'

> So next time you think about a merge that migt have been improved 
> by tracking renames, please also think about a merge where one of 
> the filenames came from two or more different sources through an 
> earlier merge, and thank your benevolent Gods that they instructed 
> me to make git be based purely on file contents.

Oh, I agree muchely here.

I wouldn't change git. I only wonder if it give its rename-heuristics 
an additional advisory-only hint? (for single-parent commits at least 
- never merges - and only on a per-commit basis).

I probably should first explore how git deals with rename clashes..

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
I'm glad I was not born before tea.
 		-- Sidney Smith (1771-1845)

next prev parent reply	other threads:[~2006-03-01 18:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-01 14:01 impure renames / history tracking Paul Jakma
2006-03-01 15:38 ` Andreas Ericsson
2006-03-01 16:27   ` Paul Jakma
2006-03-01 17:13     ` Linus Torvalds
2006-03-01 18:50       ` Paul Jakma [this message]
2006-03-01 17:43     ` Andreas Ericsson
2006-03-02 21:10       ` Paul Jakma
2006-03-02 22:06         ` Andreas Ericsson
2006-03-01 18:05     ` Martin Langhoff
2006-03-01 19:13       ` Paul Jakma
2006-03-01 19:56         ` Junio C Hamano
2006-03-01 21:25           ` Paul Jakma
2006-03-01 22:12             ` Andreas Ericsson
2006-03-01 22:28               ` Paul Jakma
2006-03-01 22:46               ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2006-03-02 22:24 linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0603011815150.13612@sheen.jakma.org \
    --to=paul@clubi.ie \
    --cc=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).