* Merging split files
From: Stephen Bash @ 2011-03-18 13:22 UTC
To: git

Hi all-

In our previous release foo.cxx contained both the base class and a
few subclasses. Since then the number of subclasses has grown, and
we've split foo.cxx (base and sub-classes) into foo-base.cxx (base
class) and foo-defs.cxx (sub-classes). Since the release, we've had a
few bug fixes in foo.cxx on the maintenance branch, and need to merge
those back to development. When I did the merge Git identified
foo.cxx as moved to foo-defs.cxx, which worked for most changes, but a
few needed to be in foo-base.cxx. In this case it was a pretty
trivial manual resolution, but is there a method for handling merges
of split files?

Thanks,
Stephen

* Re: Merging split files
From: Jeff King @ 2011-03-29 15:16 UTC
To: Stephen Bash; +Cc: git

On Fri, Mar 18, 2011 at 09:22:36AM -0400, Stephen Bash wrote:

> In our previous release foo.cxx contained both the base class and a
> few subclasses. Since then the number of subclasses has grown, and
> we've split foo.cxx (base and sub-classes) into foo-base.cxx (base
> class) and foo-defs.cxx (sub-classes). Since the release, we've had a
> few bug fixes in foo.cxx on the maintenance branch, and need to merge
> those back to development. When I did the merge Git identified
> foo.cxx as moved to foo-defs.cxx, which worked for most changes, but a
> few needed to be in foo-base.cxx. In this case it was a pretty
> trivial manual resolution, but is there a method for handling merges
> of split files?

I don't think there is currently a good way to do this automatically.

The problem is that the closest merge-recursive gets to understanding
content movement is that it considers whole file renames. So it sees
"foo.cxx became foo-defs.cxx", and applies changes to foo.cxx to
foo-defs.cxx, but it has no clue that foo-base.cxx is involved. So at
the very least, it would need to represent "foo.cxx has split into
foo-base.cxx and foo-defs.cxx", which is not something it can
currently handle. But more than that, you want to know _which_ parts
moved to each file.

So I think the most flexible thing is to forget file renames at all.
They are just a rough version of the general idea of content movement.
In theory, we should be able to see that the content we changed in
foo.cxx no longer exists, and then start looking for similar content
elsewhere. Not similar _files_, but for the chunk of content that is
changed between the merge base and the maintenance branch (and some
surrounding context), find where that bit of content went. And then
try to merge our changes into that new bit of content.

One problem is that when it fails, it fails pretty hard. With file
renames, your changes at least usually end up in the right file (your
present problem excluded), and you get some textual mess to clean up.
But with content-level renaming, I suspect in conflict cases we would
end up with no clue where the result goes (because the conflict means
we can't easily match up the content for similarity), and have to
stick it in the deleted file. On the other hand, it might simply work
to keep expanding the amount of context we consider for content
similarity until we find a match, which eventually would end up
considering the whole file, and generalize to a file rename.

Implementing that inside of merge-recursive is likely to be pretty
nasty (even the current file-rename code is already pretty nasty). But
it may be possible to prototype something that runs after we hit the
conflicted state, like mergetool.

I definitely think it's an interesting area to work in, but I would
have to give it a lot of thought.

-Peff
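
The content-level search described in this message can be sketched outside of Git. What follows is only an illustration in Python, not a proposal for merge-recursive: the names `best_window` and `find_destination`, the 0.8 threshold, and the use of `difflib` as the similarity measure are all assumptions. It takes the changed lines from the old file, looks for the most similar window in each candidate file, and widens the surrounding context until something scores above the threshold. At maximum context the chunk is the whole old file, which is roughly the whole-file rename case.

```python
import difflib


def best_window(chunk, lines):
    """Slide a chunk-sized window over lines; return (score, start) of the best fuzzy match."""
    target = "\n".join(chunk)
    best = (0.0, -1)
    for start in range(max(1, len(lines) - len(chunk) + 1)):
        window = "\n".join(lines[start:start + len(chunk)])
        score = difflib.SequenceMatcher(None, target, window).ratio()
        if score > best[0]:
            best = (score, start)
    return best


def find_destination(old_lines, hunk_start, hunk_end, candidates, threshold=0.8):
    """Guess where old_lines[hunk_start:hunk_end] went among candidate files.

    candidates maps a (hypothetical) path to its list of lines. Context grows
    by one line on each side per round. Returns (path, start_of_matched_window),
    or None if nothing ever reaches the threshold.
    """
    for context in range(len(old_lines) + 1):
        lo = max(0, hunk_start - context)
        hi = min(len(old_lines), hunk_end + context)
        chunk = old_lines[lo:hi]
        scored = [(best_window(chunk, lines), path) for path, lines in candidates.items()]
        (score, start), path = max(scored)  # ties are broken arbitrarily
        if score >= threshold:
            return path, start
    return None


# Illustrative data only: the old foo.cxx and the two files it was split into.
old_foo = ["class Base {", "  int id();", "};",
           "class Sub : Base {", "  int size();", "};"]
split = {
    "foo-base.cxx": ["class Base {", "  int id();", "};"],
    "foo-defs.cxx": ['#include "foo-base.h"', "class Sub : Base {", "  int size();", "};"],
}
base_hit = find_destination(old_foo, 1, 2, split)  # hunk touching Base::id
defs_hit = find_destination(old_foo, 4, 5, split)  # hunk touching Sub::size
```

In this toy case each hunk lands in a different file, which is exactly what whole-file rename detection cannot express. The sketch says nothing about the hard part raised above, namely what to do when the changed content conflicts and no window scores well.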

* Re: Merging split files
From: Stephen Bash @ 2011-03-29 16:33 UTC
To: Jeff King; +Cc: git

Jeff-

Thanks for taking the time to think about this. More inline...

----- Original Message -----
> From: "Jeff King" <peff@peff.net>
> To: "Stephen Bash" <bash@genarts.com>
> Cc: git@vger.kernel.org
> Sent: Tuesday, March 29, 2011 11:16:23 AM
> Subject: Re: Merging split files
>
> On Fri, Mar 18, 2011 at 09:22:36AM -0400, Stephen Bash wrote:
>
> > In our previous release foo.cxx contained both the base class and a
> > few subclasses. Since then the number of subclasses has grown, and
> > we've split foo.cxx (base and sub-classes) into foo-base.cxx (base
> > class) and foo-defs.cxx (sub-classes). Since the release, we've had
> > a few bug fixes in foo.cxx on the maintenance branch, and need to
> > merge those back to development. When I did the merge Git identified
> > foo.cxx as moved to foo-defs.cxx, which worked for most changes, but
> > a few needed to be in foo-base.cxx. In this case it was a pretty
> > trivial manual resolution, but is there a method for handling merges
> > of split files?
>
> I don't think there is currently a good way to do this automatically.
>
> The problem is that the closest merge-recursive gets to understanding
> content movement is that it considers whole file renames. ...
>
> So I think the most flexible thing is to forget file renames at all.

I agree that would be the best solution long term. ("Git doesn't track
files, Git tracks content". Think I heard that somewhere before...)

That being said, the back seat drivers in the office here (i.e. me and
everyone else that knows almost nothing about the internals of merge
recursive!) thought maybe a middle ground is teach merge recursive to
do copy detection along with rename detection. Then the algorithm
would have a (relatively small?) list of candidate files to check for
hunks. You still have to deal with the similarity score in some
corner cases, but hopefully since all we want is candidate files the
process is relatively insensitive to the similarity threshold.

Am I way off the deep end now? I'm not lying when I say I know
*nothing* about the merge implementations.

> I definitely think it's an interesting area to work in, but I would
> have to give it a lot of thought.

It's a "corner case" that I seem to have run into a lot in my work
experience, so if the Git community can actually make a good solution
work it will be a major win in my book.

Thanks again!
Stephen
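
The "list of candidate files" idea can be illustrated with a small Python sketch. This is a stand-in, not Git's copy detection: the function name `candidate_files`, the 0.3 threshold, and the score itself (the fraction of a destination file's lines that also appear, in order, in the source) are assumptions chosen for the example. The point is only that a deliberately loose threshold yields a short list of files worth checking hunk by hunk.

```python
import difflib


def candidate_files(source_lines, tree, threshold=0.3):
    """Return paths in tree that appear to contain content from source_lines, best first.

    tree maps a (hypothetical) path to its list of lines.
    """
    scored = []
    for path, lines in tree.items():
        matcher = difflib.SequenceMatcher(None, source_lines, lines, autojunk=False)
        copied = sum(block.size for block in matcher.get_matching_blocks())
        # Share of the destination's lines that line up with the source.
        share = copied / len(lines) if lines else 0.0
        if share >= threshold:
            scored.append((share, path))
    return [path for share, path in sorted(scored, reverse=True)]


# Illustrative data only: the old foo.cxx and a development tree after the split.
old_foo = ["class Base {", "  int id();", "};",
           "class Sub : Base {", "  int size();", "};"]
dev_tree = {
    "foo-base.cxx": ["class Base {", "  int id();", "};"],
    "foo-defs.cxx": ['#include "foo-base.h"', "class Sub : Base {", "  int size();", "};"],
    "README": ["Build instructions live here."],
}
candidates = candidate_files(old_foo, dev_tree)
```

Because the list is only a filter, a file that slips in by accident costs a little extra searching, while a file that is missed loses its hunks entirely. That asymmetry is the argument for keeping the threshold low.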

* Re: Merging split files
From: Jeff King @ 2011-03-29 18:15 UTC
To: Stephen Bash; +Cc: git

On Tue, Mar 29, 2011 at 12:33:17PM -0400, Stephen Bash wrote:

> > The problem is that the closest merge-recursive gets to understanding
> > content movement is that it considers whole file renames. ...
> >
> > So I think the most flexible thing is to forget file renames at all.
>
> I agree that would be the best solution long term. ("Git doesn't track
> files, Git tracks content". Think I heard that somewhere before...)

Exactly. :) I think that is a tricky project, though, and in the
meantime, I wouldn't be opposed to a more file-based solution if it
generates good results.

> That being said, the back seat drivers in the office here (i.e. me and
> everyone else that knows almost nothing about the internals of merge
> recursive!) thought maybe a middle ground is teach merge recursive to
> do copy detection along with rename detection. Then the algorithm
> would have a (relatively small?) list of candidate files to check for
> hunks. You still have to deal with the similarity score in some
> corner cases, but hopefully since all we want is candidate files the
> process is relatively insensitive to the similarity threshold.

This was something I gave some thought to recently in this other
thread:

  http://thread.gmane.org/gmane.comp.version-control.git/169944

though I came to the conclusion that break-rewriting was a much better
match for that particular case. Namely, we see that content has been
renamed, so we make sure to merge changes to the "original" content
with each other, no matter whether the changes happened in the renamed
path or the original. And similarly, we merge changes left over from
any "new" content that has replaced the original (which, in the pure
rename case, is just empty, but with break-rewriting we might have
some dissimilar content at the old path). We know that the "new"
content can't be related to the "old" content, because to find a
rename, we would have to have triggered the "break" by finding that
the content is dissimilar.

Copy detection has to deal with that, but harder. :) I see two major
challenges:

  1. One source file may go to multiple destinations. So instead of
     saying "oops, I should be doing the merge with this other,
     renamed content", you have to pick a best one (either through a
     heuristic, or even per-hunk by trying each hunk in turn). And
     this means you're interacting deeply with the content-level
     3-way merger. I haven't looked at that code at all, so I don't
     know how feasible that is.

     And you have to accept that you may pick wrong, or even that
     there may be no right answer. If I do "cp foo bar; cp foo baz;
     rm foo", and then modify "foo" on another branch, the choice of
     merging changes to "bar" versus "baz" is going to be arbitrary.

  2. Because it's a copy and not a rename, your source file may still
     exist and be a candidate for applying content to. And that
     violates the break-rewrite rename logic I mentioned above, which
     is that old content goes with old content and new content goes
     with new content. For a given hunk, we're not sure which kind of
     content the source file holds.

     I think that may not be a big deal, though. We already have to
     deal with the hard part in (1), which is finding _which_ copy is
     the right place for a given bit of content. So this may just
     simplify to adding the source file (if it still exists) as
     another possible place to merge changes to, and it is another
     case of (1) (though obviously we should prefer merging to the
     original pathname if it is still there, rather than a copy).

> Am I way off the deep end now? I'm not lying when I say I know
> *nothing* about the merge implementations.

No, I don't think you're off the deep end. But then, I don't know that
much about the merge code, either. :)

-Peff
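
The two challenges above can be made concrete with a toy per-hunk router. This is a Python sketch under heavy assumptions, not how Git's 3-way merger works: a "hunk" here is just an old string and its replacement, matching is exact substring search, and `route_hunk` and `original_path` are names invented for the example. It tries each candidate file, reports the "cp foo bar; cp foo baz" situation as ambiguous instead of picking arbitrarily, and prefers the original pathname when it still exists.

```python
def route_hunk(old_text, new_text, candidates, original_path=None):
    """Try to apply one change (old_text -> new_text) to exactly one candidate file.

    candidates maps a (hypothetical) path to its full content as a string.
    Returns (status, paths, updated_content), where status is one of
    "applied", "ambiguous", or "unmatched".
    """
    holders = sorted(path for path, content in candidates.items() if old_text in content)
    # Challenge 2: if the source file still exists and still holds the
    # content, prefer it over any copy.
    if original_path in holders:
        holders = [original_path]
    if len(holders) != 1:
        # Challenge 1: either no destination, or several equally good ones.
        return ("unmatched" if not holders else "ambiguous"), holders, None
    path = holders[0]
    return "applied", holders, candidates[path].replace(old_text, new_text, 1)


# A split file: each change has exactly one home.
split = {
    "foo-base.cxx": "int id() { return 0; }\n",
    "foo-defs.cxx": "int size() { return 1; }\n",
}
split_result = route_hunk("return 0;", "return 42;", split)

# Two identical copies and no original: there is no right answer.
copies = {"bar": "int f();\n", "baz": "int f();\n"}
copy_result = route_hunk("int f();", "long f();", copies)

# Same copies, but the original path survived: it wins.
with_source = dict(copies, foo="int f();\n")
source_result = route_hunk("int f();", "long f();", with_source, original_path="foo")
```

Returning "ambiguous" rather than guessing is one possible policy. A real implementation would have to decide whether to raise a conflict, apply the change to every copy, or fall back to a heuristic.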