* BUG: git subtree split gets confused on removed and readded directory @ 2016-01-15 16:23 Marcus Brinkmann 2016-01-15 23:44 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Marcus Brinkmann @ 2016-01-15 16:23 UTC (permalink / raw) To: git Hi, I made a simple test repository showing the problem here: https://github.com/lambdafu/git-subtree-split-test After creating the master branch, I created the split/bar branch like this: $ git subtree split -P bar -b split/bar The resulting history is confused by the directory "bar" which was added, removed and then re-added again. The recent history up to adding the directory the second time is fine. But then it seems to loose track and add the parent of that commit up to the initial commit in the history. I'd expect that the parent of the readding commit is an empty tree commit (which removed the last files in the directory), and that before that are commits that reflect the initial creation of that directory with its files, but rewritten as a subtree, of course. Thanks! Marcus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: git subtree split gets confused on removed and readded directory 2016-01-15 16:23 BUG: git subtree split gets confused on removed and readded directory Marcus Brinkmann @ 2016-01-15 23:44 ` Junio C Hamano 2016-01-17 19:34 ` David Ware 2016-01-17 23:23 ` David A. Greene 0 siblings, 2 replies; 12+ messages in thread From: Junio C Hamano @ 2016-01-15 23:44 UTC (permalink / raw) To: Dave Ware, David A. Greene; +Cc: git, Marcus Brinkmann Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes: > I made a simple test repository showing the problem here: > https://github.com/lambdafu/git-subtree-split-test > > After creating the master branch, I created the split/bar branch like this: > > $ git subtree split -P bar -b split/bar > > The resulting history is confused by the directory "bar" which was > added, removed and then re-added again. The recent history up to adding > the directory the second time is fine. But then it seems to loose track > and add the parent of that commit up to the initial commit in the history. > > I'd expect that the parent of the readding commit is an empty tree > commit (which removed the last files in the directory), and that before > that are commits that reflect the initial creation of that directory > with its files, but rewritten as a subtree, of course. Thanks for a report. David, does this ring a bell? Dave, does your fix "subtree split" we saw recently on the list http://article.gmane.org/gmane.comp.version-control.git/284125 help this? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: git subtree split gets confused on removed and readded directory 2016-01-15 23:44 ` Junio C Hamano @ 2016-01-17 19:34 ` David Ware 2016-01-17 23:23 ` David A. Greene 1 sibling, 0 replies; 12+ messages in thread From: David Ware @ 2016-01-17 19:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: David A. Greene, git, Marcus Brinkmann No my patch doesn't seem to fix this. Cheers, Dave Ware (sorry if you're receiving this for the second time, I'm resending since the mailing list blocked my earlier reply for html content) On Sat, Jan 16, 2016 at 12:44 PM, Junio C Hamano <gitster@pobox.com> wrote: > Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes: > >> I made a simple test repository showing the problem here: >> https://github.com/lambdafu/git-subtree-split-test >> >> After creating the master branch, I created the split/bar branch like this: >> >> $ git subtree split -P bar -b split/bar >> >> The resulting history is confused by the directory "bar" which was >> added, removed and then re-added again. The recent history up to adding >> the directory the second time is fine. But then it seems to loose track >> and add the parent of that commit up to the initial commit in the history. >> >> I'd expect that the parent of the readding commit is an empty tree >> commit (which removed the last files in the directory), and that before >> that are commits that reflect the initial creation of that directory >> with its files, but rewritten as a subtree, of course. > > Thanks for a report. > > David, does this ring a bell? > > Dave, does your fix "subtree split" we saw recently on the list > > http://article.gmane.org/gmane.comp.version-control.git/284125 > > help this? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BUG: git subtree split gets confused on removed and readded directory 2016-01-15 23:44 ` Junio C Hamano 2016-01-17 19:34 ` David Ware @ 2016-01-17 23:23 ` David A. Greene 2016-01-20 1:17 ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann 1 sibling, 1 reply; 12+ messages in thread From: David A. Greene @ 2016-01-17 23:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Dave Ware, git, Marcus Brinkmann Junio C Hamano <gitster@pobox.com> writes: > Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes: > >> I made a simple test repository showing the problem here: >> https://github.com/lambdafu/git-subtree-split-test> >> After creating the master branch, I created the split/bar branch like this: >> >> $ git subtree split -P bar -b split/bar >> >> The resulting history is confused by the directory "bar" which was >> added, removed and then re-added again. The recent history up to adding >> the directory the second time is fine. But then it seems to loose track >> and add the parent of that commit up to the initial commit in the history. >> >> I'd expect that the parent of the readding commit is an empty tree >> commit (which removed the last files in the directory), and that before >> that are commits that reflect the initial creation of that directory >> with its files, but rewritten as a subtree, of course. > > Thanks for a report. Yes, thank you! > David, does this ring a bell? No, I have not run into this before. I'm actually going to be working in the split code starting sometime this month (work allowing, of course). So it's great to get a report like this. One of the things I want to do is eventually move over subtree split to using a proper filter-branch instead of the entirely custom code that's currently there. This does, however, appear to cause a semntic difference in preliminary testing which I am still tracking down. The filter-based split is *incredibly* faster than the current code. The current code can take hours on moderately-sized histories. This should shake out a lot of these kinds of problems since the filter-branch code is heavily used and tested while the subtree split code is not. Assuming this goes ahead, I plan to introduce a new switch to control filter-branch vs. original code and migrate the default to filter-branch if all goes well. I'll write up a failing test for this so that I remember to address it when I get to the code. Thanks again, Marcus! -David ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) 2016-01-17 23:23 ` David A. Greene @ 2016-01-20 1:17 ` Marcus Brinkmann 2016-01-20 4:05 ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene 0 siblings, 1 reply; 12+ messages in thread From: Marcus Brinkmann @ 2016-01-20 1:17 UTC (permalink / raw) To: David A. Greene, Junio C Hamano; +Cc: Dave Ware, git 'git subtree split' will fail if the history of the subtree has empty tree commits (or trees that are considered empty, such as submodules). This fix keeps track of this condition and correctly follows the history over such commits. Signed-off-by: Marcus Brinkmann <m.brinkmann@semantics.de> --- contrib/subtree/git-subtree.sh | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index edf36f8..b68828b 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -36,6 +36,8 @@ PATH=$PATH:$(git --exec-path) require_work_tree +EMPTY_TREE=`git hash-object -t tree /dev/null` + quiet= branch= debug= @@ -449,7 +451,8 @@ copy_or_skip() rev="$1" tree="$2" newparents="$3" - assert [ -n "$tree" ] + + [ -z "$tree" ] && tree=$EMPTY_TREE identical= nonidentical= @@ -603,6 +606,7 @@ cmd_split() revmax=$(eval "$grl" | wc -l) revcount=0 createcount=0 + found_first_commit= eval "$grl" | while read rev parents; do revcount=$(($revcount + 1)) @@ -625,12 +629,16 @@ cmd_split() # ugly. is there no better way to tell if this is a subtree # vs. a mainline commit? Does it matter? - if [ -z $tree ]; then - set_notree $rev - if [ -n "$newparents" ]; then - cache_set $rev $rev + if [ -z $found_first_commit ]; then + if [ -z $tree ]; then + set_notree $rev + if [ -n "$newparents" ]; then + cache_set $rev $rev + fi + continue + else + found_first_commit=yes fi - continue fi newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? -- 2.5.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-20 1:17 ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann @ 2016-01-20 4:05 ` David A. Greene 2016-01-20 11:22 ` Marcus Brinkmann 2016-01-24 13:07 ` Marcus Brinkmann 0 siblings, 2 replies; 12+ messages in thread From: David A. Greene @ 2016-01-20 4:05 UTC (permalink / raw) To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git Marcus Brinkmann <m.brinkmann@semantics.de> writes: > 'git subtree split' will fail if the history of the subtree has empty > tree commits (or trees that are considered empty, such as submodules). > This fix keeps track of this condition and correctly follows the history > over such commits. Thanks for working on this! Please add a test to t7900-subtree.sh. > @@ -625,12 +629,16 @@ cmd_split() > > # ugly. is there no better way to tell if this is a subtree > # vs. a mainline commit? Does it matter? > - if [ -z $tree ]; then > - set_notree $rev > - if [ -n "$newparents" ]; then > - cache_set $rev $rev > + if [ -z $found_first_commit ]; then > + if [ -z $tree ]; then > + set_notree $rev > + if [ -n "$newparents" ]; then > + cache_set $rev $rev > + fi > + continue > + else > + found_first_commit=yes > fi > - continue > fi > > newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? Can you explain the logic here? The old code appears to be using the lack of a tree to filter out "mainline" commits from the subtree history when splitting. If that test is only done before seeing a proper subtree commit and never after, then any commit mainline commit following the first subtree commit in the rev list will miss being marked with set_notree and the cache will not have the identity entry added. Test #36 in t7900-subtree.sh has a mainline commit listed after the first subtree commit in the rev list, I believe. I'm not positive your change is wrong, I'd just like to understand it better. I'd also like a comment explaining why it works so future developers don't get confused. Overall, I am trying to better comment the code as I make my own changes. -David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-20 4:05 ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene @ 2016-01-20 11:22 ` Marcus Brinkmann 2016-01-28 2:55 ` David A. Greene 2016-01-24 13:07 ` Marcus Brinkmann 1 sibling, 1 reply; 12+ messages in thread From: Marcus Brinkmann @ 2016-01-20 11:22 UTC (permalink / raw) To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git On 01/20/2016 05:05 AM, David A. Greene wrote: > Marcus Brinkmann <m.brinkmann@semantics.de> writes: > >> 'git subtree split' will fail if the history of the subtree has empty >> tree commits (or trees that are considered empty, such as submodules). >> This fix keeps track of this condition and correctly follows the history >> over such commits. > > Thanks for working on this! Please add a test to t7900-subtree.sh. I couldn't get the tests to run and I couldn't find documentation on how to run it. If you enlighten me I can add a test :) >> @@ -625,12 +629,16 @@ cmd_split() >> >> # ugly. is there no better way to tell if this is a subtree >> # vs. a mainline commit? Does it matter? >> - if [ -z $tree ]; then >> - set_notree $rev >> - if [ -n "$newparents" ]; then >> - cache_set $rev $rev >> + if [ -z $found_first_commit ]; then >> + if [ -z $tree ]; then >> + set_notree $rev >> + if [ -n "$newparents" ]; then >> + cache_set $rev $rev >> + fi >> + continue >> + else >> + found_first_commit=yes >> fi >> - continue >> fi >> >> newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? > > Can you explain the logic here? The old code appears to be using the > lack of a tree to filter out "mainline" commits from the subtree history > when splitting. If that test is only done before seeing a proper > subtree commit and never after, then any commit mainline commit > following the first subtree commit in the rev list will miss being > marked with set_notree and the cache will not have the identity entry > added. > > Test #36 in t7900-subtree.sh has a mainline commit listed after the > first subtree commit in the rev list, I believe. > > I'm not positive your change is wrong, I'd just like to understand it > better. I'd also like a comment explaining why it works so future > developers don't get confused. Overall, I am trying to better comment > the code as I make my own changes. It's possible the patch does not work for some cases. For example, I don't know how the rejoin variant of splits work. Some observations: 1) The notree list is never actually used except to identify which commits have been visited in check_parents. 2) I have no idea what use case is covered by the "if [ -n "$newparents" ]; then cache_set $rev $rev; fi". I left it in purely for traditional reasons. So, clarifying that would go a long way in understanding the code, and if there is a test for that, I will figure it out. 3a) The bug happens because on the first commit that deletes the subdir, newparents will not be empty, and the "cache_set $rev $rev" will kick in and subsequently (when the subdir is added again) the history will divert into the $rev commit which is not rewritten, but part of the unsplit tree. This seems very wrong to me! See 2). 3b) To be very clear: It seems logically inconsistent to me to ever call set_notree and cache_set on the same rev. It also seems logically inconsistent to me to call cache_set rev1 rev2 where rev2 is not rewritten. Both seem to be invariant errors that could be caught by assertions. They probably should. In fact, I think my patch makes the questionable if-case to be dead code, because newparents is never non-empty before found_first_commit is true. As such, I think it could be eliminated. But I am not 100% sure, as I don't know the intention of the original code. 4) My patch only preserves the special handling of empty trees up to the first commit that introduces subdir, because we don't want an empty commit at the beginning. After that, empty subdirs are not special at all - the empty tree is replaced by EMPTY_TREE and handled as if it's a normal subdir commit. copy_and_skip will do the right thing. 5) I didn't test any case with multiple parents (merge commits). There are several of those cases (merge commits into empty subdirs, branches with different non-empty subdirs from empty ones), and they don't apply to my use case (git-svn conversion). I read the copy_and_skip code and see that it optimizes some of those cases, and although I didn't see an obvious problem, I didn't think too deeply about it. Thanks, Marcus -- s<e>mantics GmbH Viktoriaallee 45 52066 Aachen Web: www.semantics.de Registergericht : Amtsgericht Aachen, HRB 8189 Geschäftsführer : Kay Heiligenhaus M.A. Dipl. Ing. José de la Rosa ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-20 11:22 ` Marcus Brinkmann @ 2016-01-28 2:55 ` David A. Greene 0 siblings, 0 replies; 12+ messages in thread From: David A. Greene @ 2016-01-28 2:55 UTC (permalink / raw) To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git [ Sorry it took a few days to reply. I am absolutely slammed at work and will be for the next few weeks at least. The good news is that it's resulting in some nice work on git-subtree! :) ] Marcus Brinkmann <m.brinkmann@semantics.de> writes: > On 01/20/2016 05:05 AM, David A. Greene wrote: >> Marcus Brinkmann <m.brinkmann@semantics.de> writes: >> >>> 'git subtree split' will fail if the history of the subtree has empty >>> tree commits (or trees that are considered empty, such as submodules). >>> This fix keeps track of this condition and correctly follows the history >>> over such commits. >> >> Thanks for working on this! Please add a test to t7900-subtree.sh. > > I couldn't get the tests to run and I couldn't find documentation on how > to run it. If you enlighten me I can add a test :) Just run "make test" in contrib/subtree. You have to build git first. >> Can you explain the logic here? The old code appears to be using the >> lack of a tree to filter out "mainline" commits from the subtree history >> when splitting. If that test is only done before seeing a proper >> subtree commit and never after, then any commit mainline commit >> following the first subtree commit in the rev list will miss being >> marked with set_notree and the cache will not have the identity entry >> added. >> >> Test #36 in t7900-subtree.sh has a mainline commit listed after the >> first subtree commit in the rev list, I believe. >> >> I'm not positive your change is wrong, I'd just like to understand it >> better. I'd also like a comment explaining why it works so future >> developers don't get confused. Overall, I am trying to better comment >> the code as I make my own changes. > > It's possible the patch does not work for some cases. For example, I > don't know how the rejoin variant of splits work. I'm not so much worried about catching all cases of the bug you identified, though it would be good if the patch did. I'm much more concerned about not causing a regression in existing functionality. > Some observations: > > 1) The notree list is never actually used except to identify which > commits have been visited in check_parents. It's really verifying that we visited parents before children in the split code, I think. That seems like a good check to keep. Let me make sure I understand your fix too. Are you essentially skipping empty commits when splitting? You original patch said that split failed but didn't say how. Did git-subtree spit out an error message, or did the failure manifest in some other way? If I knew the failure mode it might help me understand your changes better. I see you explain the proble below (thanks!) but I'd still like to know how you discovered it. It will help in constructing a test. > 2) I have no idea what use case is covered by the "if [ -n "$newparents" > ]; then cache_set $rev $rev; fi". I left it in purely for traditional > reasons. So, clarifying that would go a long way in understanding the > code, and if there is a test for that, I will figure it out. As far as I understand things, $newparents being non-empty means that the commits parents were split out and $newparents contains the hashes of the split commits, so that when this commit is split it can set up the proper parent links. If the commit doesn't have a tree in the subdirectory, then I *think* the split codesimply sets the identity entry in the cache so that any future commit that has this (empty) one as a parent will see it in the $newparents list. Since copy_or_skip checks for parents with empty trees and does not link split commits to them, I don't understand the purpose of including these commits in $newparents. So you may very well be right that it just doesn't matter if we skip these altogether. > 3a) The bug happens because on the first commit that deletes the subdir, > newparents will not be empty, and the "cache_set $rev $rev" will kick in > and subsequently (when the subdir is added again) the history will > divert into the $rev commit which is not rewritten, but part of the > unsplit tree. This seems very wrong to me! See 2). Ah, so the problem isn't empty commits per se, it's the fact that a subdirectory was deleted and re-added. That makes sense. I didn't understand that from your original commit message though I now remember you discussed it in the first e-mail you sent. It would be good to clarify this in the final commit. I agree that the behavior you describe is wrong. So the fix basically relies on the fact that there is some tree in the subdirectory, which later gets deleted, but since "found_first_commit" triggered, we'll just skip those empty commits and never see them in the cache so that when a tree appears again we won't link to mainline commits. Have I got it right? Currently the split code is all-or-nothing. You split the whole history or none of it. Eventually I want to make it flexible enough to allow splitting ranges of commits or individual commits. I'm wondering how your code will work if the split range starts or ends within the set of empty subdirectory commits. It's not something you really have to worry about since I'd deal with it when I write the code, but it's something that popped into my head. > 3b) To be very clear: It seems logically inconsistent to me to ever call > set_notree and cache_set on the same rev. It also seems logically > inconsistent to me to call cache_set rev1 rev2 where rev2 is not > rewritten. Both seem to be invariant errors that could be caught by > assertions. They probably should. In fact, I think my patch makes the > questionable if-case to be dead code, because newparents is never > non-empty before found_first_commit is true. As such, I think it could > be eliminated. But I am not 100% sure, as I don't know the intention of > the original code. After your walk-through and some exploring of the code, I think you are right. Like you I am not 100% sure but it would definitely be nice to clean this bit of code up. > 4) My patch only preserves the special handling of empty trees up to the > first commit that introduces subdir, because we don't want an empty > commit at the beginning. After that, empty subdirs are not special at > all - the empty tree is replaced by EMPTY_TREE and handled as if it's a > normal subdir commit. copy_and_skip will do the right thing. I agree. > 5) I didn't test any case with multiple parents (merge commits). There > are several of those cases (merge commits into empty subdirs, branches > with different non-empty subdirs from empty ones), and they don't apply > to my use case (git-svn conversion). I read the copy_and_skip code and > see that it optimizes some of those cases, and although I didn't see an > obvious problem, I didn't think too deeply about it. Yeah, that's definitely a concern. I've run some tests on the current git-subtree with merge commits and IIRC the results made sense. I should add those tests to the testbase. I'm definitely inclined to accept your patch after this discussion but I do want to see a few things: 1. A testcase 2. A more expanded commit message basically describing what you said in 3-4 above, plus a bit more from your original e-mail describing the subdirectory delete and re-creation. It will help people who come along later understand the history of the code. 3. Remove the questionable cache_set. I agree that it seems wrong and copy_and_skip ignores it anyway. Let's get rid of this cruft. Include something in the commit message about why this was done. Thanks for the thorough explanation and for your work on this. If you can re-roll with the above and the existing tests pass, I think we can pass this on to Junio. Looking forward to the re-roll! -David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-20 4:05 ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene 2016-01-20 11:22 ` Marcus Brinkmann @ 2016-01-24 13:07 ` Marcus Brinkmann 2016-01-28 2:56 ` David A. Greene 1 sibling, 1 reply; 12+ messages in thread From: Marcus Brinkmann @ 2016-01-24 13:07 UTC (permalink / raw) To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git With my patch, "git subtree split -P" produces the same result (for my data set) as "git filter-branch --subdirectory-filter", which is much faster, because it selects the revisions to rewrite before rewriting. As I am not using any of the advanced features of "git subtree", I will just use "git filter-branch" instead. Thanks! Marcus On 01/20/2016 05:05 AM, David A. Greene wrote: > Marcus Brinkmann <m.brinkmann@semantics.de> writes: > >> 'git subtree split' will fail if the history of the subtree has empty >> tree commits (or trees that are considered empty, such as submodules). >> This fix keeps track of this condition and correctly follows the history >> over such commits. > > Thanks for working on this! Please add a test to t7900-subtree.sh. > >> @@ -625,12 +629,16 @@ cmd_split() >> >> # ugly. is there no better way to tell if this is a subtree >> # vs. a mainline commit? Does it matter? >> - if [ -z $tree ]; then >> - set_notree $rev >> - if [ -n "$newparents" ]; then >> - cache_set $rev $rev >> + if [ -z $found_first_commit ]; then >> + if [ -z $tree ]; then >> + set_notree $rev >> + if [ -n "$newparents" ]; then >> + cache_set $rev $rev >> + fi >> + continue >> + else >> + found_first_commit=yes >> fi >> - continue >> fi >> >> newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? > > Can you explain the logic here? The old code appears to be using the > lack of a tree to filter out "mainline" commits from the subtree history > when splitting. If that test is only done before seeing a proper > subtree commit and never after, then any commit mainline commit > following the first subtree commit in the rev list will miss being > marked with set_notree and the cache will not have the identity entry > added. > > Test #36 in t7900-subtree.sh has a mainline commit listed after the > first subtree commit in the rev list, I believe. > > I'm not positive your change is wrong, I'd just like to understand it > better. I'd also like a comment explaining why it works so future > developers don't get confused. Overall, I am trying to better comment > the code as I make my own changes. > > -David > -- s<e>mantics GmbH Viktoriaallee 45 52066 Aachen Web: www.semantics.de Registergericht : Amtsgericht Aachen, HRB 8189 Geschäftsführer : Kay Heiligenhaus M.A. Dipl. Ing. José de la Rosa ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-24 13:07 ` Marcus Brinkmann @ 2016-01-28 2:56 ` David A. Greene 2016-01-28 4:06 ` Marcus Brinkmann 0 siblings, 1 reply; 12+ messages in thread From: David A. Greene @ 2016-01-28 2:56 UTC (permalink / raw) To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git Marcus Brinkmann <m.brinkmann@semantics.de> writes: > With my patch, "git subtree split -P" produces the same result (for my > data set) as "git filter-branch --subdirectory-filter", which is much > faster, because it selects the revisions to rewrite before rewriting. > As I am not using any of the advanced features of "git subtree", I will > just use "git filter-branch" instead. Heh. :) I hope to replace all that ugly split code with filter-branch as you describe but there are some cases where it differs. It may be that your changes fix some of that. Are you still able to do a re-roll on this? -David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-28 2:56 ` David A. Greene @ 2016-01-28 4:06 ` Marcus Brinkmann 2016-02-03 2:34 ` David A. Greene 0 siblings, 1 reply; 12+ messages in thread From: Marcus Brinkmann @ 2016-01-28 4:06 UTC (permalink / raw) To: David A. Greene; +Cc: Junio C Hamano, Dave Ware, git On 01/28/2016 03:56 AM, David A. Greene wrote: > Marcus Brinkmann <m.brinkmann@semantics.de> writes: > >> With my patch, "git subtree split -P" produces the same result (for my >> data set) as "git filter-branch --subdirectory-filter", which is much >> faster, because it selects the revisions to rewrite before rewriting. >> As I am not using any of the advanced features of "git subtree", I will >> just use "git filter-branch" instead. > > Heh. :) > > I hope to replace all that ugly split code with filter-branch as you > describe but there are some cases where it differs. It may be that your > changes fix some of that. > > Are you still able to do a re-roll on this? I have to admit that my interest has declined steeply since discovering that subtree-split and filter-branch --subtree-filter give different results from "git svn" on the subdirectory. The reason is that git-svn includes all commits for revisions that regular "svn log" gives on that directory, which includes commits that serve as branch points only or that are empty except for unhandled properties. While empty commits for unhandled properties wouldn't be fatal, missing branch points make "git svn" really unhappy when asked to rebuild .git/svn. As migration from SVN is my main motivation at this point to use a subtree filter at this point (git-svn is just very slow - about one week on our repository), I am somewhat stuck and back to using git-svn. Although hacking up something with filter-branch seems like a remote option, it's probably nothing that generalizes. It didn't help that "make test" in contrib/subtree gives me 27 out of 29 failed tests (with no indication how to figure out what exactly failed). Oh well :) Marcus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] contrib/subtree: Split history with empty trees correctly 2016-01-28 4:06 ` Marcus Brinkmann @ 2016-02-03 2:34 ` David A. Greene 0 siblings, 0 replies; 12+ messages in thread From: David A. Greene @ 2016-02-03 2:34 UTC (permalink / raw) To: Marcus Brinkmann; +Cc: Junio C Hamano, Dave Ware, git Marcus Brinkmann <m.brinkmann@semantics.de> writes: >> Are you still able to do a re-roll on this? > > I have to admit that my interest has declined steeply since > discovering that subtree-split and filter-branch --subtree-filter give > different results from "git svn" on the subdirectory. The reason is > that git-svn includes all commits for revisions that regular "svn log" > gives on that directory, which includes commits that serve as branch > points only or that are empty except for unhandled properties. What do you mean by "branch points only?" It's ok if you can't do a reroll. I can't work on it right now but perhaps when I get back to cleaning up the split code I can take what you have and incoporate it. I do very much appreciate your work on this! > While empty commits for unhandled properties wouldn't be fatal, > missing branch points make "git svn" really unhappy when asked to > rebuild .git/svn. [ I may have misunderstood your intent, see below. ] I just want to make sure I understand your situation. You used git-svn to mirror a project to git and then used git-subtree to incorporate that mirror into a larger project? Why is the split being done? If there's an active Subversion repository being mirrors it's much better to commit changes back to the Subversion repository than to the git mirror. > As migration from SVN is my main motivation at this point to use a > subtree filter at this point (git-svn is just very slow - about one > week on our repository), I am somewhat stuck and back to using > git-svn. Although hacking up something with filter-branch seems like a > remote option, it's probably nothing that generalizes. Ok, maybe I misunderstood your situation. Are you converting one big repository via git-svn and then trying to break out individual directories into smaller projects? git-svn + git-subtree/git-filter-branch is not the best way to do that. svn-all-fast-export is far superior for a one-off conversion and makes splitting repositories a breeze. It happens during conversion rather than as a post-processing step. https://techbase.kde.org/Projects/MoveToGit/UsingSvn2Git > It didn't help that "make test" in contrib/subtree gives me 27 out of > 29 failed tests (with no indication how to figure out what exactly > failed). Huh. I don't know why that would happen. Did you build the git tools first? A testing run using --debug and --verbose (see the Makefile in contrib/subtree/t) would be informative. I understand if you don't have time to do that. I haven't seen such failures before so I'm curious as to what happened. -David ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-02-03 2:34 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-01-15 16:23 BUG: git subtree split gets confused on removed and readded directory Marcus Brinkmann 2016-01-15 23:44 ` Junio C Hamano 2016-01-17 19:34 ` David Ware 2016-01-17 23:23 ` David A. Greene 2016-01-20 1:17 ` [PATCH] contrib/subtree: Split history with empty trees correctly (was: Re: BUG: git subtree split gets confused on removed and readded directory) Marcus Brinkmann 2016-01-20 4:05 ` [PATCH] contrib/subtree: Split history with empty trees correctly David A. Greene 2016-01-20 11:22 ` Marcus Brinkmann 2016-01-28 2:55 ` David A. Greene 2016-01-24 13:07 ` Marcus Brinkmann 2016-01-28 2:56 ` David A. Greene 2016-01-28 4:06 ` Marcus Brinkmann 2016-02-03 2:34 ` David A. Greene
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).