From: "M. Buecher" <maddes+git@maddes.net>
To: git@vger.kernel.org
Subject: Re: [bug] git svn fetch: defect history - missing merges and wrong tag ancestors
Date: Sun, 03 Jan 2021 19:12:23 +0100
Message-ID: <84b991e2c08c8d0d2154d2f0a2b1b3c5@mailbox.org>
In-Reply-To: <8d48309d968019307915432395e226fc@mailbox.org>

[-- Attachment #1: Type: text/plain, Size: 12697 bytes --]


On 2020-12-25 23:24, M. Buecher wrote:
> Dear all,
> 
> I finally had the time to start converting some older Subversion
> repositories to git repositories and ran into issues with repos using
> "parallel" branches, so-called vendor branches [1] in Subversion, which
> track upstream changes via snapshots in a separate branch and merge
> them into the custom build on trunk.
> 
> I just want to convert the Subversion history as-is to git.
> I studied the git-svn reference docs [2] plus the related chapter of
> the ProGit book [3] and assume that I understood how git-svn works.
> Still I'm not a git expert, just a sporadic user.
> 
> Somehow `git svn fetch` (2.29.2.windows.3) loses merge information
> between vendor branch and trunk, and tags reference the
> predecessors instead of the original ancestor.
> Maybe there is a manual way to fix this for that small repository, but
> it wouldn't be feasible for larger repositories, which is why I decided
> to write this bug report.
> Is there anything I missed? (see my procedure below)
> 
> Fortunately I ran quite early into these issues and also with a very
> small repository of just 11 commits, so I can provide a small
> reproduction case (see below after the links).
> Hoping you can enhance `git svn fetch`.

I tested further with re-created Subversion repositories (both attached):
one has the same history as before, but with svn:mergeinfo guaranteed to be
present (Subversion >= 1.5 via "cherry pick merge", no "2-URL merge"); the
other has the vendor branch created as a copy of trunk.
git svn got the merges right only when the vendor branch was a copy of
trunk. Otherwise, even with svn:mergeinfo, it does not record the merges.
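
The "Cannot find common ancestor ... Ignoring merge info" warning in the
quoted log matches this observation. A minimal sketch of the underlying
condition in plain git, assuming two branches that start from unrelated root
commits (the repository, the branch name "vendor" and the demo identity are
made up for illustration):

```shell
#!/bin/sh
# Build a throwaway repository with two unrelated root commits and
# report whether git can find a merge base between them.
demo_unrelated_roots() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    # Identity is passed per command, so no global git config is needed.
    g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }
    g commit -q --allow-empty -m "trunk root"
    first=$(git rev-parse HEAD)
    # An orphan branch starts a second history with no parent.
    git checkout -q --orphan vendor
    g commit -q --allow-empty -m "vendor root"
    second=$(git rev-parse HEAD)
    # git merge-base exits non-zero when the histories share no commit.
    if git merge-base "$first" "$second" >/dev/null; then
      echo "common ancestor found"
    else
      echo "no common ancestor"
    fi
  )
  rm -rf "$repo"
}

demo_unrelated_roots
```

This only illustrates the situation in git itself; whether git svn performs
exactly this check internally is an assumption.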

Assumption:
It seems that git svn handles trunk and branches in a special way, but 
Subversion actually does not have branches.
In Subversion there are just directories and files; a branch or tag is 
just a copy of another directory at a given revision, and merges can happen 
between any directories, regardless of whether they share ancestry.
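
To illustrate: in an `svnadmin dump`, creating a branch or tag shows up as an
ordinary node record carrying Node-copyfrom-rev and Node-copyfrom-path
headers. A minimal sketch that lists such copies, assuming a hand-written
fragment (the revision numbers are made up and not taken from the attached
dumps):

```shell
#!/bin/sh
# Hand-written fragment imitating the node headers of an `svnadmin dump`
# (illustrative only, not real repository data).
dump_fragment() {
cat <<'EOF'
Revision-number: 3

Node-path: vendor/5.4
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 2
Node-copyfrom-path: vendor/current

EOF
}

# Print every copy as "r<rev>: <source>@<source rev> -> <target>".
# Relies on Node-copyfrom-rev preceding Node-copyfrom-path in the record.
list_copies() {
  awk '
    /^Revision-number: /    { rev = substr($0, 18) }
    /^Node-path: /          { path = substr($0, 12) }
    /^Node-copyfrom-rev: /  { from_rev = substr($0, 20) }
    /^Node-copyfrom-path: / { printf "r%s: %s@%s -> %s\n", rev, substr($0, 21), from_rev, path }
  '
}

dump_fragment | list_copies
```

For the fragment above this prints `r3: vendor/current@2 -> vendor/5.4`.
Nothing in the record marks the target as a "tag" or a "branch"; that is only
a naming convention of the directory layout.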

Workaround:
As git svn does not recognize all merges correctly (especially 
cross-branch copies), those lost links must be added manually.
I wrote a small GNU awk script (attached) that determines all cross-branch 
copies from an `svnadmin dump`. This gives the first revision in which 
something was copied between each pair of branches.
Run git svn only up to that revision, fix the parents, then continue 
with git svn up to the next such revision.
This helps git svn find the correct ancestors and build the subsequent 
merges correctly.
The parents can be changed with `git replace -f --graft <commit> 
<existing correct parents> <additional correct parents>`.
Additionally, before continuing with git svn, this `git replace` change 
can be made permanent with `git-filter-repo --force --replace-refs 
delete-no-add`.
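
A minimal dry-run sketch of one such step. It only prints the commands
instead of executing them. The helper name `plan_graft_step`, the revision
range syntax `-r 0:<rev>` and all placeholder arguments are assumptions for
illustration; only the `git replace` and `git-filter-repo` invocations are
taken from the description above.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for one fix-up step.
# Usage: plan_graft_step SVN_REV COMMIT PARENT...
plan_graft_step() {
  svn_rev=$1
  commit=$2
  shift 2
  # 1. Fetch only up to the revision of the first cross-branch copy
  #    (range syntax is an assumption, see lead-in).
  printf 'git svn fetch -r 0:%s\n' "$svn_rev"
  # 2. Graft the full parent list: existing parents first, then the
  #    parents that git svn did not record.
  printf 'git replace -f --graft %s %s\n' "$commit" "$*"
  # 3. Make the graft permanent before fetching further.
  printf 'git-filter-repo --force --replace-refs delete-no-add\n'
}

# Placeholders: replace with the real revision and commit ids.
plan_graft_step 42 TRUNK_COMMIT TRUNK_PARENT VENDOR_COMMIT
```

The same step would then be repeated for each "first revision" reported by
the awk script, in ascending order.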


> Any help is appreciated, thanks in advance
> Matthias Bücher
> 
> 
> [Links]
> [1]
> http://svnbook.red-bean.com/en/1.8/svn.advanced.vendorbr.html#svn.advanced.vendorbr.mirrored-sources
> [2] https://git-scm.com/docs/git-svn
> [3] 
> https://git-scm.com/book/en/v2/Git-and-Other-Systems-Migrating-to-Git
> 
> 
> [System Info]
> git version:
> git version 2.29.2.windows.3
> cpu: x86_64
> built from commit: d054eb1fc46ff23e7c95756a7c747e2f2864b478
> sizeof-long: 4
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Windows 10.0 19042
> compiler info: gnuc: 10.2
> libc info: no libc information available
> $SHELL (typically, interactive shell): C:\Program 
> Files\Git\usr\bin\bash.exe
> 
> [Enabled Hooks]
> none
> 
> 
> [Commits in Subversion]
> A Subversion repository dump is attached, plus a test where I
> recreated the same history directly in a git repository without an
> issue.
> 
> * trunk:  a------------d--e--h--i--j--k
> *                     /     /
> * vendor: a (empty)--b-----f
> *                    ^     ^
> * tags:              c     g
> 
> * Vendor releases: b, f
> * Custom modifications: e, i, j, k
> * Tags: really just used as tags, although internally in Subversion they
> are branches. Therefore I want to create lightweight git tags,
> although annotated git tags would be fine too.
> 
> 
> [Expected Subversion to git repo conversion]
> * trunk => branch "main"
> * branches/* => branch "*"
> * tags/* => tag "*"
> * vendor/current => branch "vendor/current" (can be renamed later to
> just "vendor")
> * vendor/* => tag "vendor/*" (except for vendor/current)
> 
> Expected "tags":
> vendor/5.4 = rev b (maybe c when annotated)
> vendor/5.8 = rev f (maybe g when annotated)
> 
> 
> [Wrong Results]
> Merges from "vendor/current" branch to trunk get lost.
> Tags are referencing the predecessors of the expected commit.
> 
> 
> [Procedure]
> ```
> ### a) preparation
> cd /d/Coding
> #
> cd dd-formmailer
> svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /' > authors-transform.txt
> ## edit authors-transform.txt accordingly
> 
> ### b) git-svn adapted from ProGit book, but with "svn/" prefix
> cd /d/Coding
> git svn init --stdlayout --no-metadata --prefix="svn/" -- 'svn+ssh://svn@vcs/dd-formmailer' dd-formmailer-git
> cd dd-formmailer-git
> ## edit .git/config for additional vendor branches and tags
> << __EOF
> ...
> [svn-remote "svn"]
> 	...
> 	fetch = trunk:refs/remotes/svn/trunk
> #	branches = vendor/current:refs/remotes/svn/vendor/current ## non-glob definition not working for branches
> 	branches = vendor/{current}:refs/remotes/svn/vendor/*
> 	branches = branches/*:refs/remotes/svn/*
> 	tags = vendor/*:refs/remotes/svn/tags/vendor/*
> 	tags = tags/*:refs/remotes/svn/tags/*
> __EOF
> ##
> cat .git/config
> git svn fetch --authors-file ../dd-formmailer/authors-transform.txt
> #
> git branch -vv --list ; git for-each-ref
> gitk --all &
> ```
> 
> [Log]
> $ cat .git/config
> [core]
>         repositoryformatversion = 0
>         filemode = false
>         bare = false
>         logallrefupdates = true
>         symlinks = false
>         ignorecase = true
> [svn-remote "svn"]
>         noMetadata = 1
>         url = svn+ssh://svn@vcs/dd-formmailer
>         fetch = trunk:refs/remotes/svn/trunk
>         branches = vendor/{current}:refs/remotes/svn/vendor/*
>         branches = branches/*:refs/remotes/svn/*
>         tags = vendor/*:refs/remotes/svn/tags/vendor/*
>         tags = tags/*:refs/remotes/svn/tags/*
> 
> $ git svn fetch --authors-file ../dd-formmailer/authors-transform.txt
> r1 = 158d53044a5628897379403647a19ea13594b532 
> (refs/remotes/svn/vendor/current)
>         A       _svn__guideline.txt
>         A       _svn_client_config.txt
>         A       _svn_dir_ignore_list.txt
> r1 = a795b654edbc296e6f38da398f032a4851fd0a9e (refs/remotes/svn/trunk)
>         A       dd-formmailer.css
>         A       dd-formmailer.php
>         A       lang/BrazilianPortuguese.php
>         A       lang/Catalan.php
>         A       lang/Danish.php
>         A       lang/Deutsch.php
>         A       lang/Dutch.php
>         A       lang/English.php
>         A       lang/Finnish.php
>         A       lang/French.php
>         A       lang/Greek.php
>         A       lang/Italian.php
>         A       lang/NorwegianBokmaal.php
>         A       lang/Polish.php
>         A       lang/Portuguese.php
>         A       lang/Romanian.php
>         A       lang/Russian.php
>         A       lang/Slovak.php
>         A       lang/Slovene.php
>         A       lang/Spanish.php
>         A       lang/Swedish.php
>         A       lang/Turkish.php
>         A       recaptchalib.php
> r2 = 8fd8668dbed2dde6b55306c71b3b629f5ed794ec 
> (refs/remotes/svn/vendor/current)
> Found possible branch point:
> svn+ssh://svn@vcs/dd-formmailer/vendor/current =>
> svn+ssh://svn@vcs/dd-formmailer/vendor/5.4, 1
> Found branch parent: (refs/remotes/svn/tags/vendor/5.4)
> 158d53044a5628897379403647a19ea13594b532
> Following parent with do_switch
>         A       dd-formmailer.css
>         A       dd-formmailer.php
>         A       lang/BrazilianPortuguese.php
>         A       lang/Catalan.php
>         A       lang/Danish.php
>         A       lang/Deutsch.php
>         A       lang/Dutch.php
>         A       lang/English.php
>         A       lang/Finnish.php
>         A       lang/French.php
>         A       lang/Greek.php
>         A       lang/Italian.php
>         A       lang/NorwegianBokmaal.php
>         A       lang/Polish.php
>         A       lang/Portuguese.php
>         A       lang/Romanian.php
>         A       lang/Russian.php
>         A       lang/Slovak.php
>         A       lang/Slovene.php
>         A       lang/Spanish.php
>         A       lang/Swedish.php
>         A       lang/Turkish.php
>         A       recaptchalib.php
> Successfully followed parent
> r3 = 7ab4f436cafc8af3ed2e727a6d8cbef1a8f8b39f 
> (refs/remotes/svn/tags/vendor/5.4)
>         A       dd-formmailer.css
>         A       dd-formmailer.php
>         A       lang/BrazilianPortuguese.php
>         A       lang/Catalan.php
>         A       lang/Danish.php
>         A       lang/Deutsch.php
>         A       lang/Dutch.php
>         A       lang/English.php
>         A       lang/Finnish.php
>         A       lang/French.php
>         A       lang/Greek.php
>         A       lang/Italian.php
>         A       lang/NorwegianBokmaal.php
>         A       lang/Polish.php
>         A       lang/Portuguese.php
>         A       lang/Romanian.php
>         A       lang/Russian.php
>         A       lang/Slovak.php
>         A       lang/Slovene.php
>         A       lang/Spanish.php
>         A       lang/Swedish.php
>         A       lang/Turkish.php
>         A       recaptchalib.php
> r4 = 7ec3663fd8e7c9fcfeb2742968a948d6978776d4 (refs/remotes/svn/trunk)
>         M       dd-formmailer.css
>         M       dd-formmailer.php
>         A       dd-verify.php
> r5 = e65ba2bdcd12a1935c1a327507dad6b7117f452b (refs/remotes/svn/trunk)
>         A       calendar.gif
>         A       date_chooser.js
>         M       dd-formmailer.css
>         M       dd-formmailer.php
>         A       lang/Belarussian.php
>         A       lang/Czech.php
>         A       lang/Estonian.php
>         A       lang/Japanese.php
>         A       lang/Vietnamese.php
>         M       recaptchalib.php
> r6 = 9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc 
> (refs/remotes/svn/vendor/current)
> Found possible branch point:
> svn+ssh://svn@vcs/dd-formmailer/vendor/current =>
> svn+ssh://svn@vcs/dd-formmailer/vendor/5.8, 3
> Found branch parent: (refs/remotes/svn/tags/vendor/5.8)
> 8fd8668dbed2dde6b55306c71b3b629f5ed794ec
> Following parent with do_switch
>         A       calendar.gif
>         A       date_chooser.js
>         M       dd-formmailer.css
>         M       dd-formmailer.php
>         A       lang/Belarussian.php
>         A       lang/Czech.php
>         A       lang/Estonian.php
>         A       lang/Japanese.php
>         A       lang/Vietnamese.php
>         M       recaptchalib.php
> Successfully followed parent
> r7 = cc00cb187386298cf974dda69f151d2ad4795917 
> (refs/remotes/svn/tags/vendor/5.8)
>         A       calendar.gif
>         A       date_chooser.js
>         M       dd-formmailer.css
>         M       dd-formmailer.php
>         M       dd-verify.php
>         A       lang/Belarussian.php
>         A       lang/Czech.php
>         A       lang/Estonian.php
>         A       lang/Japanese.php
>         A       lang/Vietnamese.php
>         M       recaptchalib.php
> Checking svn:mergeinfo changes since r5: 1 sources, 1 changed
> W: Cannot find common ancestor between
> e65ba2bdcd12a1935c1a327507dad6b7117f452b and
> 9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc. Ignoring merge info.
> r8 = 9a5cd8f55f377f469280171cd87219b8f528c693 (refs/remotes/svn/trunk)
>         M       dd-formmailer.php
> r9 = 8143782376d9fbc91a9181b367b72701205b017f (refs/remotes/svn/trunk)
>         M       dd-formmailer.php
> r10 = 6f563fc4c511ddcc53c2ffc5d26c1e116725bc6b (refs/remotes/svn/trunk)
>         M       _svn_client_config.txt
> r11 = e9130854178d2c2743a981f303cdd7f34e54b052 (refs/remotes/svn/trunk)
> svnserve: E210002: Network connection closed unexpectedly
> Checked out HEAD:
>   svn+ssh://svn@vcs/dd-formmailer/trunk r11
> 
> $ git branch -vv --list ; git for-each-ref
> * main e913085 Updated client config for Windows Scripts
> e9130854178d2c2743a981f303cdd7f34e54b052 commit refs/heads/main
> 7ab4f436cafc8af3ed2e727a6d8cbef1a8f8b39f commit 
> refs/remotes/svn/tags/vendor/5.4
> cc00cb187386298cf974dda69f151d2ad4795917 commit 
> refs/remotes/svn/tags/vendor/5.8
> e9130854178d2c2743a981f303cdd7f34e54b052 commit refs/remotes/svn/trunk
> 9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc commit 
> refs/remotes/svn/vendor/current

[-- Attachment #2: dd-formmailer-mergeinfo.svndump.7z --]
[-- Type: application/x-7z-compressed, Size: 41410 bytes --]

[-- Attachment #3: dd-formmailer-branched.svndump.7z --]
[-- Type: application/x-7z-compressed, Size: 26244 bytes --]

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: analyze-svn-dump-for-cross-branch-copies.sh.txt --]
[-- Type: text/x-gawk; name=analyze-svn-dump-for-cross-branch-copies.sh.txt, Size: 5821 bytes --]

#!/bin/gawk -f
### GNU awk: https://www.gnu.org/software/gawk/manual/

### analyze-svn-dump-for-cross-branch-copies.sh
###
### Parameters:
### * csv=";" (or ",") - export main information as CSV
### * details=1 - see all copied paths
###
### Copyright (C) 2021  Matthias Bücher, Germany <maddes@maddes.net>
###
### This program is free software: you can redistribute it and/or modify
### it under the terms of the GNU General Public License as published by
### the Free Software Foundation, either version 3 of the License, or
### (at your option) any later version.
###
### This program is distributed in the hope that it will be useful,
### but WITHOUT ANY WARRANTY; without even the implied warranty of
### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
### GNU General Public License for more details.
###
### You should have received a copy of the GNU General Public License
### along with this program.  If not, see <https://www.gnu.org/licenses/>.


### ATTENTION! function code has to be adapted to the structure of the repository and its historical changes
function getBranchOfNodePath(nodepath,    branch) {
  branch = ""
  ### --- A) special cases
  ## /vendor/*
  if (match(nodepath, /^vendor\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  ## /tags/vendor/*
  else if (match(nodepath, /^tags\/vendor\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  ### --- B) standard layout: /branches/*, /tags/*, /trunk
  else if (match(nodepath, /^branches\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  else if (match(nodepath, /^tags\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  else if (match(nodepath, /^trunk\//)) {
    branch = substr(nodepath, RSTART, RLENGTH-1)
  }
  ### --- C) fallback: remove last component
  else {
    ## third argument 1: replace only the first (and only possible) match
    branch = gensub(/\/[^/]+$/, "", 1, nodepath)
  }
  #
  return branch
}


BEGIN {
  FS = "\n"
}

BEGINFILE {
  ## initialize variables
  revision = ""
  nodepath = ""
  nodecopyrev = ""
  nodecopypath = ""
  ## initialize arrays
  delete cross_branch_copies
  delete cross_branch_copies_first
}

/^Revision-number: / {
  revision = $0
  sub(/^Revision-number: /, "", revision)
  sub(/^\s+/, "", revision)
  sub(/\s+$/, "", revision)
  #
  nodepath = ""
  nodecopyrev = ""
  nodecopypath = ""
}

/^Node-path: / {
  nodepath = $0
  sub(/^Node-path: /, "", nodepath)
  #
  nodecopyrev = ""
  nodecopypath = ""
}

/^Node-copyfrom-rev: / {
  nodecopyrev = $0
  sub(/^Node-copyfrom-rev: /, "", nodecopyrev)
}

/^Node-copyfrom-path: / {
  nodecopypath = $0
  sub(/^Node-copyfrom-path: /, "", nodecopypath)
  #
  branchfrom = getBranchOfNodePath(nodecopypath)
  branchto = getBranchOfNodePath(nodepath)
  #
  if (branchfrom != branchto) {
    cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopypath"] = nodecopypath
    cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopyrev"] = nodecopyrev
    if (!((branchfrom, branchto) in cross_branch_copies_first)) {
      cross_branch_copies_first[branchfrom, branchto] = revision
    }
  }
}

ENDFILE {
  foundrevs = length(cross_branch_copies)
  if (foundrevs == 0) {
    printf("=== %s: No revisions found with cross-branch svn copies\n", FILENAME)
  } else {
    if (csv) {
      backupofs=OFS
    }
    printf("=== %s: Found %i revisions with cross-branch svn copies\n", FILENAME, foundrevs)
    if (csv) {
      OFS=csv
      print("\"Revision\"", "\"Branch from\"", "\"Branch to\"")
      OFS=backupofs
    }
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (revision in cross_branch_copies) {
      if (!(csv)) {
        print(">>> Revision:", revision)
      }
      count = 0
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (branchfrom in cross_branch_copies[revision]) {
        for (branchto in cross_branch_copies[revision][branchfrom]) {
          if (csv) {
            csvrevision = revision
            csvbranchfrom = "\"" gensub(/"/, "\"\"", "g", branchfrom) "\""
            csvbranchto = "\"" gensub(/"/, "\"\"", "g", branchto) "\""
            OFS=csv
            print(csvrevision, csvbranchfrom, csvbranchto)
            OFS=backupofs
          } else {
            printf(" svn copy from \"%s\" to \"%s\"\n", branchfrom, branchto)
            if (details) {
              for (nodepath in cross_branch_copies[revision][branchfrom][branchto]) {
                count++
                printf("  %4i. \"%s\" (Revision %i) to \"%s\"\n", count, cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopypath"], cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopyrev"], nodepath)
              } ## nodepath
            } ## details
          } ## csv
        } ## branchto
      } ## branchfrom
    } ## revision
    #
    printf("--- %s: List of first revision of each cross-branch copy\n", FILENAME)
    PROCINFO["sorted_in"] = "@val_num_asc"
    if (csv) {
      OFS=csv
      print("\"Revision\"", "\"Branch from\"", "\"Branch to\"")
      OFS=backupofs
    }
    for (combined in cross_branch_copies_first) { ## combined: branchfrom, branchto
      split(combined, separate, SUBSEP)
      if (csv) {
        csvrevision = cross_branch_copies_first[combined]
        csvbranchfrom = "\"" gensub(/"/, "\"\"", "g", separate[1]) "\""
        csvbranchto = "\"" gensub(/"/, "\"\"", "g", separate[2]) "\""
        OFS=csv
        print(csvrevision, csvbranchfrom, csvbranchto)
        OFS=backupofs
      } else {
        printf(" svn copy from \"%s\" to \"%s\" first in revision %i\n", separate[1], separate[2], cross_branch_copies_first[combined])
      } ## csv
    } ## combined
    #
    count = length(cross_branch_copies_first)
    printf("^^^ %s: Found %i revisions (unique %i) with cross-branch svn copies\n", FILENAME, foundrevs, count)
  } ## foundrevs
}

Thread overview: 3+ messages
2020-12-25 22:24 [bug] git svn fetch: defect history - missing merges and wrong tag ancestors M. Buecher
2021-01-03 18:12 ` M. Buecher [this message]
2021-02-07 20:24   ` [bug] git svn fetch: defect history - missing merges and wrong tag ancestors - third-party tool found M. Buecher
