From: "Joachim Kuebart via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Luke Diamand <luke@diamand.org>,
	Joachim Kuebart <joachim.kuebart@gmail.com>
Subject: [PATCH v2 0/2] git-p4: speed up search for branch parent
Date: Wed, 05 May 2021 11:56:24 +0000
Message-ID: <pull.1013.v2.git.git.1620215786.gitgitgadget@gmail.com>
In-Reply-To: <pull.1013.git.git.1619640416533.gitgitgadget@gmail.com>

In this iteration, I have added more context and measurements to the commit
message.

I have also made the small code improvements that reviewers suggested.

Finally, I extended t9801-git-p4-branch.sh to check that each branch is
created at the correct point in its parent's history.

Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>

cc: Joachim Kuebart <joachim.kuebart@gmail.com>

Joachim Kuebart (2):
  git-p4: ensure complex branches are cloned correctly
  git-p4: speed up search for branch parent

 git-p4.py                | 21 ++++++++++-----------
 t/t9801-git-p4-branch.sh |  2 ++
 2 files changed, 12 insertions(+), 11 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v2
Pull-Request: https://github.com/git/git/pull/1013

Range-diff vs v1:

 -:  ------------ > 1:  0ee0b7b55691 git-p4: ensure complex branches are cloned correctly
 1:  a171f7e6c023 ! 2:  41b3a23f682c git-p4: speed up search for branch parent
     @@ Metadata
       ## Commit message ##
          git-p4: speed up search for branch parent
      
     -    Previously, the code iterated through the parent branch commits and
     -    compared each one to the target tree using diff-tree.
     +    For every new branch that git-p4 imports, it needs to find the commit
     +    where it branched off its parent branch. While p4 doesn't record this
     +    information explicitly, the first changelist on a branch is usually an
     +    identical copy of the parent branch.
      
     -    This patch outputs the revision's tree hash along with the commit hash,
     -    thereby saving the diff-tree invocation. This results in a considerable
     -    speed-up, at least on Windows.
     +    The method searchParent() tries to find a commit in the history of the
     +    given "parent" branch whose tree exactly matches the initial changelist
     +    of the new branch, "target". The code iterates through the parent
     +    commits and compares each of them to this initial changelist using
     +    diff-tree.
     +
     +    Since we already know the tree object name we are looking for, spawning
     +    diff-tree for each commit is wasteful.
     +
     +    Use the "--format" option of "rev-list" to find out the tree object name
     +    of each commit in the history, and find the tree whose name is exactly
     +    the same as the tree of the target commit to optimize this.
     +
     +    This results in a considerable speed-up, at least on Windows. On one
     +    Windows machine with a fairly large repository of about 16000 commits in
     +    the parent branch, the current code takes over 7 minutes, while the new
     +    code only takes just over 10 seconds for the same changelist:
     +
     +    Before:
     +
     +        $ time git p4 sync
     +        Importing from/into multiple branches
     +        Depot paths: //depot
     +        Importing revision 31274 (100.0%)
     +        Updated branches: b1
     +
     +        real    7m41.458s
     +        user    0m0.000s
     +        sys     0m0.077s
     +
     +    After:
     +
     +        $ time git p4 sync
     +        Importing from/into multiple branches
     +        Depot paths: //depot
     +        Importing revision 31274 (100.0%)
     +        Updated branches: b1
     +
     +        real    0m10.235s
     +        user    0m0.000s
     +        sys     0m0.062s
      
          Signed-off-by: Joachim Kuebart <joachim.kuebart@gmail.com>
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
     +    Helped-by: Luke Diamand <luke@diamand.org>
      
       ## git-p4.py ##
      @@ git-p4.py: def importNewBranch(self, branch, maxChange):
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
           def searchParent(self, parent, branch, target):
      -        parentFound = False
      -        for blob in read_pipe_lines(["git", "rev-list", "--reverse",
     -+        for tree in read_pipe_lines(["git", "rev-parse",
     -+                                     "{}^{{tree}}".format(target)]):
     -+            targetTree = tree.strip()
     -+        for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T",
     ++        targetTree = read_pipe(["git", "rev-parse",
     ++                                "{}^{{tree}}".format(target)]).strip()
     ++        for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
                                            "--no-merges", parent]):
      -            blob = blob.strip()
      -            if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
      -                parentFound = True
     -+            if blob[:7] == "commit ":
     ++            if line.startswith("commit "):
      +                continue
     -+            blob = blob.strip().split(" ")
     -+            if blob[1] == targetTree:
     ++            commit, tree = line.strip().split(" ")
     ++            if tree == targetTree:
                       if self.verbose:
      -                    print("Found parent of %s in commit %s" % (branch, blob))
      -                break
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
      -            return blob
      -        else:
      -            return None
     -+                    print("Found parent of %s in commit %s" % (branch, blob[0]))
     -+                return blob[0]
     ++                    print("Found parent of %s in commit %s" % (branch, commit))
     ++                return commit
      +        return None
       
           def importChanges(self, changes, origin_revision=0):
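
For readers who want to try the idea outside git-p4, the tree-matching
search described in the commit message can be sketched as a standalone
snippet. This is only an illustration under stated assumptions, not the
patch itself: the names find_commit_by_tree() and search_parent() are
hypothetical, and it calls git through subprocess instead of git-p4's
read_pipe() helpers.

```python
import subprocess


def find_commit_by_tree(rev_list_lines, target_tree):
    """Return the first commit whose tree name equals target_tree, else None.

    rev_list_lines is the output of
    `git rev-list --format="%H %T" --no-merges <parent>`, split into lines.
    """
    for line in rev_list_lines:
        # rev-list prints a "commit <hash>" header before each formatted line.
        if line.startswith("commit "):
            continue
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or unexpected lines instead of failing.
            continue
        commit, tree = fields
        if tree == target_tree:
            return commit
    return None


def search_parent(parent, target):
    """Hypothetical wrapper that runs git in the current repository."""
    # Resolve the tree object name of the target commit once.
    target_tree = subprocess.check_output(
        ["git", "rev-parse", "{}^{{tree}}".format(target)],
        text=True).strip()
    # List "<commit> <tree>" for every non-merge commit reachable from parent.
    lines = subprocess.check_output(
        ["git", "rev-list", "--format=%H %T", "--no-merges", parent],
        text=True).splitlines()
    return find_commit_by_tree(lines, target_tree)
```

Keeping the parsing separate from the git invocation means the matching
logic can be exercised with canned rev-list output and needs only two git
processes in total, rather than one diff-tree per parent commit.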

-- 
gitgitgadget
