* [StGit PATCH 0/2] push optimizations @ 2008-07-02 6:12 Karl Hasselström 2008-07-02 6:12 ` [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree Karl Hasselström ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Karl Hasselström @ 2008-07-02 6:12 UTC (permalink / raw) To: Catalin Marinas; +Cc: git Here's the git-apply call you asked for. You were right: it was a huge speed-up. I set up a benchmark to test it: * 32 directories, each containing 32 subdirectories, each containing 32 small (and different) files. * A series of 250 patches that each add a line to one of the files. * An "upstream" that adds a line first in every file. * I set all this up with a python script feeding fast-import. A huge time-saver! * Pop patches, git-reset to upstream, then goto top patch. This makes sure that we use the new infrastructure to push, and that we get one file-level conflict in each patch. Before the first patch, the "goto" command took 4:27 minutes, wall-clock time. After the first patch, it took 1:31. After the second, 0:48; one second or so slower than the stable branch (which does not have a patch stack log). Available in kha/experimental. --- Karl Hasselström (2): Reuse the same temp index in a transaction Do simple in-index merge with diff+apply instead of read-tree stgit/lib/git.py | 52 +++++++++++++++++++++++++++++++--------------- stgit/lib/transaction.py | 12 ++++++++++- 2 files changed, 46 insertions(+), 18 deletions(-) -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 9+ messages in thread
* [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree 2008-07-02 6:12 [StGit PATCH 0/2] push optimizations Karl Hasselström @ 2008-07-02 6:12 ` Karl Hasselström 2008-07-12 10:22 ` Catalin Marinas 2008-07-02 6:13 ` [StGit PATCH 2/2] Reuse the same temp index in a transaction Karl Hasselström 2008-07-07 21:12 ` [StGit PATCH 0/2] push optimizations Catalin Marinas 2 siblings, 1 reply; 9+ messages in thread From: Karl Hasselström @ 2008-07-02 6:12 UTC (permalink / raw) To: Catalin Marinas; +Cc: git The advantage is that patch application will resolve some file content conflicts for us, so that we'll fall back to merge-recursive less often. This is a significant speedup, especially since merge-recursive needs to touch the worktree, which means we have to check out the index first. (A simple test, pushing 250 patches in a 32k-file repository, with one file-level merge necessary per push, went from 1.07 to 0.36 seconds per patch with this patch applied.) Signed-off-by: Karl Hasselström <kha@treskal.com> --- stgit/lib/git.py | 13 ++++++++----- 1 files changed, 8 insertions(+), 5 deletions(-) diff --git a/stgit/lib/git.py b/stgit/lib/git.py index 6ccdfa7..a38eaa5 100644 --- a/stgit/lib/git.py +++ b/stgit/lib/git.py @@ -475,9 +475,10 @@ class Repository(RunWithEnv): return ours index = self.temp_index() + index.read_tree(ours) try: - index.merge(base, ours, theirs) try: + index.apply_treediff(base, theirs) return index.write_tree() except MergeException: return None @@ -548,10 +549,6 @@ class Index(RunWithEnv): return False else: return True - def merge(self, base, ours, theirs): - """In-index merge, no worktree involved.""" - self.run(['git', 'read-tree', '-m', '-i', '--aggressive', - base.sha1, ours.sha1, theirs.sha1]).no_output() def apply(self, patch_text): """In-index patch application, no worktree involved.""" try: @@ -559,6 +556,12 @@ class Index(RunWithEnv): ).raw_input(patch_text).no_output() except run.RunException: raise MergeException('Patch does not apply cleanly') + def apply_treediff(self, tree1, tree2): + """Apply the diff from C{tree1} to C{tree2} to the index.""" + # Passing --full-index here is necessary to support binary + # files. It is also sufficient, since the repository already + # contains all involved objects. + self.apply(self.__repository.diff_tree(tree1, tree2, ['--full-index'])) def delete(self): if os.path.isfile(self.__filename): os.remove(self.__filename) ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree 2008-07-02 6:12 ` [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree Karl Hasselström @ 2008-07-12 10:22 ` Catalin Marinas 0 siblings, 0 replies; 9+ messages in thread From: Catalin Marinas @ 2008-07-12 10:22 UTC (permalink / raw) To: Karl Hasselström; +Cc: git 2008/7/2 Karl Hasselström <kha@treskal.com>: > The advantage is that patch application will resolve some file content > conflicts for us, so that we'll fall back to merge-recursive less > often. This is a significant speedup, especially since merge-recursive > needs to touch the worktree, which means we have to check out the > index first. Thanks. -- Catalin ^ permalink raw reply [flat|nested] 9+ messages in thread
* [StGit PATCH 2/2] Reuse the same temp index in a transaction 2008-07-02 6:12 [StGit PATCH 0/2] push optimizations Karl Hasselström 2008-07-02 6:12 ` [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree Karl Hasselström @ 2008-07-02 6:13 ` Karl Hasselström 2008-07-03 21:38 ` [StGit PATCH v2] " Karl Hasselström 2008-07-12 10:24 ` [StGit PATCH 2/2] " Catalin Marinas 2008-07-07 21:12 ` [StGit PATCH 0/2] push optimizations Catalin Marinas 2 siblings, 2 replies; 9+ messages in thread From: Karl Hasselström @ 2008-07-02 6:13 UTC (permalink / raw) To: Catalin Marinas; +Cc: git Instead of making a new temp index every time we need one, just keep reusing the same one. And keep track of which tree is currently stored in it -- if we do several consecutive successful pushes, it's always going to be the "right" tree so that we don't have to call read-tree before each patch application. The motivation behind this change is of course that it makes things faster. (The same simple test as in the previous patch -- pushing 250 patches in a 32k-file repository, with one file-level merge necessary per push -- went from 0.36 to 0.19 seconds per patch with this patch applied.) Signed-off-by: Karl Hasselström <kha@treskal.com> --- stgit/lib/git.py | 43 +++++++++++++++++++++++++++++-------------- stgit/lib/transaction.py | 12 +++++++++++- 2 files changed, 40 insertions(+), 15 deletions(-) diff --git a/stgit/lib/git.py b/stgit/lib/git.py index a38eaa5..c98e919 100644 --- a/stgit/lib/git.py +++ b/stgit/lib/git.py @@ -459,31 +459,46 @@ class Repository(RunWithEnv): def set_head_ref(self, ref, msg): self.run(['git', 'symbolic-ref', '-m', msg, 'HEAD', ref]).no_output() def simple_merge(self, base, ours, theirs): + index = self.temp_index() + try: + result, index_tree = self.index_merge(base, ours, theirs, + index, None) + finally: + index.delete() + return result + def index_merge(self, base, ours, theirs, index, current): """Given three L{Tree}s, tries to do an in-index merge with a - temporary index. Returns the result L{Tree}, or None if the - merge failed (due to conflicts).""" + temporary index. Returns a pair: the result L{Tree}, or None + if the merge failed (due to conflicts); and the L{Tree} now + stored in the index. + + C{index} is the L{Index} object to use for the merge. + C{current} is the L{Tree} object currently stored in the given + index. If this is the same as C{ours}, some work is saved. + (C{current} may be C{None}, in which case this optimization is + disabled.)""" assert isinstance(base, Tree) assert isinstance(ours, Tree) assert isinstance(theirs, Tree) + assert isinstance(index, Index) + assert current == None or isinstance(current, Tree) # Take care of the really trivial cases. if base == ours: - return theirs + return (theirs, current) if base == theirs: - return ours + return (ours, current) if ours == theirs: - return ours + return (ours, current) - index = self.temp_index() - index.read_tree(ours) + if current != ours: + index.read_tree(ours) try: - try: - index.apply_treediff(base, theirs) - return index.write_tree() - except MergeException: - return None - finally: - index.delete() + index.apply_treediff(base, theirs) + result = index.write_tree() + return result, result + except MergeException: + return None, ours def apply(self, tree, patch_text): """Given a L{Tree} and a patch, will either return the new L{Tree} that results when the patch is applied, or None if the patch diff --git a/stgit/lib/transaction.py b/stgit/lib/transaction.py index e47997e..b4d4b75 100644 --- a/stgit/lib/transaction.py +++ b/stgit/lib/transaction.py @@ -1,6 +1,8 @@ """The L{StackTransaction} class makes it possible to make complex updates to an StGit stack in a safe and convenient way.""" +import atexit + from stgit import exception, utils from stgit.utils import any, all from stgit.out import * @@ -84,6 +86,7 @@ class StackTransaction(object): self.__allow_conflicts = lambda trans: allow_conflicts else: self.__allow_conflicts = allow_conflicts + self.__temp_index = self.temp_index_tree = None stack = property(lambda self: self.__stack) patches = property(lambda self: self.__patches) def __set_applied(self, val): @@ -97,6 +100,12 @@ class StackTransaction(object): or self.patches[self.applied[0]].data.parent == val) self.__base = val base = property(lambda self: self.__base, __set_base) + @property + def temp_index(self): + if not self.__temp_index: + self.__temp_index = self.__stack.repository.temp_index() + atexit.register(self.__temp_index.delete) + return self.__temp_index def __checkout(self, tree, iw): if not self.__stack.head_top_equal(): out.error( @@ -238,7 +247,8 @@ class StackTransaction(object): base = oldparent.data.tree ours = cd.parent.data.tree theirs = cd.tree - tree = self.__stack.repository.simple_merge(base, ours, theirs) + tree, self.temp_index_tree = self.__stack.repository.index_merge( + base, ours, theirs, self.temp_index, self.temp_index_tree) merge_conflict = False if not tree: if iw == None: ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [StGit PATCH v2] Reuse the same temp index in a transaction 2008-07-02 6:13 ` [StGit PATCH 2/2] Reuse the same temp index in a transaction Karl Hasselström @ 2008-07-03 21:38 ` Karl Hasselström 2008-07-12 10:24 ` [StGit PATCH 2/2] " Catalin Marinas 1 sibling, 0 replies; 9+ messages in thread From: Karl Hasselström @ 2008-07-03 21:38 UTC (permalink / raw) To: Catalin Marinas; +Cc: git Instead of making a new temp index every time we need one, just keep reusing the same one. And keep track of which tree is currently stored in it -- if we do several consecutive successful pushes, it's always going to be the "right" tree so that we don't have to call read-tree before each patch application. The motivation behind this change is of course that it makes things faster. (The same simple test as in the previous patch -- pushing 250 patches in a 32k-file repository, with one file-level merge necessary per push -- went from 0.36 to 0.19 seconds per patch with this patch applied.) Signed-off-by: Karl Hasselström <kha@treskal.com> --- Here's the same patch, but with Repository.index_merge() instead called Index.merge(). A much better fit, I think. Also some doc changes compared to v1. stgit/lib/git.py | 63 +++++++++++++++++++++++++++++++--------------- stgit/lib/transaction.py | 12 ++++++++- 2 files changed, 53 insertions(+), 22 deletions(-) diff --git a/stgit/lib/git.py b/stgit/lib/git.py index 9402606..8fb8bd2 100644 --- a/stgit/lib/git.py +++ b/stgit/lib/git.py @@ -459,31 +459,12 @@ class Repository(RunWithEnv): def set_head_ref(self, ref, msg): self.run(['git', 'symbolic-ref', '-m', msg, 'HEAD', ref]).no_output() def simple_merge(self, base, ours, theirs): - """Given three L{Tree}s, tries to do an in-index merge with a - temporary index. Returns the result L{Tree}, or None if the - merge failed (due to conflicts).""" - assert isinstance(base, Tree) - assert isinstance(ours, Tree) - assert isinstance(theirs, Tree) - - # Take care of the really trivial cases. - if base == ours: - return theirs - if base == theirs: - return ours - if ours == theirs: - return ours - index = self.temp_index() - index.read_tree(ours) try: - try: - index.apply_treediff(base, theirs) - return index.write_tree() - except MergeException: - return None + result, index_tree = index.merge(base, ours, theirs) finally: index.delete() + return result def apply(self, tree, patch_text): """Given a L{Tree} and a patch, will either return the new L{Tree} that results when the patch is applied, or None if the patch @@ -563,6 +544,46 @@ class Index(RunWithEnv): # contains all involved objects; in other words, we don't have # to use --binary. self.apply(self.__repository.diff_tree(tree1, tree2, ['--full-index'])) + def merge(self, base, ours, theirs, current = None): + """Use the index (and only the index) to do a 3-way merge of the + L{Tree}s C{base}, C{ours} and C{theirs}. The merge will either + succeed (in which case the first half of the return value is + the resulting tree) or fail cleanly (in which case the first + half of the return value is C{None}). + + If C{current} is given (and not C{None}), it is assumed to be + the L{Tree} currently stored in the index; this information is + used to avoid having to read the right tree (either of C{ours} + and C{theirs}) into the index if it's already there. The + second half of the return value is the tree now stored in the + index, or C{None} if unknown. If the merge succeeded, this is + often the merge result.""" + assert isinstance(base, Tree) + assert isinstance(ours, Tree) + assert isinstance(theirs, Tree) + assert current == None or isinstance(current, Tree) + + # Take care of the really trivial cases. + if base == ours: + return (theirs, current) + if base == theirs: + return (ours, current) + if ours == theirs: + return (ours, current) + + if current == theirs: + # Swap the trees. It doesn't matter since merging is + # symmetric, and will allow us to avoid the read_tree() + # call below. + ours, theirs = theirs, ours + if current != ours: + self.read_tree(ours) + try: + self.apply_treediff(base, theirs) + result = self.write_tree() + return (result, result) + except MergeException: + return (None, ours) def delete(self): if os.path.isfile(self.__filename): os.remove(self.__filename) diff --git a/stgit/lib/transaction.py b/stgit/lib/transaction.py index e47997e..74bc74d 100644 --- a/stgit/lib/transaction.py +++ b/stgit/lib/transaction.py @@ -1,6 +1,8 @@ """The L{StackTransaction} class makes it possible to make complex updates to an StGit stack in a safe and convenient way.""" +import atexit + from stgit import exception, utils from stgit.utils import any, all from stgit.out import * @@ -84,6 +86,7 @@ class StackTransaction(object): self.__allow_conflicts = lambda trans: allow_conflicts else: self.__allow_conflicts = allow_conflicts + self.__temp_index = self.temp_index_tree = None stack = property(lambda self: self.__stack) patches = property(lambda self: self.__patches) def __set_applied(self, val): @@ -97,6 +100,12 @@ class StackTransaction(object): or self.patches[self.applied[0]].data.parent == val) self.__base = val base = property(lambda self: self.__base, __set_base) + @property + def temp_index(self): + if not self.__temp_index: + self.__temp_index = self.__stack.repository.temp_index() + atexit.register(self.__temp_index.delete) + return self.__temp_index def __checkout(self, tree, iw): if not self.__stack.head_top_equal(): out.error( @@ -238,7 +247,8 @@ class StackTransaction(object): base = oldparent.data.tree ours = cd.parent.data.tree theirs = cd.tree - tree = self.__stack.repository.simple_merge(base, ours, theirs) + tree, self.temp_index_tree = self.temp_index.merge( + base, ours, theirs, self.temp_index_tree) merge_conflict = False if not tree: if iw == None: ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [StGit PATCH 2/2] Reuse the same temp index in a transaction 2008-07-02 6:13 ` [StGit PATCH 2/2] Reuse the same temp index in a transaction Karl Hasselström 2008-07-03 21:38 ` [StGit PATCH v2] " Karl Hasselström @ 2008-07-12 10:24 ` Catalin Marinas 2008-07-14 6:35 ` Karl Hasselström 1 sibling, 1 reply; 9+ messages in thread From: Catalin Marinas @ 2008-07-12 10:24 UTC (permalink / raw) To: Karl Hasselström; +Cc: git 2008/7/2 Karl Hasselström <kha@treskal.com>: > Instead of making a new temp index every time we need one, just keep > reusing the same one. And keep track of which tree is currently stored > in it -- if we do several consecutive successful pushes, it's always > going to be the "right" tree so that we don't have to call read-tree > before each patch application. > > The motivation behind this change is of course that it makes things > faster. > > (The same simple test as in the previous patch -- pushing 250 patches > in a 32k-file repository, with one file-level merge necessary per push > -- went from 0.36 to 0.19 seconds per patch with this patch applied.) That's an impressive improvement (together with the previous patch). Is this with the new patch log infrastructure? Thanks. -- Catalin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [StGit PATCH 2/2] Reuse the same temp index in a transaction 2008-07-12 10:24 ` [StGit PATCH 2/2] " Catalin Marinas @ 2008-07-14 6:35 ` Karl Hasselström 0 siblings, 0 replies; 9+ messages in thread From: Karl Hasselström @ 2008-07-14 6:35 UTC (permalink / raw) To: Catalin Marinas; +Cc: git On 2008-07-12 11:24:24 +0100, Catalin Marinas wrote: > 2008/7/2 Karl Hasselström <kha@treskal.com>: > > > Instead of making a new temp index every time we need one, just > > keep reusing the same one. And keep track of which tree is > > currently stored in it -- if we do several consecutive successful > > pushes, it's always going to be the "right" tree so that we don't > > have to call read-tree before each patch application. > > > > The motivation behind this change is of course that it makes > > things faster. > > > > (The same simple test as in the previous patch -- pushing 250 > > patches in a 32k-file repository, with one file-level merge > > necessary per push -- went from 0.36 to 0.19 seconds per patch > > with this patch applied.) > > That's an impressive improvement (together with the previous patch). > Is this with the new patch log infrastructure? Yes, with all the bells and whistles -- in particular, with the overhead of the simplified log. (But note that it's a synthetic benchmark, and not a real linux-kernel patch series.) -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [StGit PATCH 0/2] push optimizations 2008-07-02 6:12 [StGit PATCH 0/2] push optimizations Karl Hasselström 2008-07-02 6:12 ` [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree Karl Hasselström 2008-07-02 6:13 ` [StGit PATCH 2/2] Reuse the same temp index in a transaction Karl Hasselström @ 2008-07-07 21:12 ` Catalin Marinas 2008-07-08 4:36 ` Karl Hasselström 2 siblings, 1 reply; 9+ messages in thread From: Catalin Marinas @ 2008-07-07 21:12 UTC (permalink / raw) To: Karl Hasselström; +Cc: git 2008/7/2 Karl Hasselström <kha@treskal.com>: > Here's the git-apply call you asked for. You were right: it was a huge > speed-up. I know, I've been through this couple of years ago :-) > I set up a benchmark to test it: > > * 32 directories, each containing 32 subdirectories, each containing > 32 small (and different) files. Can you try with a Linux kernel like the -mm tree? You get normally sized patches which might show a difference with the patch log. You can clone the for-akpm branch on git://linux-arm.org/linux-2.6 and just uncommit ~300 patches. > * I set all this up with a python script feeding fast-import. A huge > time-saver! What is fast-import? > > * Pop patches, git-reset to upstream, then goto top patch. This > makes sure that we use the new infrastructure to push, and that we > get one file-level conflict in each patch. > > Before the first patch, the "goto" command took 4:27 minutes, > wall-clock time. After the first patch, it took 1:31. After the > second, 0:48; one second or so slower than the stable branch (which > does not have a patch stack log). One second is just noise and depends on how warm the caches are. You could run a few times consecutively and discard the first result but we don't need to be that accurate. -- Catalin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [StGit PATCH 0/2] push optimizations 2008-07-07 21:12 ` [StGit PATCH 0/2] push optimizations Catalin Marinas @ 2008-07-08 4:36 ` Karl Hasselström 0 siblings, 0 replies; 9+ messages in thread From: Karl Hasselström @ 2008-07-08 4:36 UTC (permalink / raw) To: Catalin Marinas; +Cc: git On 2008-07-07 22:12:26 +0100, Catalin Marinas wrote: > 2008/7/2 Karl Hasselström <kha@treskal.com>: > > > Here's the git-apply call you asked for. You were right: it was a > > huge speed-up. > > I know, I've been through this couple of years ago :-) :-P > > I set up a benchmark to test it: > > > > * 32 directories, each containing 32 subdirectories, each > > containing 32 small (and different) files. > > Can you try with a Linux kernel like the -mm tree? You get normally > sized patches which might show a difference with the patch log. You > can clone the for-akpm branch on git://linux-arm.org/linux-2.6 and > just uncommit ~300 patches. Sure, I'll do that. (But one of the reasons I chose a fully synthetic benchmark is that I wanted to start a performance suite similar to our test suite, and we want such a thing to be repeatable but not too large. (That said, operating on points of the kernel history that are fixed should do the trick as well. I'll try to find such points -- a long string of -mm patches somewhere in the history maybe?)) > > * I set all this up with a python script feeding fast-import. A > > huge time-saver! > > What is fast-import? man git-fast-import I'll try to clean up and publish the script I used. > > * Pop patches, git-reset to upstream, then goto top patch. This > > makes sure that we use the new infrastructure to push, and that > > we get one file-level conflict in each patch. > > > > Before the first patch, the "goto" command took 4:27 minutes, > > wall-clock time. After the first patch, it took 1:31. After the > > second, 0:48; one second or so slower than the stable branch > > (which does not have a patch stack log). > > One second is just noise and depends on how warm the caches are. You > could run a few times consecutively and discard the first result but > we don't need to be that accurate. I did run a few times, and it did indeed fluctuate some -- but I'm pretty sure there was a measurable slowdown. Though I agree that it's close enough that it doesn't really make a difference. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-14 6:37 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-02 6:12 [StGit PATCH 0/2] push optimizations Karl Hasselström 2008-07-02 6:12 ` [StGit PATCH 1/2] Do simple in-index merge with diff+apply instead of read-tree Karl Hasselström 2008-07-12 10:22 ` Catalin Marinas 2008-07-02 6:13 ` [StGit PATCH 2/2] Reuse the same temp index in a transaction Karl Hasselström 2008-07-03 21:38 ` [StGit PATCH v2] " Karl Hasselström 2008-07-12 10:24 ` [StGit PATCH 2/2] " Catalin Marinas 2008-07-14 6:35 ` Karl Hasselström 2008-07-07 21:12 ` [StGit PATCH 0/2] push optimizations Catalin Marinas 2008-07-08 4:36 ` Karl Hasselström
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).