From: Sam Hocevar <sam@zoy.org>
To: git@vger.kernel.org
Subject: Re: [PATCH] git-p4: improve performance with large files
Date: Thu, 5 Mar 2009 18:23:33 +0100
Message-ID: <20090305172332.GF25693@zoy.org>
In-Reply-To: <20090305100527.shmtfbdvk0ggsk4s@webmail.fussycoder.id.au>

On Thu, Mar 05, 2009, thestar@fussycoder.id.au wrote:
> > The current git-p4 way of concatenating strings performs in O(n^2)
> > and is therefore terribly slow with large files because of unnecessary
> > memory copies. The following patch makes the operation O(n).
>
> The reason why it uses simple concatenation is to cut down on memory usage.
> It is a tradeoff.
>
> I think the modification you have made below is reasonable; however, be
> aware that memory usage could double, which would substantially reduce the
> size of the changesets that git-p4 would be able to import /at all/,
> rather than merely making the import slow.

Uhm, no. The memory usage could be an additional X, where X is the
size of the biggest file in the commit. Remember that commit() stores
the complete commit data in memory before sending it to fast-import.
Also, on my machine that extra memory is already being used, because at
some point "text += foo" calls realloc() anyway, which often duplicates
the memory used by text.
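To put rough numbers on the O(n^2) versus O(n) argument, here is a small cost model in Python. It assumes the worst case, in which every "text += chunk" copies the whole accumulated string; as noted above, CPython can sometimes grow the string in place with realloc(), so real copying may be lower. The function names are hypothetical and are not part of git-p4.

```python
# Hypothetical cost model, not git-p4 code: count the bytes copied when a
# file arriving as num_chunks chunks of chunk_size bytes is rebuilt.
# Assumption: every "text += chunk" copies the whole accumulated string
# (the worst case; CPython can sometimes resize the string in place).


def bytes_copied_concat(num_chunks, chunk_size):
    """Worst-case bytes copied by repeated "text += chunk"."""
    total = 0
    length = 0
    for _ in range(num_chunks):
        length += chunk_size
        total += length  # a new string of "length" bytes is written
    return total


def bytes_copied_join(num_chunks, chunk_size):
    """Bytes copied by collecting chunks in a list and joining once."""
    return num_chunks * chunk_size


if __name__ == "__main__":
    chunk = 4096  # 4 KiB chunks, as stated in the patch comment
    mib = 1024 * 1024
    for size_mib in (1, 16, 256):
        n = size_mib * mib // chunk
        print("%4d MiB file: += copies %d MiB, join copies %d MiB" % (
            size_mib,
            bytes_copied_concat(n, chunk) // mib,
            bytes_copied_join(n, chunk) // mib))
```

For a 1 MiB file in 4 KiB chunks (256 chunks), the model gives 4096 * 256 * 257 / 2 bytes, about 128.5 MiB copied by repeated concatenation, against 1 MiB for a single join.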
The ideal solution is to use a generator and refactor the commit
handling as a stream. I am working on that, but it involves deeper
changes, and since I am not sure they will be accepted, I am providing
the compromise patch below first. At least it solves the appalling
speed issue. I tuned it so that it never uses more than 32 MiB of extra
memory.
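The generator-based approach could look roughly like the following sketch. It is not git-p4 code: the names iter_p4_files and write_inline_blob are hypothetical. It assumes the record layout the patch relies on, namely a stat record followed by records whose 'code' is 'text', 'unicode' or 'binary' and whose 'data' holds the next chunk. Each file is joined and written to fast-import before the next one is read, so peak memory is roughly one file rather than the whole commit.

```python
import io

# Record codes that carry file content, as matched by the patch.
DATA_CODES = ('text', 'unicode', 'binary')


def iter_p4_files(records):
    """Hypothetical generator: yield (stat, data) one file at a time.

    records is an iterable of dicts: a stat record followed by zero or
    more content records whose 'data' is the next chunk of that file.
    Only the current file's chunks are held in memory.
    """
    stat = None
    chunks = []
    for rec in records:
        if rec.get('code') in DATA_CODES:
            chunks.append(rec['data'])
        else:
            if stat is not None:
                yield stat, b''.join(chunks)
            stat = rec
            chunks = []
    if stat is not None:
        yield stat, b''.join(chunks)


def write_inline_blob(out, path, data, mode='644'):
    """Emit a fast-import filemodify command with inline data.

    Assumption: path needs no quoting (no spaces or special characters).
    """
    out.write(('M %s inline %s\n' % (mode, path)).encode('utf-8'))
    out.write(('data %d\n' % len(data)).encode('ascii'))
    out.write(data)
    out.write(b'\n')


if __name__ == '__main__':
    records = [
        {'code': 'stat', 'depotFile': '//depot/a.txt'},
        {'code': 'text', 'data': b'hello '},
        {'code': 'text', 'data': b'world'},
        {'code': 'stat', 'depotFile': '//depot/empty.bin'},
    ]
    out = io.BytesIO()
    for stat, data in iter_p4_files(records):
        # Hypothetical path mapping: strip the depot prefix.
        write_inline_blob(out, stat['depotFile'].replace('//depot/', '', 1), data)
    print(out.getvalue().decode('utf-8'))
```

The per-file join is kept because the byte-count form of fast-import's "data" command, which this sketch uses, must state the size before the content.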
Signed-off-by: Sam Hocevar <sam@zoy.org>
---
 contrib/fast-import/git-p4 | 10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 3832f60..151ae1c 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -984,11 +984,19 @@ class P4Sync(Command):
             while j < len(filedata):
                 stat = filedata[j]
                 j += 1
+                data = []
                 text = ''
                 while j < len(filedata) and filedata[j]['code'] in ('text', 'unicode', 'binary'):
-                    text += filedata[j]['data']
+                    data.append(filedata[j]['data'])
                     del filedata[j]['data']
+                    # p4 sends 4k chunks, make sure we don't use more than 32 MiB
+                    # of additional memory while rebuilding the file data.
+                    if len(data) > 8192:
+                        text += ''.join(data)
+                        data = []
                     j += 1
+                text += ''.join(data)
+                del data
 
                 if not stat.has_key('depotFile'):
                     sys.stderr.write("p4 print fails with: %s\n" % repr(stat))
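The hunk's batching logic can be exercised outside git-p4. This sketch pulls it into a hypothetical helper, rebuild_file_data, with the flush threshold as a parameter; the patch hard-codes 8192. It uses bytes instead of the patch's Python 2 str so that it also runs under Python 3.

```python
CHUNK_SIZE = 4096  # p4 chunk size, per the comment in the patch


def rebuild_file_data(chunks, max_pending=8192):
    """Rebuild file contents from p4 chunks, mirroring the patch logic.

    Chunks are collected in a list and folded into text with one join
    whenever more than max_pending are pending, so the pending list
    stays bounded while the number of concatenations on text drops by
    a factor of roughly max_pending.
    """
    text = b''
    data = []
    for chunk in chunks:
        data.append(chunk)
        if len(data) > max_pending:
            text += b''.join(data)
            data = []
    text += b''.join(data)  # flush whatever is left over
    return text


if __name__ == '__main__':
    # 8192 pending 4 KiB chunks is where the 32 MiB figure comes from.
    print(8192 * CHUNK_SIZE // (1024 * 1024), 'MiB')
    print(rebuild_file_data([b'ab', b'cd', b'ef'], max_pending=1))
```

Because the test is ">" rather than ">=", the pending list is flushed only once it holds 8193 chunks, so with 4 KiB chunks the pending data peaks just above 8192 * 4 KiB = 32 MiB.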
--
Sam.
Thread overview: 15+ messages
2009-03-04 21:54 [PATCH] git-p4: improve performance with large files Sam Hocevar
2009-03-04 23:05 ` thestar
2009-03-05 17:23 ` Sam Hocevar [this message]
2009-03-06 0:01 ` thestar
2009-03-06 1:14 ` Junio C Hamano
2009-03-06 1:25 ` Han-Wen Nienhuys
2009-03-06 8:53 ` Sam Hocevar
2009-03-06 9:42 ` Junio C Hamano
2009-03-06 10:13 ` [PATCH v4] " Sam Hocevar
2009-03-07 12:25 ` [PATCH v5] git-p4: improve performance when importing huge files by reducing the number of string concatenations while constraining memory usage Sam Hocevar
-- strict thread matches above, loose matches on Subject: below --
2009-03-06 15:53 [PATCH] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar
2009-03-06 16:55 ` Brandon Casey
2009-03-06 17:11 ` msysgit corrupting commit messages? Sam Hocevar
2009-03-07 2:48 ` Johannes Schindelin
2009-03-07 12:26 ` [PATCH v2] git-p4: remove unnecessary semicolons at end of lines Sam Hocevar