git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luke Diamand <luke@diamand.org>
To: Lex Spoon <lex@lexspoon.org>,
	git@vger.kernel.org, Pete Wyckoff <pw@padd.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3] git-p4: Use -m when running p4 changes
Date: Mon, 20 Apr 2015 10:53:07 +0100	[thread overview]
Message-ID: <5534CC83.2000304@diamand.org> (raw)
In-Reply-To: <1429312285-13552-1-git-send-email-lex@lexspoon.org>

On 18/04/15 00:11, Lex Spoon wrote:
> Simply running "p4 changes" on a large branch can
> result in a "too many rows scanned" error from the
> Perforce server. It is better to use a sequence
> of smaller calls to "p4 changes", using the "-m"
> option to limit the size of each call.
>
> Signed-off-by: Lex Spoon <lex@lexspoon.org>
> Reviewed-by: Junio C Hamano <gitster@pobox.com>
> Reviewed-by: Luke Diamand <luke@diamand.org>

I could be wrong about this, but it looks like importNewBranches() is 
taking an extra argument, but that isn't reflected in the place where it 
gets called. I think it just got missed.

As a result, t9801-git-p4-branch.sh fails with this error:

Importing revision 3 (37%)
     Importing new branch depot/branch1
Traceback (most recent call last):
   File "/home/lgd/git/git/git-p4", line 3327, in <module>
     main()
   File "/home/lgd/git/git/git-p4", line 3321, in main
     if not cmd.run(args):
   File "/home/lgd/git/git/git-p4", line 3195, in run
     if not P4Sync.run(self, depotPaths):
   File "/home/lgd/git/git/git-p4", line 3057, in run
     self.importChanges(changes)
   File "/home/lgd/git/git/git-p4", line 2692, in importChanges
     if self.importNewBranch(branch, change - 1):
TypeError: importNewBranch() takes exactly 4 arguments (3 given)
rm: cannot remove `/home/lgd/git/git/t/trash 
directory.t9801-git-p4-branch/git/.git/objects/pack': Directory not empty
not ok 8 - import depot, branch detection, branchList branch definition


Thanks!
Luke


> ---
> Updated as suggested:
> - documentation added
> - avoided touch(1)
> - used test_seq
> - used || exit for test commands inside for loops
> - more tabs
> - fewer line breaks
> - expanded commit message
>
>   Documentation/git-p4.txt | 17 ++++++++++---
>   git-p4.py                | 54 +++++++++++++++++++++++++++++++---------
>   t/t9818-git-p4-block.sh  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 120 insertions(+), 15 deletions(-)
>   create mode 100755 t/t9818-git-p4-block.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index a1664b9..82aa5d6 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -225,9 +225,20 @@ Git repository:
>   	they can find the p4 branches in refs/heads.
>
>   --max-changes <n>::
> -	Limit the number of imported changes to 'n'.  Useful to
> -	limit the amount of history when using the '@all' p4 revision
> -	specifier.
> +	Import at most 'n' changes, rather than the entire range of
> +	changes included in the given revision specifier. A typical
> +	usage would be use '@all' as the revision specifier, but then
> +	to use '--max-changes 1000' to import only the last 1000
> +	revisions rather than the entire revision history.
> +
> +--changes-block-size <n>::
> +	The internal block size to use when converting a revision
> +	specifier such as '@all' into a list of specific change
> +	numbers. Instead of using a single call to 'p4 changes' to
> +	find the full list of changes for the conversion, there are a
> +	sequence of calls to 'p4 changes -m', each of which requests
> +	one block of changes of the given size. The default block size
> +	is 500, which should usually be suitable.
>
>   --keep-path::
>   	The mapping of file names from the p4 depot path to Git, by
> diff --git a/git-p4.py b/git-p4.py
> index 549022e..1fba3aa 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -740,17 +740,43 @@ def createOrUpdateBranchesFromOrigin(localRefPrefix = "refs/remotes/p4/", silent
>   def originP4BranchesExist():
>           return gitBranchExists("origin") or gitBranchExists("origin/p4") or gitBranchExists("origin/p4/master")
>
> -def p4ChangesForPaths(depotPaths, changeRange):
> +def p4ChangesForPaths(depotPaths, changeRange, block_size):
>       assert depotPaths
> -    cmd = ['changes']
> -    for p in depotPaths:
> -        cmd += ["%s...%s" % (p, changeRange)]
> -    output = p4_read_pipe_lines(cmd)
> +    assert block_size
> +
> +    # Parse the change range into start and end
> +    if changeRange is None or changeRange == '':
> +        changeStart = '@1'
> +        changeEnd = '#head'
> +    else:
> +        parts = changeRange.split(',')
> +        assert len(parts) == 2
> +        changeStart = parts[0]
> +        changeEnd = parts[1]
>
> +    # Accumulate change numbers in a dictionary to avoid duplicates
>       changes = {}
> -    for line in output:
> -        changeNum = int(line.split(" ")[1])
> -        changes[changeNum] = True
> +
> +    for p in depotPaths:
> +        # Retrieve changes a block at a time, to prevent running
> +        # into a MaxScanRows error from the server.
> +        start = changeStart
> +        end = changeEnd
> +        get_another_block = True
> +        while get_another_block:
> +            new_changes = []
> +            cmd = ['changes']
> +            cmd += ['-m', str(block_size)]
> +            cmd += ["%s...%s,%s" % (p, start, end)]
> +            for line in p4_read_pipe_lines(cmd):
> +                changeNum = int(line.split(" ")[1])
> +                new_changes.append(changeNum)
> +                changes[changeNum] = True
> +            if len(new_changes) == block_size:
> +                get_another_block = True
> +                end = '@' + str(min(new_changes))
> +            else:
> +                get_another_block = False
>
>       changelist = changes.keys()
>       changelist.sort()
> @@ -1911,7 +1937,10 @@ class P4Sync(Command, P4UserMap):
>                   optparse.make_option("--import-labels", dest="importLabels", action="store_true"),
>                   optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false",
>                                        help="Import into refs/heads/ , not refs/remotes"),
> -                optparse.make_option("--max-changes", dest="maxChanges"),
> +                optparse.make_option("--max-changes", dest="maxChanges",
> +                                     help="Maximum number of changes to import"),
> +                optparse.make_option("--changes-block-size", dest="changes_block_size", type="int",
> +                                     help="Internal block size to use when iteratively calling p4 changes"),
>                   optparse.make_option("--keep-path", dest="keepRepoPath", action='store_true',
>                                        help="Keep entire BRANCH/DIR/SUBDIR prefix during import"),
>                   optparse.make_option("--use-client-spec", dest="useClientSpec", action='store_true',
> @@ -1940,6 +1969,7 @@ class P4Sync(Command, P4UserMap):
>           self.syncWithOrigin = True
>           self.importIntoRemotes = True
>           self.maxChanges = ""
> +        self.changes_block_size = 500
>           self.keepRepoPath = False
>           self.depotPaths = None
>           self.p4BranchesInGit = []
> @@ -2578,7 +2608,7 @@ class P4Sync(Command, P4UserMap):
>
>           return ""
>
> -    def importNewBranch(self, branch, maxChange):
> +    def importNewBranch(self, branch, maxChange, changes_block_size):
>           # make fast-import flush all changes to disk and update the refs using the checkpoint
>           # command so that we can try to find the branch parent in the git history
>           self.gitStream.write("checkpoint\n\n");
> @@ -2586,7 +2616,7 @@ class P4Sync(Command, P4UserMap):
>           branchPrefix = self.depotPaths[0] + branch + "/"
>           range = "@1,%s" % maxChange
>           #print "prefix" + branchPrefix
> -        changes = p4ChangesForPaths([branchPrefix], range)
> +        changes = p4ChangesForPaths([branchPrefix], range, changes_block_size)
>           if len(changes) <= 0:
>               return False
>           firstChange = changes[0]
> @@ -3002,7 +3032,7 @@ class P4Sync(Command, P4UserMap):
>                   if self.verbose:
>                       print "Getting p4 changes for %s...%s" % (', '.join(self.depotPaths),
>                                                                 self.changeRange)
> -                changes = p4ChangesForPaths(self.depotPaths, self.changeRange)
> +                changes = p4ChangesForPaths(self.depotPaths, self.changeRange, self.changes_block_size)
>
>                   if len(self.maxChanges) > 0:
>                       changes = changes[:min(int(self.maxChanges), len(changes))]
> diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh
> new file mode 100755
> index 0000000..153b20a
> --- /dev/null
> +++ b/t/t9818-git-p4-block.sh
> @@ -0,0 +1,64 @@
> +#!/bin/sh
> +
> +test_description='git p4 fetching changes in multiple blocks'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> +	start_p4d
> +'
> +
> +test_expect_success 'Create a repo with ~100 changes' '
> +	(
> +		cd "$cli" &&
> +		>file.txt &&
> +		p4 add file.txt &&
> +		p4 submit -d "Add file.txt" &&
> +		for i in $(test_seq 0 9)
> +		do
> +			>outer$i.txt &&
> +			p4 add outer$i.txt &&
> +			p4 submit -d "Adding outer$i.txt" &&
> +			for j in $(test_seq 0 9)
> +			do
> +				p4 edit file.txt &&
> +				echo $i$j >file.txt &&
> +				p4 submit -d "Commit $i$j" || exit
> +			done || exit
> +		done
> +	)
> +'
> +
> +test_expect_success 'Clone the repo' '
> +	git p4 clone --dest="$git" --changes-block-size=10 --verbose //depot@all
> +'
> +
> +test_expect_success 'All files are present' '
> +	echo file.txt >expected &&
> +	test_write_lines outer0.txt outer1.txt outer2.txt outer3.txt outer4.txt >>expected &&
> +	test_write_lines outer5.txt outer6.txt outer7.txt outer8.txt outer9.txt >>expected &&
> +	ls "$git" >current &&
> +	test_cmp expected current
> +'
> +
> +test_expect_success 'file.txt is correct' '
> +	echo 99 >expected &&
> +	test_cmp expected "$git/file.txt"
> +'
> +
> +test_expect_success 'Correct number of commits' '
> +	(cd "$git" && git log --oneline) >log &&
> +	test_line_count = 111 log
> +'
> +
> +test_expect_success 'Previous version of file.txt is correct' '
> +	(cd "$git" && git checkout HEAD^^) &&
> +	echo 97 >expected &&
> +	test_cmp expected "$git/file.txt"
> +'
> +
> +test_expect_success 'kill p4d' '
> +	kill_p4d
> +'
> +
> +test_done
>

  reply	other threads:[~2015-04-20  9:53 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CALM2SnY62u3OXJOMSqSfghH_NYwZhzSedm3-wcde-dQCX6eB9Q@mail.gmail.com>
2015-04-14 11:25 ` [PATCH] git-p4: Use -m when running p4 changes Luke Diamand
     [not found]   ` <CALM2SnY=ZcSMSXk6Ks0uU65gPX5vC8QKG+iSrQxd3X7N=sw+Ww@mail.gmail.com>
2015-04-14 19:48     ` Lex Spoon
2015-04-15  3:47   ` Lex Spoon
2015-04-16 18:56     ` Junio C Hamano
2015-04-16 23:15     ` Luke Diamand
2015-04-17 13:20       ` Lex Spoon
2015-04-17 23:11         ` [PATCH v3] " Lex Spoon
2015-04-20  9:53           ` Luke Diamand [this message]
2015-04-20 14:30             ` Lex Spoon
2015-04-20 15:00               ` [PATCH v4] " Lex Spoon
2015-04-20 15:15                 ` Luke Diamand
2015-04-20 15:25                   ` Lex Spoon
2015-04-20 19:17                     ` Luke Diamand
2015-04-20 17:54                   ` Junio C Hamano
2015-04-20 18:04                     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5534CC83.2000304@diamand.org \
    --to=luke@diamand.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lex@lexspoon.org \
    --cc=pw@padd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).