git.vger.kernel.org archive mirror
* [PATCH v4 0/5] git-p4: add support for large file systems
@ 2015-09-09 11:59 larsxschneider
  2015-09-09 11:59 ` [PATCH v4 1/5] git-p4: add optional type specifier to gitConfig reader larsxschneider
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Diff to v3:
* add large file system mock class for testing (Thanks Luke!)
* add early exit for compressed threshold: If a file is smaller than the threshold then we don't need to compress it.

Thanks,
Lars

Lars Schneider (5):
  git-p4: add optional type specifier to gitConfig reader
  git-p4: add gitConfigInt reader
  git-p4: return an empty list if a list config has no values
  git-p4: add support for large file systems
  git-p4: add Git LFS backend for large file system

 Documentation/git-p4.txt   |  28 +++++
 git-p4.py                  | 223 +++++++++++++++++++++++++++++++++++++---
 t/t9823-git-p4-mock-lfs.sh | 106 +++++++++++++++++++
 t/t9824-git-p4-git-lfs.sh  | 249 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 593 insertions(+), 13 deletions(-)
 create mode 100755 t/t9823-git-p4-mock-lfs.sh
 create mode 100755 t/t9824-git-p4-git-lfs.sh

--
2.5.1

^ permalink raw reply	[flat|nested] 9+ messages in thread
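
The early exit mentioned in the "Diff to v3" list can be sketched as follows. This is a minimal illustration, not the code from the series: the function name and parameters are hypothetical, and it measures the compressed size with an in-memory raw deflate stream from zlib, whereas patch 4/5 writes temporary files and uses zipfile.

```python
import zlib


def exceeds_compressed_threshold(chunks, threshold):
    """Return True if the deflate-compressed size of the content exceeds
    `threshold` bytes. `chunks` is a list of byte strings (hypothetical
    stand-in for the `contents` list used in git-p4)."""
    raw_size = sum(len(c) for c in chunks)
    # Early exit: content at or below the threshold is never treated as
    # large, so the compression pass can be skipped entirely.
    if raw_size <= threshold:
        return False
    # Raw deflate stream (negative wbits), i.e. no zlib header or checksum.
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
    compressed_size = 0
    for c in chunks:
        compressed_size += len(compressor.compress(c))
    compressed_size += len(compressor.flush())
    return compressed_size > threshold
```

Highly repetitive content larger than the threshold still goes through the compression pass, but is not reported as large if it compresses well.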

* [PATCH v4 1/5] git-p4: add optional type specifier to gitConfig reader
  2015-09-09 11:59 [PATCH v4 0/5] git-p4: add support for large file systems larsxschneider
@ 2015-09-09 11:59 ` larsxschneider
  2015-09-09 11:59 ` [PATCH v4 2/5] git-p4: add gitConfigInt reader larsxschneider
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

The functions “gitConfig” and “gitConfigBool” are almost identical. Make “gitConfig” more generic by adding an optional type specifier. Use the type specifier “--bool” with “gitConfig” to implement “gitConfigBool”. This prepares the implementation of other type specifiers such as “--int”.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 git-p4.py | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 073f87b..c139cab 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -604,9 +604,12 @@ def gitBranchExists(branch):
 
 _gitConfig = {}
 
-def gitConfig(key):
+def gitConfig(key, typeSpecifier=None):
     if not _gitConfig.has_key(key):
-        cmd = [ "git", "config", key ]
+        cmd = [ "git", "config" ]
+        if typeSpecifier:
+            cmd += [ typeSpecifier ]
+        cmd += [ key ]
         s = read_pipe(cmd, ignore_error=True)
         _gitConfig[key] = s.strip()
     return _gitConfig[key]
@@ -617,10 +620,7 @@ def gitConfigBool(key):
        in the config."""
 
     if not _gitConfig.has_key(key):
-        cmd = [ "git", "config", "--bool", key ]
-        s = read_pipe(cmd, ignore_error=True)
-        v = s.strip()
-        _gitConfig[key] = v == "true"
+        _gitConfig[key] = gitConfig(key, '--bool') == "true"
     return _gitConfig[key]
 
 def gitConfigList(key):
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
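
The command construction introduced in patch 1/5 can be sketched in isolation. The helper name below is hypothetical and only builds the argument list; running it and caching the result (as git-p4 does in _gitConfig) is left out.

```python
def build_config_command(key, type_specifier=None):
    """Build the argument list for `git config`, optionally inserting a
    type specifier such as '--bool' or '--int' before the key."""
    cmd = ["git", "config"]
    if type_specifier:
        cmd.append(type_specifier)
    cmd.append(key)
    return cmd


# Usage: the boolean reader is the generic reader plus '--bool'.
print(build_config_command("git-p4.useClientSpec", "--bool"))
```

Keeping the specifier optional means existing callers that pass only a key produce the same command as before.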

* [PATCH v4 2/5] git-p4: add gitConfigInt reader
  2015-09-09 11:59 [PATCH v4 0/5] git-p4: add support for large file systems larsxschneider
  2015-09-09 11:59 ` [PATCH v4 1/5] git-p4: add optional type specifier to gitConfig reader larsxschneider
@ 2015-09-09 11:59 ` larsxschneider
  2015-09-09 11:59 ` [PATCH v4 3/5] git-p4: return an empty list if a list config has no values larsxschneider
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Add a git config reader for integer variables. Please note that the git config implementation automatically supports k, m, and g suffixes.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 git-p4.py | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index c139cab..40ad4ae 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -623,6 +623,17 @@ def gitConfigBool(key):
         _gitConfig[key] = gitConfig(key, '--bool') == "true"
     return _gitConfig[key]
 
+def gitConfigInt(key):
+    if not _gitConfig.has_key(key):
+        cmd = [ "git", "config", "--int", key ]
+        s = read_pipe(cmd, ignore_error=True)
+        v = s.strip()
+        try:
+            _gitConfig[key] = int(gitConfig(key, '--int'))
+        except ValueError:
+            _gitConfig[key] = None
+    return _gitConfig[key]
+
 def gitConfigList(key):
     if not _gitConfig.has_key(key):
         s = read_pipe(["git", "config", "--get-all", key], ignore_error=True)
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
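
The k, m, and g suffixes mentioned in patch 2/5 are expanded by git itself, so gitConfigInt only has to convert the printed number. For illustration, the expansion can be emulated in plain Python as below; the function name is hypothetical, and the sketch assumes the suffixes scale by powers of 1024 and are case-insensitive, as documented for git config.

```python
# Scale factors for the unit suffixes accepted by `git config --int`.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_git_int(text):
    """Return the integer value of `text`, honouring an optional k/m/g
    suffix, or None if the text is empty or not a number."""
    value = text.strip().lower()
    if not value:
        return None
    factor = 1
    if value[-1] in _UNITS:
        factor = _UNITS[value[-1]]
        value = value[:-1]
    try:
        return int(value) * factor
    except ValueError:
        return None
```

Returning None for unset or malformed values mirrors the ValueError handling in gitConfigInt, which lets callers treat "not configured" as falsy.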

* [PATCH v4 3/5] git-p4: return an empty list if a list config has no values
  2015-09-09 11:59 [PATCH v4 0/5] git-p4: add support for large file systems larsxschneider
  2015-09-09 11:59 ` [PATCH v4 1/5] git-p4: add optional type specifier to gitConfig reader larsxschneider
  2015-09-09 11:59 ` [PATCH v4 2/5] git-p4: add gitConfigInt reader larsxschneider
@ 2015-09-09 11:59 ` larsxschneider
  2015-09-09 11:59 ` [PATCH v4 4/5] git-p4: add support for large file systems larsxschneider
  2015-09-09 11:59 ` [PATCH v4 5/5] git-p4: add Git LFS backend for large file system larsxschneider
  4 siblings, 0 replies; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 git-p4.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index 40ad4ae..90d3b90 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -638,6 +638,8 @@ def gitConfigList(key):
     if not _gitConfig.has_key(key):
         s = read_pipe(["git", "config", "--get-all", key], ignore_error=True)
         _gitConfig[key] = s.strip().split(os.linesep)
+        if _gitConfig[key] == ['']:
+            _gitConfig[key] = []
     return _gitConfig[key]
 
 def p4BranchesInGit(branchesAreInRemotes=True):
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
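
Patch 3/5 carries no description, so a short sketch of the case it handles may help: splitting empty output yields a list with one empty string, not an empty list. The helper name is hypothetical; it takes the command output as a parameter instead of calling git.

```python
import os


def parse_config_list(output):
    """Split the output of `git config --get-all` into a list of values.
    Empty output gives [] instead of ['']."""
    values = output.strip().split(os.linesep)
    if values == ['']:
        return []
    return values


# Without the special case, iterating over the result of an unset key
# would visit a single empty string.
print("".strip().split(os.linesep))
print(parse_config_list(""))
```

With the special case in place, callers such as the extension check in patch 4/5 can loop over the list without guarding against an empty entry.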

* [PATCH v4 4/5] git-p4: add support for large file systems
  2015-09-09 11:59 [PATCH v4 0/5] git-p4: add support for large file systems larsxschneider
                   ` (2 preceding siblings ...)
  2015-09-09 11:59 ` [PATCH v4 3/5] git-p4: return an empty list if a list config has no values larsxschneider
@ 2015-09-09 11:59 ` larsxschneider
  2015-09-09 17:20   ` Junio C Hamano
  2015-09-09 11:59 ` [PATCH v4 5/5] git-p4: add Git LFS backend for large file system larsxschneider
  4 siblings, 1 reply; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Perforce repositories can contain large (binary) files. Migrating these
repositories to Git generates very large local clones. External storage
systems such as Git LFS [1], Git Fat [2], or Git Media [3] try to
address this problem.

Add a generic mechanism to detect large files based on extension,
uncompressed size, and/or compressed size.

[1] https://git-lfs.github.com/
[2] https://github.com/jedbrown/git-fat
[3] https://github.com/alebedev/git-media

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/git-p4.txt   |  26 ++++++++
 git-p4.py                  | 150 ++++++++++++++++++++++++++++++++++++++++++---
 t/t9823-git-p4-mock-lfs.sh | 106 ++++++++++++++++++++++++++++++++
 3 files changed, 273 insertions(+), 9 deletions(-)
 create mode 100755 t/t9823-git-p4-mock-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..e0d0239 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -510,6 +510,32 @@ git-p4.useClientSpec::
 	option '--use-client-spec'.  See the "CLIENT SPEC" section above.
 	This variable is a boolean, not the name of a p4 client.
 
+git-p4.largeFileSystem::
+	Specify the system that is used for large (binary) files.
+	Please note that most of these large file systems depend on the
+	Git clean/smudge filters. These filters are not applied through
+	git-p4. You need to create a fresh clone of the repository after
+	running git-p4.
+
+git-p4.largeFileExtensions::
+	All files matching a file extension in the list will be processed
+	by the large file system. Do not prefix the extensions with '.'.
+
+git-p4.largeFileThreshold::
+	All files with an uncompressed size exceeding the threshold will be
+	processed by the large file system. By default the threshold is
+	defined in bytes. Add the suffix k, m, or g to change the unit.
+
+git-p4.largeFileCompressedThreshold::
+	All files with a compressed size exceeding the threshold will be
+	processed by the large file system. This option might significantly
+	slow down your clone/sync process. By default the threshold is
+	defined in bytes. Add the suffix k, m, or g to change the unit.
+
+git-p4.pushLargeFiles::
+	Boolean variable which defines if large files are automatically
+	pushed to a server.
+
 Submit variables
 ~~~~~~~~~~~~~~~~
 git-p4.detectRenames::
diff --git a/git-p4.py b/git-p4.py
index 90d3b90..ada4174 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -22,6 +22,8 @@ import platform
 import re
 import shutil
 import stat
+import zipfile
+import zlib
 
 try:
     from subprocess import CalledProcessError
@@ -922,6 +924,50 @@ def wildcard_present(path):
     m = re.search("[*#@%]", path)
     return m is not None
 
+def largeFileSystem():
+    try:
+        largeFileSystem = getattr(sys.modules[__name__], gitConfig('git-p4.largeFileSystem'))
+        assert(hasattr(largeFileSystem, "attributeDescription"))
+        assert(hasattr(largeFileSystem, "attributeFilter"))
+        assert(hasattr(largeFileSystem, "generatePointer"))
+        assert(hasattr(largeFileSystem, "pushFile"))
+        return largeFileSystem
+    except AttributeError as e:
+        die('Large file system not supported: %s' % gitConfig('git-p4.largeFileSystem'))
+
+class MockLFS:
+    """Mock large file system for testing."""
+
+    @staticmethod
+    def attributeDescription():
+        return 'Mock LFS'
+
+    @staticmethod
+    def attributeFilter():
+        return 'mock'
+
+    @staticmethod
+    def generatePointer(cloneDestination, contentFile):
+        """The pointer content is the original content prefixed with "pointer-".
+           The local filename of the large file storage is derived from the file content.
+           """
+        with open(contentFile, 'r') as f:
+            content = next(f)
+            gitMode = '100644'
+            pointerContents = 'pointer-' + content
+            localLargeFile = os.path.join(cloneDestination, '.git', 'mock-storage', 'local', content[:-1])
+            return (gitMode, pointerContents, localLargeFile)
+
+    @staticmethod
+    def pushFile(localLargeFile):
+        """The remote filename of the large file storage is the same as the local
+           one but in a different directory.
+           """
+        remotePath = os.path.join(os.path.dirname(localLargeFile), '..', 'remote')
+        if not os.path.exists(remotePath):
+            os.makedirs(remotePath)
+        shutil.copyfile(localLargeFile, os.path.join(remotePath, os.path.basename(localLargeFile)))
+
 class Command:
     def __init__(self):
         self.usage = "usage: %prog [options]"
@@ -2038,6 +2084,7 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.largeFiles = set()
 
         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2158,6 +2205,88 @@ class P4Sync(Command, P4UserMap):
 
         return branches
 
+    def writeToGitStream(self, gitMode, relPath, contents):
+        self.gitStream.write('M %s inline %s\n' % (gitMode, relPath))
+        self.gitStream.write('data %d\n' % sum(len(d) for d in contents))
+        for d in contents:
+            self.gitStream.write(d)
+        self.gitStream.write('\n')
+
+    def writeGitAttributesToStream(self):
+        self.writeToGitStream(
+            '100644',
+            '.gitattributes',
+            [
+                '\n',
+                '#\n',
+                '# %s\n' % largeFileSystem().attributeDescription(),
+                '#\n',
+            ] +
+            ['*.' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
+                % largeFileSystem().attributeFilter()
+                for f in sorted(gitConfigList("git-p4.largeFileExtensions"))
+            ] +
+            ['/' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
+                % largeFileSystem().attributeFilter()
+                for f in sorted(self.largeFiles) if not self.hasLargeFileExtension(f)
+            ]
+        )
+
+    def hasLargeFileExtension(self, relPath):
+        return reduce(
+            lambda a, b: a or b,
+            [relPath.endswith('.' + e) for e in gitConfigList('git-p4.largeFileExtensions')],
+            False
+        )
+
+    def exceedsLargeFileThreshold(self, relPath, contents):
+        if gitConfigInt('git-p4.largeFileThreshold'):
+            contentsSize = sum(len(d) for d in contents)
+            if contentsSize > gitConfigInt('git-p4.largeFileThreshold'):
+                return True
+        if gitConfigInt('git-p4.largeFileCompressedThreshold'):
+            contentsSize = sum(len(d) for d in contents)
+            if contentsSize <= gitConfigInt('git-p4.largeFileCompressedThreshold'):
+                return False
+            contentFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
+            for d in contents:
+                contentFile.write(d)
+            contentFile.flush()
+            compressedContentFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
+            zf = zipfile.ZipFile(compressedContentFile.name, mode='w')
+            zf.write(contentFile.name, compress_type=zipfile.ZIP_DEFLATED)
+            zf.close()
+            compressedContentsSize = zf.infolist()[0].compress_size
+            os.remove(contentFile.name)
+            os.remove(compressedContentFile.name)
+            if compressedContentsSize > gitConfigInt('git-p4.largeFileCompressedThreshold'):
+                return True
+        return False
+
+    def moveToLargeFileSystem(self, relPath, contents):
+        # Write P4 content to temp file
+        contentFile = tempfile.NamedTemporaryFile(prefix='git-p4-large-file', delete=False)
+        for d in contents:
+            contentFile.write(d)
+        contentFile.flush()
+        contentFile.close()
+        (git_mode, contents, localLargeFile) = \
+            largeFileSystem().generatePointer(self.cloneDestination, contentFile.name)
+        # Move temp file to final location in large file system
+        largeFileDir = os.path.dirname(localLargeFile)
+        if not os.path.isdir(largeFileDir):
+            os.makedirs(largeFileDir)
+        shutil.move(contentFile.name, localLargeFile)
+        if verbose:
+            sys.stderr.write("%s moved to large file system (%s)\n" % (relPath, localLargeFile))
+
+        if gitConfigBool('git-p4.pushLargeFiles'):
+            largeFileSystem().pushFile(localLargeFile)
+
+        self.largeFiles.add(relPath)
+        self.writeGitAttributesToStream()
+        return (git_mode, contents)
+
     # output one file from the P4 stream
     # - helper for streamP4Files
 
@@ -2226,17 +2355,16 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
-        self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
+        if relPath == '.gitattributes':
+            die('.gitattributes already exists in P4.')
 
-        # total length...
-        length = 0
-        for d in contents:
-            length = length + len(d)
+        if (gitConfig('git-p4.largeFileSystem') and
+            (   self.exceedsLargeFileThreshold(relPath, contents) or
+                self.hasLargeFileExtension(relPath)
+            )):
+            (git_mode, contents) = self.moveToLargeFileSystem(relPath, contents)
 
-        self.gitStream.write("data %d\n" % length)
-        for d in contents:
-            self.gitStream.write(d)
-        self.gitStream.write("\n")
+        self.writeToGitStream(git_mode, relPath, contents)
 
     def streamOneP4Deletion(self, file):
         relPath = self.stripRepoPath(file['path'], self.branchPrefixes)
@@ -2244,6 +2372,10 @@ class P4Sync(Command, P4UserMap):
             sys.stderr.write("delete %s\n" % relPath)
         self.gitStream.write("D %s\n" % relPath)
 
+        if relPath in self.largeFiles:
+            self.largeFiles.remove(relPath)
+            self.writeGitAttributesToStream()
+
     # handle another chunk of streaming data
     def streamP4FilesCb(self, marshalled):
 
diff --git a/t/t9823-git-p4-mock-lfs.sh b/t/t9823-git-p4-mock-lfs.sh
new file mode 100755
index 0000000..6e4fd54
--- /dev/null
+++ b/t/t9823-git-p4-mock-lfs.sh
@@ -0,0 +1,106 @@
+#!/bin/sh
+
+test_description='Clone repositories and store files in Mock LFS'
+
+. ./lib-git-p4.sh
+
+test_file_in_mock () {
+	FILE="$1"
+	CONTENT="$2"
+	LOCAL_STORAGE=".git/mock-storage/local/$CONTENT"
+	SERVER_STORAGE=".git/mock-storage/remote/$CONTENT"
+	echo "pointer-$CONTENT" >expect_pointer
+	echo "$CONTENT" >expect_content
+	test_path_is_file "$FILE" &&
+	test_path_is_file "$LOCAL_STORAGE" &&
+	test_path_is_file "$SERVER_STORAGE" &&
+	test_cmp expect_pointer "$FILE" &&
+	test_cmp expect_content "$LOCAL_STORAGE" &&
+	test_cmp expect_content "$SERVER_STORAGE"
+}
+
+test_file_count_in_dir () {
+	DIR="$1"
+	EXPECTED_COUNT="$2"
+	find "$DIR" -type f >actual
+	test_line_count = $EXPECTED_COUNT actual
+}
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'Create repo with binary files' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		echo "content 1 txt 23 bytes" >file1.txt &&
+		p4 add file1.txt &&
+		echo "content 2-3 bin 25 bytes" >file2.dat &&
+		p4 add file2.dat &&
+		p4 submit -d "Add text and binary file" &&
+
+		mkdir "path with spaces" &&
+		echo "content 2-3 bin 25 bytes" >"path with spaces/file3.bin" &&
+		p4 add "path with spaces/file3.bin" &&
+		p4 submit -d "Add another binary file with same content and spaces in path" &&
+
+		echo "content 4 bin 26 bytes XX" >file4.bin &&
+		p4 add file4.bin &&
+		p4 submit -d "Add another binary file with different content"
+	)
+'
+
+test_expect_success 'Store files in Mock based on size (>24 bytes)' '
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem MockLFS &&
+		git config git-p4.largeFileThreshold 24 &&
+		git config git-p4.pushLargeFiles True &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_mock file2.dat "content 2-3 bin 25 bytes" &&
+		test_file_in_mock "path with spaces/file3.bin" "content 2-3 bin 25 bytes" &&
+		test_file_in_mock file4.bin "content 4 bin 26 bytes XX" &&
+
+		test_file_count_in_dir ".git/mock-storage/local" 2 &&
+		test_file_count_in_dir ".git/mock-storage/remote" 2 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Mock LFS
+		#
+		/file2.dat filter=mock -text
+		/file4.bin filter=mock -text
+		/path[:space:]with[:space:]spaces/file3.bin filter=mock -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Clone repo with existing .gitattributes file' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		echo "*.txt text" >.gitattributes &&
+		p4 add .gitattributes &&
+		p4 submit -d "Add .gitattributes"
+	) &&
+
+	test_must_fail git p4 clone --use-client-spec --destination="$git" //depot 2>errs &&
+	grep ".gitattributes already exists in P4." errs
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
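
The .gitattributes content that patch 4/5 writes to the fast-import stream can be sketched as a standalone function. The function and parameter names are hypothetical, and any() replaces the reduce() call used in the patch; the line format follows writeGitAttributesToStream.

```python
def gitattributes_lines(description, filter_name, extensions, large_files):
    """Return the lines of a generated .gitattributes file.

    `extensions` are configured extensions without a leading '.',
    `large_files` are repository-relative paths detected by size."""

    def escape(path):
        # Attribute patterns are space-separated, so spaces are replaced.
        return path.replace(' ', '[:space:]')

    def has_large_extension(path):
        return any(path.endswith('.' + e) for e in extensions)

    lines = ['\n', '#\n', '# %s\n' % description, '#\n']
    # One wildcard rule per configured extension.
    lines += ['*.%s filter=%s -text\n' % (escape(e), filter_name)
              for e in sorted(extensions)]
    # One path rule per large file not already covered by an extension.
    lines += ['/%s filter=%s -text\n' % (escape(p), filter_name)
              for p in sorted(large_files) if not has_large_extension(p)]
    return lines
```

Sorting both groups keeps the generated file stable across runs, which is what allows the tests to compare it against a fixed expected file.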

* [PATCH v4 5/5] git-p4: add Git LFS backend for large file system
  2015-09-09 11:59 [PATCH v4 0/5] git-p4: add support for large file systems larsxschneider
                   ` (3 preceding siblings ...)
  2015-09-09 11:59 ` [PATCH v4 4/5] git-p4: add support for large file systems larsxschneider
@ 2015-09-09 11:59 ` larsxschneider
  4 siblings, 0 replies; 9+ messages in thread
From: larsxschneider @ 2015-09-09 11:59 UTC (permalink / raw)
  To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Add example implementation including test cases for the large file
system using Git LFS.

Pushing files to the Git LFS server is not tested.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/git-p4.txt  |   4 +-
 git-p4.py                 |  52 ++++++++++
 t/t9824-git-p4-git-lfs.sh | 249 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100755 t/t9824-git-p4-git-lfs.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index e0d0239..3168a64 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -515,7 +515,9 @@ git-p4.largeFileSystem::
 	Please note that most of these large file systems depend on the
 	Git clean/smudge filters. These filters are not applied through
 	git-p4. You need to create a fresh clone of the repository after
-	running git-p4.
+	running git-p4. Only Git LFS [1] is supported right now. Download
+	and install the Git LFS command line extension to use this option.
+	[1] https://git-lfs.github.com/
 
 git-p4.largeFileExtensions::
 	All files matching a file extension in the list will be processed
diff --git a/git-p4.py b/git-p4.py
index ada4174..58bdaf1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -968,6 +968,58 @@ class MockLFS:
             os.makedirs(remotePath)
         shutil.copyfile(localLargeFile, os.path.join(remotePath, os.path.basename(localLargeFile)))
 
+class GitLFS:
+    """Git LFS as backend for the git-p4 large file system.
+       See https://git-lfs.github.com/ for details."""
+
+    @staticmethod
+    def attributeDescription():
+        """Return a description which is used to mark LFS entries in the
+           .gitattributes file."""
+        return 'Git LFS (see https://git-lfs.github.com/)'
+
+    @staticmethod
+    def attributeFilter():
+        """Return the name of the filter which is used for LFS entries in the
+           .gitattributes file."""
+        return 'lfs'
+
+    @staticmethod
+    def generatePointer(cloneDestination, contentFile):
+        """Generate a Git LFS pointer for the content. Return LFS Pointer file
+           mode and content which is stored in the Git repository instead of
+           the actual content. Return also the new location of the actual
+           content.
+           """
+        pointerProcess = subprocess.Popen(
+            ['git', 'lfs', 'pointer', '--file=' + contentFile],
+            stdout=subprocess.PIPE
+        )
+        pointerFile = pointerProcess.stdout.read()
+        if pointerProcess.wait():
+            os.remove(contentFile)
+            die('git-lfs pointer command failed. Did you install the extension?')
+        pointerContents = [i+'\n' for i in pointerFile.split('\n')[2:][:-1]]
+        oid = pointerContents[1].split(' ')[1].split(':')[1][:-1]
+        localLargeFile = os.path.join(
+            cloneDestination,
+            '.git', 'lfs', 'objects', oid[:2], oid[2:4],
+            oid,
+        )
+        # LFS Spec states that pointer files should not have the executable bit set.
+        gitMode = '100644'
+        return (gitMode, pointerContents, localLargeFile)
+
+    @staticmethod
+    def pushFile(localLargeFile):
+        """Push the actual content which is not stored in the Git repository to
+        a server."""
+        uploadProcess = subprocess.Popen(
+            ['git', 'lfs', 'push', '--object-id', 'origin', os.path.basename(localLargeFile)]
+        )
+        if uploadProcess.wait():
+            die('git-lfs push command failed. Did you define a remote?')
+
 class Command:
     def __init__(self):
         self.usage = "usage: %prog [options]"
diff --git a/t/t9824-git-p4-git-lfs.sh b/t/t9824-git-p4-git-lfs.sh
new file mode 100755
index 0000000..f46768f
--- /dev/null
+++ b/t/t9824-git-p4-git-lfs.sh
@@ -0,0 +1,249 @@
+#!/bin/sh
+
+test_description='Clone repositories and store files in Git LFS'
+
+. ./lib-git-p4.sh
+
+git lfs help >/dev/null 2>&1 || {
+	skip_all='skipping git p4 Git LFS tests; Git LFS not found'
+	test_done
+}
+
+test_file_in_lfs () {
+	FILE="$1"
+	SIZE="$2"
+	EXPECTED_CONTENT="$3"
+	cat "$FILE" | grep "size $SIZE"
+	HASH=$(cat "$FILE" | grep "oid sha256:" | sed -e 's/oid sha256://g')
+	LFS_FILE=".git/lfs/objects/${HASH:0:2}/${HASH:2:2}/$HASH"
+	echo $EXPECTED_CONTENT >expect
+	test_path_is_file "$FILE" &&
+	test_path_is_file "$LFS_FILE" &&
+	test_cmp expect "$LFS_FILE"
+}
+
+test_file_count_in_dir () {
+	DIR="$1"
+	EXPECTED_COUNT="$2"
+	find "$DIR" -type f >actual
+	test_line_count = $EXPECTED_COUNT actual
+}
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'Create repo with binary files' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+
+		echo "content 1 txt 23 bytes" >file1.txt &&
+		p4 add file1.txt &&
+		echo "content 2-3 bin 25 bytes" >file2.dat &&
+		p4 add file2.dat &&
+		p4 submit -d "Add text and binary file" &&
+
+		mkdir "path with spaces" &&
+		echo "content 2-3 bin 25 bytes" >"path with spaces/file3.bin" &&
+		p4 add "path with spaces/file3.bin" &&
+		p4 submit -d "Add another binary file with same content and spaces in path" &&
+
+		echo "content 4 bin 26 bytes XX" >file4.bin &&
+		p4 add file4.bin &&
+		p4 submit -d "Add another binary file with different content"
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (>24 bytes)' '
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileThreshold 24 &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_lfs file2.dat 25 "content 2-3 bin 25 bytes" &&
+		test_file_in_lfs "path with spaces/file3.bin" 25 "content 2-3 bin 25 bytes" &&
+		test_file_in_lfs file4.bin 26 "content 4 bin 26 bytes XX" &&
+
+		test_file_count_in_dir ".git/lfs/objects" 2 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		/file2.dat filter=lfs -text
+		/file4.bin filter=lfs -text
+		/path[:space:]with[:space:]spaces/file3.bin filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (>25 bytes)' '
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileThreshold 25 &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_lfs file4.bin 26 "content 4 bin 26 bytes XX" &&
+		test_file_count_in_dir ".git/lfs/objects" 1 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		/file4.bin filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Store files in LFS based on extension (dat)' '
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileExtensions dat &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_lfs file2.dat 25 "content 2-3 bin 25 bytes" &&
+		test_file_count_in_dir ".git/lfs/objects" 1 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		*.dat filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Store files in LFS based on size (>25 bytes) and extension (dat)' '
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileExtensions dat &&
+		git config git-p4.largeFileThreshold 25 &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_lfs file2.dat 25 "content 2-3 bin 25 bytes" &&
+		test_file_in_lfs file4.bin 26 "content 4 bin 26 bytes XX" &&
+		test_file_count_in_dir ".git/lfs/objects" 2 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		*.dat filter=lfs -text
+		/file4.bin filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Remove file from repo and store files in LFS based on size (>24 bytes)' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+		p4 delete file4.bin &&
+		p4 submit -d "Remove file"
+	) &&
+
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileThreshold 24 &&
+		git p4 clone --destination="$git" //depot@all &&
+
+		test_file_in_lfs file2.dat 25 "content 2-3 bin 25 bytes" &&
+		test_file_in_lfs "path with spaces/file3.bin" 25 "content 2-3 bin 25 bytes" &&
+		! test_path_is_file file4.bin &&
+		test_file_count_in_dir ".git/lfs/objects" 2 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		/file2.dat filter=lfs -text
+		/path[:space:]with[:space:]spaces/file3.bin filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'Add big files to repo and store files in LFS based on compressed size (>28 bytes)' '
+	client_view "//depot/... //client/..." &&
+	(
+		cd "$cli" &&
+		echo "content 5 bin 40 bytes XXXXXXXXXXXXXXXX" >file5.bin &&
+		p4 add file5.bin &&
+		p4 submit -d "Add file with small footprint after compression" &&
+
+		echo "content 6 bin 39 bytes XXXXXYYYYYZZZZZ" >file6.bin &&
+		p4 add file6.bin &&
+		p4 submit -d "Add file with large footprint after compression"
+	) &&
+
+	client_view "//depot/... //client/..." &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		git config git-p4.useClientSpec true &&
+		git config git-p4.largeFileSystem GitLFS &&
+		git config git-p4.largeFileCompressedThreshold 28 &&
+		# We only import HEAD here ("@all" is missing!)
+		git p4 clone --destination="$git" //depot &&
+
+		test_file_in_lfs file6.bin 13 "content 6 bin 39 bytes XXXXXYYYYYZZZZZ" &&
+		test_file_count_in_dir ".git/lfs/objects" 1 &&
+
+		cat >expect <<-\EOF &&
+
+		#
+		# Git LFS (see https://git-lfs.github.com/)
+		#
+		/file6.bin filter=lfs -text
+		EOF
+		test_path_is_file .gitattributes &&
+		test_cmp expect .gitattributes
+	)
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
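
The mapping from a Git LFS pointer to the local object path used by GitLFS.generatePointer in patch 5/5 can be sketched as below. The function name is hypothetical, and instead of indexing into fixed line positions as the patch does, this sketch searches for the "oid sha256:" line of the pointer text.

```python
import os


def lfs_object_path(clone_destination, pointer_text):
    """Return the path under .git/lfs/objects where the content for the
    given pointer is stored: objects/<oid[0:2]>/<oid[2:4]>/<oid>."""
    for line in pointer_text.splitlines():
        if line.startswith('oid sha256:'):
            oid = line[len('oid sha256:'):].strip()
            return os.path.join(
                clone_destination,
                '.git', 'lfs', 'objects', oid[:2], oid[2:4], oid)
    raise ValueError('no oid line found in pointer')
```

The two-level directory fan-out is the same layout the test helper test_file_in_lfs reconstructs from the hash.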

* Re: [PATCH v4 4/5] git-p4: add support for large file systems
  2015-09-09 11:59 ` [PATCH v4 4/5] git-p4: add support for large file systems larsxschneider
@ 2015-09-09 17:20   ` Junio C Hamano
  2015-09-09 20:09     ` Lars Schneider
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2015-09-09 17:20 UTC (permalink / raw)
  To: larsxschneider; +Cc: git, luke

larsxschneider@gmail.com writes:

> @@ -2226,17 +2355,16 @@ class P4Sync(Command, P4UserMap):
>              text = regexp.sub(r'$\1$', text)
>              contents = [ text ]
>  
> -        self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
> +        if relPath == '.gitattributes':
> +            die('.gitattributes already exists in P4.')

This looks like an unfortunate limitation to me.

Is it really necessary that you need to reject unrelated attributes
the user has (presumably for a good reason)?  It seems to me that
you only need to _add_ entries to make file extension based decision
to send paths selectively to LFS?

Also the exact format of these attributes entries looks very
closely tied to GitHub LFS and not generic (for example, there is no
reason to expect that any large-file support would always use the
"filter" mechanism or the gitattributes mechanism for that
matter), ...

+    def writeGitAttributesToStream(self):
+        self.writeToGitStream(
+            '100644',
+            '.gitattributes',
+            [
+                '\n',
+                '#\n',
+                '# %s\n' % largeFileSystem().attributeDescription(),
+                '#\n',
+            ] +
+            ['*.' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
+                % largeFileSystem().attributeFilter()
+                for f in sorted(gitConfigList("git-p4.largeFileExtensions"))
+            ] +
+            ['/' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
+                % largeFileSystem().attributeFilter()
+                for f in sorted(self.largeFiles) if not self.hasLargeFileExtension(f)
+            ]
+        )
+

... so while I can see the code like the above needs to exist
somewhere in "git p4" to support GitHub LFS, I am not sure if it
belongs to the generic part of the code.  For the same reason, I do
not know if these replacements with largeFileSystem().getters() are
really adding much value.

How is collaboration between those who talk to the same p4 depot
backed by GitHub LFS expected to work?  You use config to set size
limits and list of file extensions in your repository, and grab new
changes from p4 and turn them into Git commits (with pointers to LFS
and the .gitattributes file that records your choice of the config).
I as a new member to the same project come, clone the resulting Git
repository from you and then what do I do before talking to the same
p4 depot to further update the Git history?  Are the values recorded
in .gitattributes (which essentially were derived from your
configuration) somehow reflected back automatically to my config so
that our world view would become consistent?  Perhaps you added
'iso' to largeFileExtensions before I cloned from you, and that
would be present in the copy of .gitattributes I obtained from you.
I may be trying to add a new ".iso" file, and I presume that an
existing .gitattributes entry you created based on the file
extension automatically covers it from the LFS side, but does my
"git p4 submit" also know what to do, or would it be broken until I
add a set of configurations that matches yours?


* Re: [PATCH v4 4/5] git-p4: add support for large file systems
  2015-09-09 17:20   ` Junio C Hamano
@ 2015-09-09 20:09     ` Lars Schneider
  2015-09-09 20:44       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Schneider @ 2015-09-09 20:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, luke


On 09 Sep 2015, at 19:20, Junio C Hamano <gitster@pobox.com> wrote:

> larsxschneider@gmail.com writes:
> 
>> @@ -2226,17 +2355,16 @@ class P4Sync(Command, P4UserMap):
>>             text = regexp.sub(r'$\1$', text)
>>             contents = [ text ]
>> 
>> -        self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
>> +        if relPath == '.gitattributes':
>> +            die('.gitattributes already exists in P4.')
> 
> This looks like an unfortunate limitation to me.
> 
> Is it really necessary that you need to reject unrelated attributes
> the user has (presumably for a good reason)?  It seems to me that
> you only need to _add_ entries to make file extension based decision
> to send paths selectively to LFS?
No, it is not necessary. I will remove this limitation.
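
One way to keep attributes that already exist in the depot is to append only the missing large-file entries to the existing file content. The Python sketch below illustrates that idea; append_lfs_entries is a hypothetical helper, not a function in git-p4.py, and it assumes the existing .gitattributes content is available as a string.

```python
def append_lfs_entries(existing, lfs_lines):
    """Return existing .gitattributes text with the missing lfs_lines appended.

    existing:  current file content as a string (may be empty)
    lfs_lines: newline-terminated attribute lines to add
    """
    if existing and not existing.endswith("\n"):
        existing += "\n"  # appended entries must start on their own line
    present = set(existing.splitlines())
    # Skip entries that the user's file already contains verbatim.
    new_lines = [line for line in lfs_lines if line.rstrip("\n") not in present]
    return existing + "".join(new_lines)


merged = append_lfs_entries(
    "*.txt text\n/file2.dat filter=lfs -text",
    ["/file2.dat filter=lfs -text\n", "/file6.bin filter=lfs -text\n"],
)
print(merged, end="")
```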

> 
> Also the exact format of these attributes entries looks very
> closely tied to GitHub LFS and not generic (for example, there is no
> reason to expect that any large-file support would always use the
> "filter" mechanism or the gitattributes mechanism for that
> matter), …
Agreed. Instead of just the filter name I will replace everything after the pathname with a single git-p4 config value:

['*.' + f.replace(' ', '[:space:]') + ' %s\n' % largeFileSystem().attributes()
    for f in sorted(gitConfigList("git-p4.largeFileExtensions"))
] +
['/' + f.replace(' ', '[:space:]') + ' %s\n' % largeFileSystem().attributes()
    for f in sorted(self.largeFiles) if not self.hasLargeFileExtension(f)
]

This, of course, would only work for gitattributes based solutions. 
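
As a self-contained illustration of the fragment above, the following Python sketch builds the attribute lines from a list of extensions, a list of large file paths, and one attributes string. The function attribute_lines and its parameters are hypothetical, and the extension check is an assumed stand-in for hasLargeFileExtension, whose implementation is not shown in this message.

```python
def attribute_lines(extensions, large_files, attributes):
    """Build .gitattributes lines for extension patterns and single files."""

    def escape(path):
        # Spaces are replaced by [:space:], as in the quoted patch.
        return path.replace(" ", "[:space:]")

    def has_large_extension(path):
        # Assumed behaviour: match on the file name suffix.
        return any(path.endswith("." + ext) for ext in extensions)

    lines = ["*.%s %s\n" % (escape(ext), attributes) for ext in sorted(extensions)]
    lines += [
        "/%s %s\n" % (escape(path), attributes)
        for path in sorted(large_files)
        if not has_large_extension(path)  # already covered by a pattern
    ]
    return lines


for line in attribute_lines(
    ["iso"],
    ["disk.iso", "path with spaces/file3.bin"],
    "filter=lfs -text",
):
    print(line, end="")
```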

> 
> +    def writeGitAttributesToStream(self):
> +        self.writeToGitStream(
> +            '100644',
> +            '.gitattributes',
> +            [
> +                '\n',
> +                '#\n',
> +                '# %s\n' % largeFileSystem().attributeDescription(),
> +                '#\n',
> +            ] +
> +            ['*.' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
> +                % largeFileSystem().attributeFilter()
> +                for f in sorted(gitConfigList("git-p4.largeFileExtensions"))
> +            ] +
> +            ['/' + f.replace(' ', '[:space:]') + ' filter=%s -text\n'
> +                % largeFileSystem().attributeFilter()
> +                for f in sorted(self.largeFiles) if not self.hasLargeFileExtension(f)
> +            ]
> +        )
> +
> 
> ... so while I can see the code like the above needs to exist
> somewhere in "git p4" to support GitHub LFS, I am not sure if it
> belongs to the generic part of the code.  For the same reason, I do
> not know if these replacements with largeFileSystem().getters() are
> really adding much value.
I have the impression you would prefer to move all the attributes code from the generic code to the GitLFS code? I will explore that solution and see if I can come up with a nice generic interface.
 
> 
> How is collaboration between those who talk to the same p4 depot
> backed by GitHub LFS expected to work?  You use config to set size
> limits and list of file extensions in your repository, and grab new
> changes from p4 and turn them into Git commits (with pointers to LFS
> and the .gitattributes file that records your choice of the config).
> I as a new member to the same project come, clone the resulting Git
> repository from you and then what do I do before talking to the same
> p4 depot to further update the Git history?  Are the values recorded
> in .gitattributes (which essentially were derived from your
> configuration) somehow reflected back automatically to my config so
> that our world view would become consistent?  Perhaps you added
> 'iso' to largeFileExtensions before I cloned from you, and that
> would be present in the copy of .gitattributes I obtained from you.
> I may be trying to add a new ".iso" file, and I presume that an
> existing .gitattributes entry you created based on the file
> extension automatically covers it from the LFS side, but does my
> "git p4 submit" also know what to do, or would it be broken until I
> add a set of configurations that matches yours?
Well, you have a very good point here. From my point of view you can use git-p4 in two ways:

_Way 1_: Your project is stored in Perforce and it will stay in Perforce. You use git-p4 on a regular basis to interact with the Perforce repository.
_Way 2_: Your project is stored in Perforce and you want to migrate it to Git. You use git-p4 once to perform the migration. Afterwards you never touch git-p4 or Perforce again.

My large file system feature is intended to be used only for _Way 2_ for exactly the reasons you mentioned. Would it still be OK to add this feature to git-p4? Maybe with an appropriate warning in the docs?

Thanks,
Lars

 


* Re: [PATCH v4 4/5] git-p4: add support for large file systems
  2015-09-09 20:09     ` Lars Schneider
@ 2015-09-09 20:44       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2015-09-09 20:44 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, luke

Lars Schneider <larsxschneider@gmail.com> writes:

>> Also the exact format of these attributes entries looks very
>> closely tied to GitHub LFS and not generic (for example, there is no
>> reason to expect that any large-file support would always use the
>> "filter" mechanism or the gitattributes mechanism for that
>> matter), …
> Agreed. Instead of just the filter name I will replace everything
> after the pathname with a single git-p4 config value:

I actually was going to suggest not doing any such replacing.

You obviously need to customize the extensions list and specific
paths that hold large contents, but having .attributeDescription()
and .attributeFilter() that appear to be customizable sends a
message to the reader that the code aims to support things other
than GitHub LFS.  I somehow doubt that really is the case (as you
later mention, this would not work at all for solutions that are not
based on gitattributes in the first place).  It would be less
misleading if it is made painfully obvious that this part of the
code is about one specific backend.
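
A rough Python sketch of that separation is given below: the generic class knows nothing about gitattributes, and only the Git LFS subclass produces a .gitattributes file. The class names, the extra_files method, and the hard-coded "filter=lfs -text" string are assumptions made for illustration, not the structure used in the patch.

```python
class LargeFileSystem(object):
    """Generic part: tracks large files, has no notion of gitattributes."""

    def __init__(self):
        self.large_files = set()

    def add_large_file(self, path):
        self.large_files.add(path)

    def extra_files(self):
        """Return {path: content} for files the backend wants committed."""
        return {}


class GitLFS(LargeFileSystem):
    """Backend-specific part: everything about .gitattributes lives here."""

    def extra_files(self):
        lines = [
            "/%s filter=lfs -text\n" % path.replace(" ", "[:space:]")
            for path in sorted(self.large_files)
        ]
        return {".gitattributes": "".join(lines)}


backend = GitLFS()
backend.add_large_file("file6.bin")
print(backend.extra_files())
```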

> Well, you have a very good point here. From my point of view you can
> use git-p4 in two ways:
>
> _Way 1_: Your project is stored in Perforce and it will stay in
> Perforce. You use git-p4 on a regular basis to interact with the
> Perforce repository.
> _Way 2_: Your project is stored in Perforce and you want to migrate it
> to Git. You use git-p4 once to perform the migration. Afterwards you
> never touch git-p4 or Perforce again.
>
> My large file system feature is intended to be used only for _Way 2_
> for exactly the reasons you mentioned. Would it still be OK to add
> this feature to git-p4? Maybe with an appropriate warning in the docs?

I could not tell, from the documentation that came in the patch,
which one between the above two ways is supported, which made me
assume that it aimed to support #1, as the desire to have a bidi
bridge seems to be fairly common.

It is a valid use case to migrate away and not look back.  But if
that is the only workflow that is supported, that fact needs to be
documented more clearly, I would think.

Otherwise those who expected a bidirectional bridge by reading the
documentation would be disappointed, possibly after wasting a lot of
time trying to figure out why their "git p4 submit" is not sending
the full large blob contents back to the depot.

Thanks.
