git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
@ 2023-10-18 18:58 Matthew McClain
  2023-10-18 19:30 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew McClain @ 2023-10-18 18:58 UTC (permalink / raw)
  To: git; +Cc: Matthew McClain

If a symlink in your Perforce repository matches
git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
LFS but will blow up when it passes a string to generateTempFile.

Git LFS normal behavior does not stash symlinks in LFS.

Importing revision 42889 (100%)Traceback (most recent call last):
  File "./git/git-p4.py", line 4621, in <module>
    main()
  File "./git/git-p4.py", line 4615, in main
    if not cmd.run(args):
  File "./git/git-p4.py", line 4225, in run
    self.importRevisions(args, branch_arg_given)
  File "./git/git-p4.py", line 4002, in importRevisions
    self.importChanges(changes)
  File "./git/git-p4.py", line 3876, in importChanges
    self.initialParent)
  File "./git/git-p4.py", line 3496, in commit
    self.streamP4Files(files)
  File "./git/git-p4.py", line 3336, in streamP4Files
    cb=streamP4FilesCbSelf)
  File "./git/git-p4.py", line 910, in p4CmdList
    cb(entry)
  File "./git/git-p4.py", line 3321, in streamP4FilesCbSelf
    self.streamP4FilesCb(entry)
  File "./git/git-p4.py", line 3266, in streamP4FilesCb
    self.streamOneP4File(self.stream_file, self.stream_contents)
  File "./git/git-p4.py", line 3217, in streamOneP4File
    git_mode, contents = self.largeFileSystem.processContent(git_mode, relPath, contents)
  File "./git/git-p4.py", line 1656, in processContent
    return LargeFileSystem.processContent(self, git_mode, relPath, contents)
  File "./git/git-p4.py", line 1526, in processContent
    contentTempFile = self.generateTempFile(contents)
  File "./git/git-p4.py", line 1488, in generateTempFile
    contentFile.write(d)
  File "/usr/lib64/python3.6/tempfile.py", line 485, in func_wrapper
    return func(*args, **kwargs)
TypeError: a bytes-like object is required, not 'str'

Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
---
 git-p4.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index d26a980e5a..f5fda2a3dc 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
            file is stored in the large file system and handles all necessary
            steps.
            """
+        # no need to store symlinks in LFS (generateTempFile wants bytes)
+        if git_mode == "120000":
+            return (git_mode, contents)
+
         if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
             contentTempFile = self.generateTempFile(contents)
             pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
  2023-10-18 18:58 [PATCH] git-p4 shouldn't attempt to store symlinks in LFS Matthew McClain
@ 2023-10-18 19:30 ` Junio C Hamano
  2023-10-18 22:26   ` brian m. carlson
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2023-10-18 19:30 UTC (permalink / raw)
  To: Matthew McClain; +Cc: git, brian m. carlson

Matthew McClain <mmcclain@noprivs.com> writes:

> If a symlink in your Perforce repository matches
> git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
> LFS but will blow up when it passes a string to generateTempFile.
>
> Git LFS normal behavior does not stash symlinks in LFS.

I am not a "git p4" (or "git lfs") user, but if nobody uses LFS to
store symbolic links (which is very much understandable given what
it was invented for), then that alone is a good enough reason to
avoid the regular blob codepath, and that is true even if
generateTempFile accepted a str and silently converted it to bytes
to help callers, no?

So "but will blow up ..." and the stacktrace below are more or less
irrelevant details and do not help convince readers why this change
is desirable.  I'd suggest removing it (and perhaps place more stress
on explaining why storing symbolic links in LFS is a bad practice
nobody uses, but if it is so obvious perhaps the single sentence
explanation you have above would be sufficient).

I know I can ask brian to take a look at this change from LFS angle,
but who's authoritatively reviewing "git p4" related changes these
days (Matthew, this is not a question for you, but to the list)?

Thanks.

> Importing revision 42889 (100%)Traceback (most recent call last):
>   File "./git/git-p4.py", line 4621, in <module>
>     main()
>   File "./git/git-p4.py", line 4615, in main
>     if not cmd.run(args):
>   File "./git/git-p4.py", line 4225, in run
>     self.importRevisions(args, branch_arg_given)
>   File "./git/git-p4.py", line 4002, in importRevisions
>     self.importChanges(changes)
>   File "./git/git-p4.py", line 3876, in importChanges
>     self.initialParent)
>   File "./git/git-p4.py", line 3496, in commit
>     self.streamP4Files(files)
>   File "./git/git-p4.py", line 3336, in streamP4Files
>     cb=streamP4FilesCbSelf)
>   File "./git/git-p4.py", line 910, in p4CmdList
>     cb(entry)
>   File "./git/git-p4.py", line 3321, in streamP4FilesCbSelf
>     self.streamP4FilesCb(entry)
>   File "./git/git-p4.py", line 3266, in streamP4FilesCb
>     self.streamOneP4File(self.stream_file, self.stream_contents)
>   File "./git/git-p4.py", line 3217, in streamOneP4File
>     git_mode, contents = self.largeFileSystem.processContent(git_mode, relPath, contents)
>   File "./git/git-p4.py", line 1656, in processContent
>     return LargeFileSystem.processContent(self, git_mode, relPath, contents)
>   File "./git/git-p4.py", line 1526, in processContent
>     contentTempFile = self.generateTempFile(contents)
>   File "./git/git-p4.py", line 1488, in generateTempFile
>     contentFile.write(d)
>   File "/usr/lib64/python3.6/tempfile.py", line 485, in func_wrapper
>     return func(*args, **kwargs)
> TypeError: a bytes-like object is required, not 'str'
>
> Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
> ---
>  git-p4.py | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index d26a980e5a..f5fda2a3dc 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
>             file is stored in the large file system and handles all necessary
>             steps.
>             """
> +        # no need to store symlinks in LFS (generateTempFile wants bytes)
> +        if git_mode == "120000":
> +            return (git_mode, contents)
> +
>          if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
>              contentTempFile = self.generateTempFile(contents)
>              pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
  2023-10-18 19:30 ` Junio C Hamano
@ 2023-10-18 22:26   ` brian m. carlson
  2023-10-18 22:46     ` Junio C Hamano
  2023-10-19  0:25     ` [PATCH v2] " Matthew McClain
  0 siblings, 2 replies; 6+ messages in thread
From: brian m. carlson @ 2023-10-18 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthew McClain, git

[-- Attachment #1: Type: text/plain, Size: 2591 bytes --]

On 2023-10-18 at 19:30:02, Junio C Hamano wrote:
> Matthew McClain <mmcclain@noprivs.com> writes:
> 
> > If a symlink in your Perforce repository matches
> > git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
> > LFS but will blow up when it passes a string to generateTempFile.
> >
> > Git LFS normal behavior does not stash symlinks in LFS.
> 
> I am not a "git p4" (or "git lfs") user, but if nobody uses LFS to
> store symbolic links (which is very much understandable given what
> it was invented for), then that alone is a good enough reason to
> avoid the regular blob codepath, and that is true even if
> generateTempFile accepted a str and silently converted it to bytes
> to help callers, no?

Git LFS doesn't store symlinks because smudge/clean filters don't handle
symlinks.  They never get passed to the filter process nor the
smudge/clean filters, nor could that occur without a change to the
protocol or command-line interface.  Unless Git learned how to send them
to the filters, Git LFS would have a hard time using them in any useful
way.

Also, for Git LFS, whose goal is to move large files out of the
repository history, symlinks are typically not an interesting thing to
process because they are functionally limited to 4 KiB or a similar size
on most systems.

> So "but will blow up ..." and the stacktrace below are more or less
> irrelevant details and do not help convince readers why this change
> is desirable.  I'd suggest removing it (and perhaps place more stress
> on explaining why storing symbolic links in LFS is a bad practice
> nobody uses, but if it is so obvious perhaps the single sentence
> explanation you have above would be sufficient).

Hopefully my explanation above can be part of that commit message.

> I know I can ask brian to take a look at this change from LFS angle,
> but who's authoritatively reviewing "git p4" related changes these
> days (Matthew, this is not a question for you, but to the list)?

It looks fine to me from an LFS angle, but I know nothing about Perforce
or git-p4.

Also, as a minor request, it would be great if you could continue to
email me at my personal address, since that's the address with which I
read the list.  My work address appearing on series is more of a
reflection that I'm more frequently finding time to work on things on
work time (hence the work address) versus personal time (where I'd be
using my personal email for patches).  I've fixed it up above.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
  2023-10-18 22:26   ` brian m. carlson
@ 2023-10-18 22:46     ` Junio C Hamano
  2023-10-19  0:25     ` [PATCH v2] " Matthew McClain
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2023-10-18 22:46 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Matthew McClain, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Also, as a minor request, it would be great if you could continue to
> email me at my personal address, since that's the address with which I
> read the list.  My work address appearing on series is more of a
> reflection that I'm more frequently finding time to work on things on
> work time (hence the work address) versus personal time (where I'd be
> using my personal email for patches).  I've fixed it up above.

Thanks.

With

    [alias]
	who = !sh -c 'git log -1 --format=\"%an <%ae>\" --author=\"$1\"' -

I say "git who sandals" to get your e-mail address, but I guess I
should use %aE instead to let .mailmap kick in.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] git-p4 shouldn't attempt to store symlinks in LFS
  2023-10-18 22:26   ` brian m. carlson
  2023-10-18 22:46     ` Junio C Hamano
@ 2023-10-19  0:25     ` Matthew McClain
  2023-10-19 17:59       ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Matthew McClain @ 2023-10-19  0:25 UTC (permalink / raw)
  To: git; +Cc: sandals, gitster, Matthew McClain

git-p4.py would attempt to put a symlink in LFS if its file extension
matched git-p4.largeFileExtensions.

Git LFS doesn't store symlinks because smudge/clean filters don't handle
symlinks. They never get passed to the filter process nor the
smudge/clean filters, nor could that occur without a change to the
protocol or command-line interface. Unless Git learned how to send them
to the filters, Git LFS would have a hard time using them in any useful
way.

Git LFS's goal is to move large files out of the repository history, and
symlinks are functionally limited to 4 KiB or a similar size on most
systems.

Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
---
 git-p4.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index d26a980e5a..0eb3bb4c47 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
            file is stored in the large file system and handles all necessary
            steps.
            """
+        # symlinks aren't processed by smudge/clean filters
+        if git_mode == "120000":
+            return (git_mode, contents)
+
         if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
             contentTempFile = self.generateTempFile(contents)
             pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] git-p4 shouldn't attempt to store symlinks in LFS
  2023-10-19  0:25     ` [PATCH v2] " Matthew McClain
@ 2023-10-19 17:59       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2023-10-19 17:59 UTC (permalink / raw)
  To: Matthew McClain; +Cc: git, sandals

Matthew McClain <mmcclain@noprivs.com> writes:

> git-p4.py would attempt to put a symlink in LFS if its file extension
> matched git-p4.largeFileExtensions.
>
> Git LFS doesn't store symlinks because smudge/clean filters don't handle
> symlinks. They never get passed to the filter process nor the
> smudge/clean filters, nor could that occur without a change to the
> protocol or command-line interface. Unless Git learned how to send them
> to the filters, Git LFS would have a hard time using them in any useful
> way.
>
> Git LFS's goal is to move large files out of the repository history, and
> symlinks are functionally limited to 4 KiB or a similar size on most
> systems.

Reads much better and easier to reason about.

Will queue and see if "git p4" stakeholders object for a few days;
if nothing happens, let's merge it down to 'next' and then to
'master' as usual.

Thanks.

> Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
> ---
>  git-p4.py | 4 ++++
>  1 file changed, 4 insertions(+)

>
> diff --git a/git-p4.py b/git-p4.py
> index d26a980e5a..0eb3bb4c47 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
>             file is stored in the large file system and handles all necessary
>             steps.
>             """
> +        # symlinks aren't processed by smudge/clean filters
> +        if git_mode == "120000":
> +            return (git_mode, contents)
> +
>          if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
>              contentTempFile = self.generateTempFile(contents)
>              pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-19 17:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-18 18:58 [PATCH] git-p4 shouldn't attempt to store symlinks in LFS Matthew McClain
2023-10-18 19:30 ` Junio C Hamano
2023-10-18 22:26   ` brian m. carlson
2023-10-18 22:46     ` Junio C Hamano
2023-10-19  0:25     ` [PATCH v2] " Matthew McClain
2023-10-19 17:59       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).