git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [2.46 regression] git ls-remote crash with import remote-helper
@ 2024-07-27 19:19 Mike Hommey
  2024-07-28  3:54 ` Junio C Hamano
  2024-08-02  4:44 ` [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo Patrick Steinhardt
  0 siblings, 2 replies; 6+ messages in thread
From: Mike Hommey @ 2024-07-27 19:19 UTC (permalink / raw)
  To: git

Hi,

Running `git ls-remote $helper::$url` crashes when run outside a git
repo and the helper uses the import feature.

Here is a minimal reproducer:
```
$ cat > git-remote-foo <<EOF
#!/bin/sh
echo import
echo refspec '*:*'
EOF
$ chmod +x git-remote-foo
$ PATH=$PWD:$PATH git ls-remote foo::bar
```

The crash happens in parse_refspec in refspec.c, on a deref of the_hash_algo,
because the_hash_also is not set anymore at that point since c8aed5e8da.

Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.46 regression] git ls-remote crash with import remote-helper
  2024-07-27 19:19 [2.46 regression] git ls-remote crash with import remote-helper Mike Hommey
@ 2024-07-28  3:54 ` Junio C Hamano
  2024-07-29 15:41   ` Patrick Steinhardt
  2024-08-02  4:44 ` [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo Patrick Steinhardt
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2024-07-28  3:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Mike Hommey <mh@glandium.org> writes:

> Running `git ls-remote $helper::$url` crashes when run outside a git
> repo and the helper uses the import feature.
>
> Here is a minimal reproducer:
> ```
> $ cat > git-remote-foo <<EOF
> #!/bin/sh
> echo import
> echo refspec '*:*'
> EOF
> $ chmod +x git-remote-foo
> $ PATH=$PWD:$PATH git ls-remote foo::bar
> ```
>
> The crash happens in parse_refspec in refspec.c, on a deref of the_hash_algo,
> because the_hash_also is not set anymore at that point since c8aed5e8da.

Thanks for a report, Mike.

Patrick, we have expected reports like this when we did c8aed5e8
(repository: stop setting SHA1 as the default object hash,
2024-05-07), so it is not very surprising.  In general, I think any
command that is designed to be usable outside a repository should
continue to fall back and use SHA-1, at least for now.  A command
like ls-remote _might_ want to do even better by waiting until it
has a chance to inspect what the other side said before setting the
hash-algo, or even better is to make it work without having any
concrete value in the hash-algo.  After all, when SHA-256
repositories become common out in the world, you should be able to
say ls-remote against them from your SHA-1 repository and the fact
that the hash-algo is read from local repository and set to SHA-1
should *not* negatively affect our ability to receive the ls-remote
response from SHA-256 repositories.  But that are all for longer
term future.  At least assuming SHA-1 like we have always done
should be better than the current situation.

Thanks.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [2.46 regression] git ls-remote crash with import remote-helper
  2024-07-28  3:54 ` Junio C Hamano
@ 2024-07-29 15:41   ` Patrick Steinhardt
  2024-07-29 17:18     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Patrick Steinhardt @ 2024-07-29 15:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 5795 bytes --]

On Sat, Jul 27, 2024 at 08:54:13PM -0700, Junio C Hamano wrote:
> Mike Hommey <mh@glandium.org> writes:
> 
> > Running `git ls-remote $helper::$url` crashes when run outside a git
> > repo and the helper uses the import feature.
> >
> > Here is a minimal reproducer:
> > ```
> > $ cat > git-remote-foo <<EOF
> > #!/bin/sh
> > echo import
> > echo refspec '*:*'
> > EOF
> > $ chmod +x git-remote-foo
> > $ PATH=$PWD:$PATH git ls-remote foo::bar
> > ```
> >
> > The crash happens in parse_refspec in refspec.c, on a deref of the_hash_algo,
> > because the_hash_also is not set anymore at that point since c8aed5e8da.
> 
> Thanks for a report, Mike.

Indeed, thanks for the report, I was able to reproduce the segfault
easily with that reproducer.

> Patrick, we have expected reports like this when we did c8aed5e8
> (repository: stop setting SHA1 as the default object hash,
> 2024-05-07), so it is not very surprising.  In general, I think any
> command that is designed to be usable outside a repository should
> continue to fall back and use SHA-1, at least for now.  A command
> like ls-remote _might_ want to do even better by waiting until it
> has a chance to inspect what the other side said before setting the
> hash-algo, or even better is to make it work without having any
> concrete value in the hash-algo.  After all, when SHA-256
> repositories become common out in the world, you should be able to
> say ls-remote against them from your SHA-1 repository and the fact
> that the hash-algo is read from local repository and set to SHA-1
> should *not* negatively affect our ability to receive the ls-remote
> response from SHA-256 repositories.  But that are all for longer
> term future.  At least assuming SHA-1 like we have always done
> should be better than the current situation.

I definitely agree that as a short-term solution it's probably the best
way to handle this.

The below patch is what I came up with to paper over the issue. This
restores the old behaviour, which is somewhat broken because we end up
mis-parsing refspecs. But I guess "somewhat broken" is arguably better
than "completely broken".

Let me know whether you want me to send this as a proper standalone
patch.

Patrick

--- >8 ---

commit c52112d3946b2fd8d030580cd7acb809fa54012a
Author: Patrick Steinhardt <ps@pks.im>
Date:   Mon Jul 29 17:21:00 2024 +0200

    builtin/ls-remote: fall back to SHA1 outside of a repo
    
    In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
    2024-05-07), we have stopped setting the default hash algorithm for
    `the_repository`. Consequently, code that relies on `the_hash_algo` will
    now crash when it hasn't explicitly been initialized, which may be the
    case when running outside of a Git repository.
    
    It was reported that git-ls-remote(1) may crash in such a way when using
    a remote helper that advertises refspecs. This is because the refspec
    announced by the helper will get parsed during capability negotiation.
    At that point we haven't yet figured out what object format the remote
    uses though, so when run outside of a repository then we will fail.
    
    The course of action is somewhat dubious in the first place. Ideally, we
    should only parse object IDs once we have asked the remote helper for
    the object format. And if the helper didn't announce the "object-format"
    capability, then we should always assume SHA256. But instead, we used to
    take either SHA1 if there was no repository, or we used the hash of the
    local repository, which is wrong.
    
    Arguably though, crashing hard may not be in the best interest of our
    users, either. So while the old behaviour was buggy, let's restore it
    for now as a short-term fix. We should eventually revisit, potentially
    by deferring the point in time when we parse the refspec until after we
    have figured out the remote's object hash.
    
    Reported-by: Mike Hommey <mh@glandium.org>
    Signed-off-by: Patrick Steinhardt <ps@pks.im>

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index debf2d4f88..6da63a67f5 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -91,6 +91,21 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	dest = argv[0];
 
+	/*
+	 * TODO: This is buggy, but required for transport helpers. When a
+	 * transport helper advertises a "refspec", then we'd add that to a
+	 * list of refspecs via `refspec_append()`, which transitively depends
+	 * on `the_hash_algo`. Thus, when the hash algorithm isn't properly set
+	 * up, this would lead to a segfault.
+	 *
+	 * We really should fix this in the transport helper logic such that we
+	 * lazily parse refspec capabilities _after_ we have learned about the
+	 * remote's object format. Otherwise, we may end up misparsing refspecs
+	 * depending on what object hash the remote uses.
+	 */
+	if (!the_repository->hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	packet_trace_identity("ls-remote");
 
 	if (argc > 1) {
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 42e77eb5a9..bc442ec221 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -402,4 +402,17 @@ test_expect_success 'v0 clients can handle multiple symrefs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'helper with refspec capability fails gracefully' '
+	mkdir test-bin &&
+	write_script test-bin/git-remote-foo <<-EOF &&
+	echo import
+	echo refspec ${SQ}*:*${SQ}
+	EOF
+	(
+		PATH="$PWD/test-bin:$PATH" &&
+		export PATH &&
+		test_must_fail nongit git ls-remote foo::bar
+	)
+'
+
 test_done


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [2.46 regression] git ls-remote crash with import remote-helper
  2024-07-29 15:41   ` Patrick Steinhardt
@ 2024-07-29 17:18     ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-07-29 17:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

>> ...  After all, when SHA-256
>> repositories become common out in the world, you should be able to
>> say ls-remote against them from your SHA-1 repository and the fact
>> that the hash-algo is read from local repository and set to SHA-1
>> should *not* negatively affect our ability to receive the ls-remote
>> response from SHA-256 repositories.  But that are all for longer
>> term future.  At least assuming SHA-1 like we have always done
>> should be better than the current situation.
>
> I definitely agree that as a short-term solution it's probably the best
> way to handle this.
>
> The below patch is what I came up with to paper over the issue. This
> restores the old behaviour, which is somewhat broken because we end up
> mis-parsing refspecs. But I guess "somewhat broken" is arguably better
> than "completely broken".

It probably matters much more that this "will restore the old
behaviour" than it is "somewhat but not completely broken".  Going
back to the old behaviour is not making anything worse.

I do not consider this (or any other "a command that optionally can
work outside a repository no longer works") issue a ultra-high
priority.  If your "ls-remote $URL" does not work in a directory you
wanted to run it, you can temporarily create an empty repository
there and run the command to obtain the result you wanted to get (in
other words, there is an easy workaround).

I was scanning the command[] table in git.c for entries marked with
RUN_SETUP_GENTLY but other than ls-remote nothing that is commonly
used stood out.

So let's make this one of the early "oops, here is a fix for .1 when
enough of them have accumulated" patch.

Thanks.

> Patrick
>
> --- >8 ---
>
> commit c52112d3946b2fd8d030580cd7acb809fa54012a
> Author: Patrick Steinhardt <ps@pks.im>
> Date:   Mon Jul 29 17:21:00 2024 +0200
>
>     builtin/ls-remote: fall back to SHA1 outside of a repo
>     
>     In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
>     2024-05-07), we have stopped setting the default hash algorithm for
>     `the_repository`. Consequently, code that relies on `the_hash_algo` will
>     now crash when it hasn't explicitly been initialized, which may be the
>     case when running outside of a Git repository.
>     
>     It was reported that git-ls-remote(1) may crash in such a way when using
>     a remote helper that advertises refspecs. This is because the refspec
>     announced by the helper will get parsed during capability negotiation.
>     At that point we haven't yet figured out what object format the remote
>     uses though, so when run outside of a repository then we will fail.
>     
>     The course of action is somewhat dubious in the first place. Ideally, we
>     should only parse object IDs once we have asked the remote helper for
>     the object format. And if the helper didn't announce the "object-format"
>     capability, then we should always assume SHA256. But instead, we used to
>     take either SHA1 if there was no repository, or we used the hash of the
>     local repository, which is wrong.
>     
>     Arguably though, crashing hard may not be in the best interest of our
>     users, either. So while the old behaviour was buggy, let's restore it
>     for now as a short-term fix. We should eventually revisit, potentially
>     by deferring the point in time when we parse the refspec until after we
>     have figured out the remote's object hash.
>     
>     Reported-by: Mike Hommey <mh@glandium.org>
>     Signed-off-by: Patrick Steinhardt <ps@pks.im>
>
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index debf2d4f88..6da63a67f5 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -91,6 +91,21 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
>  	dest = argv[0];
>  
> +	/*
> +	 * TODO: This is buggy, but required for transport helpers. When a
> +	 * transport helper advertises a "refspec", then we'd add that to a
> +	 * list of refspecs via `refspec_append()`, which transitively depends
> +	 * on `the_hash_algo`. Thus, when the hash algorithm isn't properly set
> +	 * up, this would lead to a segfault.
> +	 *
> +	 * We really should fix this in the transport helper logic such that we
> +	 * lazily parse refspec capabilities _after_ we have learned about the
> +	 * remote's object format. Otherwise, we may end up misparsing refspecs
> +	 * depending on what object hash the remote uses.
> +	 */
> +	if (!the_repository->hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
>  	packet_trace_identity("ls-remote");
>  
>  	if (argc > 1) {
> diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
> index 42e77eb5a9..bc442ec221 100755
> --- a/t/t5512-ls-remote.sh
> +++ b/t/t5512-ls-remote.sh
> @@ -402,4 +402,17 @@ test_expect_success 'v0 clients can handle multiple symrefs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'helper with refspec capability fails gracefully' '
> +	mkdir test-bin &&
> +	write_script test-bin/git-remote-foo <<-EOF &&
> +	echo import
> +	echo refspec ${SQ}*:*${SQ}
> +	EOF
> +	(
> +		PATH="$PWD/test-bin:$PATH" &&
> +		export PATH &&
> +		test_must_fail nongit git ls-remote foo::bar
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo
  2024-07-27 19:19 [2.46 regression] git ls-remote crash with import remote-helper Mike Hommey
  2024-07-28  3:54 ` Junio C Hamano
@ 2024-08-02  4:44 ` Patrick Steinhardt
  2024-08-02 15:26   ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Patrick Steinhardt @ 2024-08-02  4:44 UTC (permalink / raw)
  To: git; +Cc: Mike Hommey, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3511 bytes --]

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have stopped setting the default hash algorithm for
`the_repository`. Consequently, code that relies on `the_hash_algo` will
now crash when it hasn't explicitly been initialized, which may be the
case when running outside of a Git repository.

It was reported that git-ls-remote(1) may crash in such a way when using
a remote helper that advertises refspecs. This is because the refspec
announced by the helper will get parsed during capability negotiation.
At that point we haven't yet figured out what object format the remote
uses though, so when run outside of a repository then we will fail.

The course of action is somewhat dubious in the first place. Ideally, we
should only parse object IDs once we have asked the remote helper for
the object format. And if the helper didn't announce the "object-format"
capability, then we should always assume SHA256. But instead, we used to
take either SHA1 if there was no repository, or we used the hash of the
local repository, which is wrong.

Arguably though, crashing hard may not be in the best interest of our
users, either. So while the old behaviour was buggy, let's restore it
for now as a short-term fix. We should eventually revisit, potentially
by deferring the point in time when we parse the refspec until after we
have figured out the remote's object hash.

Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

I didn't spot this in the "What's cooking" report. I guess that's my own
fault for not sending it as a proper patch, so let me fix that now :)

Patrick

 builtin/ls-remote.c  | 15 +++++++++++++++
 t/t5512-ls-remote.sh | 13 +++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index debf2d4f88..6da63a67f5 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -91,6 +91,21 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	dest = argv[0];
 
+	/*
+	 * TODO: This is buggy, but required for transport helpers. When a
+	 * transport helper advertises a "refspec", then we'd add that to a
+	 * list of refspecs via `refspec_append()`, which transitively depends
+	 * on `the_hash_algo`. Thus, when the hash algorithm isn't properly set
+	 * up, this would lead to a segfault.
+	 *
+	 * We really should fix this in the transport helper logic such that we
+	 * lazily parse refspec capabilities _after_ we have learned about the
+	 * remote's object format. Otherwise, we may end up misparsing refspecs
+	 * depending on what object hash the remote uses.
+	 */
+	if (!the_repository->hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	packet_trace_identity("ls-remote");
 
 	if (argc > 1) {
diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh
index 42e77eb5a9..bc442ec221 100755
--- a/t/t5512-ls-remote.sh
+++ b/t/t5512-ls-remote.sh
@@ -402,4 +402,17 @@ test_expect_success 'v0 clients can handle multiple symrefs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'helper with refspec capability fails gracefully' '
+	mkdir test-bin &&
+	write_script test-bin/git-remote-foo <<-EOF &&
+	echo import
+	echo refspec ${SQ}*:*${SQ}
+	EOF
+	(
+		PATH="$PWD/test-bin:$PATH" &&
+		export PATH &&
+		test_must_fail nongit git ls-remote foo::bar
+	)
+'
+
 test_done
-- 
2.46.0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo
  2024-08-02  4:44 ` [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo Patrick Steinhardt
@ 2024-08-02 15:26   ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-08-02 15:26 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

> I didn't spot this in the "What's cooking" report. I guess that's my own
> fault for not sending it as a proper patch, so let me fix that now :)

Yeah, I had a fix already when I gave my response to the initial
problem report, but felt that it was too premature to commit to the
approach before listening to others (you included) for potentially
better alternative approaches, and then forgot about it.

The fix here is in line with my thoughts, after seeing how other
parts of the transport work.  Thanks for tying the loose ends.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-08-02 15:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-27 19:19 [2.46 regression] git ls-remote crash with import remote-helper Mike Hommey
2024-07-28  3:54 ` Junio C Hamano
2024-07-29 15:41   ` Patrick Steinhardt
2024-07-29 17:18     ` Junio C Hamano
2024-08-02  4:44 ` [PATCH] builtin/ls-remote: fall back to SHA1 outside of a repo Patrick Steinhardt
2024-08-02 15:26   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).