[PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
@ 2024-11-05 19:05 Taylor Blau
  2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Taylor Blau @ 2024-11-05 19:05 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson

This series implements a new 'sha1-unsafe' test helper, similar to
't/helper/test-tool sha1'.

I have found such a helper to be really handy when debugging the new
SHA1_UNSAFE build knobs, e.g., to easily compare the performance of the
safe versus unsafe routines, different unsafe variants, etc.

The first patch prepares us by setting up the existing cmd_hash_impl()
function to be able to conditionally use the unsafe variant. The final
patch introduces a new 'sha1-unsafe' test helper which calls the new
variant.

Thanks in advance for your review!

Taylor Blau (2):
  t/helper/test-sha1: prepare for an unsafe mode
  t/helper/test-tool: implement sha1-unsafe helper

 t/helper/test-hash.c   | 17 +++++++++++++----
 t/helper/test-sha1.c   |  7 ++++++-
 t/helper/test-sha1.sh  | 38 ++++++++++++++++++++++----------------
 t/helper/test-sha256.c |  2 +-
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  3 ++-
 6 files changed, 45 insertions(+), 23 deletions(-)

Range-diff against v1:
1:  3b31db4d2df = 1:  0e2fcee6894 t/helper/test-sha1: prepare for an unsafe mode
2:  d343f5dc9e5 = 2:  d8c1fc78b57 t/helper/test-tool: implement sha1-unsafe helper

base-commit: 8f8d6eee531b3fa1a8ef14f169b0cb5035f7a772
-- 
2.47.0.231.gd8c1fc78b57

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-05 19:05 [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Taylor Blau
@ 2024-11-05 19:05 ` Taylor Blau
  2024-11-06  1:38   ` Junio C Hamano
  2024-11-05 19:05 ` [PATCH v2 2/2] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
  2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
  2 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2024-11-05 19:05 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson

With the new "unsafe" SHA-1 build knob, it would be convenient to have
a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.

Prepare for such a helper by altering the implementation of that
test-tool (in cmd_hash_impl(), which is generic and parameterized over
different hash functions) to conditionally run the unsafe variants of
the chosen hash function.

The following commit will add a new test-tool which makes use of this
new parameter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-hash.c   | 17 +++++++++++++----
 t/helper/test-sha1.c   |  2 +-
 t/helper/test-sha256.c |  2 +-
 t/helper/test-tool.h   |  2 +-
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c
index 45d829c908f..d0ee668df95 100644
--- a/t/helper/test-hash.c
+++ b/t/helper/test-hash.c
@@ -1,7 +1,7 @@
 #include "test-tool.h"
 #include "hex.h"
 
-int cmd_hash_impl(int ac, const char **av, int algo)
+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
 {
 	git_hash_ctx ctx;
 	unsigned char hash[GIT_MAX_HEXSZ];
@@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
 			die("OOPS");
 	}
 
-	algop->init_fn(&ctx);
+	if (unsafe)
+		algop->unsafe_init_fn(&ctx);
+	else
+		algop->init_fn(&ctx);
 
 	while (1) {
 		ssize_t sz, this_sz;
@@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
 		}
 		if (this_sz == 0)
 			break;
-		algop->update_fn(&ctx, buffer, this_sz);
+		if (unsafe)
+			algop->unsafe_update_fn(&ctx, buffer, this_sz);
+		else
+			algop->update_fn(&ctx, buffer, this_sz);
 	}
-	algop->final_fn(hash, &ctx);
+	if (unsafe)
+		algop->unsafe_final_fn(hash, &ctx);
+	else
+		algop->final_fn(hash, &ctx);
 
 	if (binary)
 		fwrite(hash, 1, algop->rawsz, stdout);
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index e60d000c039..1c1272cc1f9 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,7 +3,7 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-	return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 0);
 }
 
 int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
diff --git a/t/helper/test-sha256.c b/t/helper/test-sha256.c
index 2fb20438f3c..7fd0aa1fcd3 100644
--- a/t/helper/test-sha256.c
+++ b/t/helper/test-sha256.c
@@ -3,5 +3,5 @@
 
 int cmd__sha256(int ac, const char **av)
 {
-	return cmd_hash_impl(ac, av, GIT_HASH_SHA256);
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA256, 0);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 21802ac27da..f3524d9a0f6 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -81,6 +81,6 @@ int cmd__windows_named_pipe(int argc, const char **argv);
 #endif
 int cmd__write_cache(int argc, const char **argv);
 
-int cmd_hash_impl(int ac, const char **av, int algo);
+int cmd_hash_impl(int ac, const char **av, int algo, int unsafe);
 
 #endif
-- 
2.47.0.231.gd8c1fc78b57


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/2] t/helper/test-tool: implement sha1-unsafe helper
  2024-11-05 19:05 [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Taylor Blau
  2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
@ 2024-11-05 19:05 ` Taylor Blau
  2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
  2 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2024-11-05 19:05 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jeff King, Junio C Hamano, brian m. carlson

Add a new helper similar to 't/helper/test-tool sha1' called instead
"sha1-unsafe" which uses the unsafe variant of Git's SHA-1 wrappers.

While we're at it, modify the test-sha1.sh script to exercise both
the sha1 and sha1-unsafe test tools to ensure that both produce the
expected hash values.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-sha1.c  |  5 +++++
 t/helper/test-sha1.sh | 38 ++++++++++++++++++++++----------------
 t/helper/test-tool.c  |  1 +
 t/helper/test-tool.h  |  1 +
 4 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1c1272cc1f9..349540c4df8 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -13,3 +13,8 @@ int cmd__sha1_is_sha1dc(int argc UNUSED, const char **argv UNUSED)
 #endif
 	return 1;
 }
+
+int cmd__sha1_unsafe(int ac, const char **av)
+{
+	return cmd_hash_impl(ac, av, GIT_HASH_SHA1, 1);
+}
diff --git a/t/helper/test-sha1.sh b/t/helper/test-sha1.sh
index 84594885c70..bf387d3db14 100755
--- a/t/helper/test-sha1.sh
+++ b/t/helper/test-sha1.sh
@@ -3,25 +3,31 @@
 dd if=/dev/zero bs=1048576 count=100 2>/dev/null |
 /usr/bin/time t/helper/test-tool sha1 >/dev/null
 
+dd if=/dev/zero bs=1048576 count=100 2>/dev/null |
+/usr/bin/time t/helper/test-tool sha1-unsafe >/dev/null
+
 while read expect cnt pfx
 do
 	case "$expect" in '#'*) continue ;; esac
-	actual=$(
-		{
-			test -z "$pfx" || echo "$pfx"
-			dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null |
-			perl -pe 'y/\000/g/'
-		} | ./t/helper/test-tool sha1 $cnt
-	)
-	if test "$expect" = "$actual"
-	then
-		echo "OK: $expect $cnt $pfx"
-	else
-		echo >&2 "OOPS: $cnt"
-		echo >&2 "expect: $expect"
-		echo >&2 "actual: $actual"
-		exit 1
-	fi
+	for sha1 in sha1 sha1-unsafe
+	do
+		actual=$(
+			{
+				test -z "$pfx" || echo "$pfx"
+				dd if=/dev/zero bs=1048576 count=$cnt 2>/dev/null |
+				perl -pe 'y/\000/g/'
+			} | ./t/helper/test-tool $sha1 $cnt
+		)
+		if test "$expect" = "$actual"
+		then
+			echo "OK ($sha1): $expect $cnt $pfx"
+		else
+			echo >&2 "OOPS ($sha1): $cnt"
+			echo >&2 "expect ($sha1): $expect"
+			echo >&2 "actual ($sha1): $actual"
+			exit 1
+		fi
+	done
 done <<EOF
 da39a3ee5e6b4b0d3255bfef95601890afd80709 0
 3f786850e387550fdab836ed7e6dc881de23001b 0 a
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 1ebb69a5dc4..51ed25c07e2 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -70,6 +70,7 @@ static struct test_cmd cmds[] = {
 	{ "serve-v2", cmd__serve_v2 },
 	{ "sha1", cmd__sha1 },
 	{ "sha1-is-sha1dc", cmd__sha1_is_sha1dc },
+	{ "sha1-unsafe", cmd__sha1_unsafe },
 	{ "sha256", cmd__sha256 },
 	{ "sigchain", cmd__sigchain },
 	{ "simple-ipc", cmd__simple_ipc },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index f3524d9a0f6..24149edd414 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -63,6 +63,7 @@ int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__serve_v2(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_is_sha1dc(int argc, const char **argv);
+int cmd__sha1_unsafe(int argc, const char **argv);
 int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__simple_ipc(int argc, const char **argv);
-- 
2.47.0.231.gd8c1fc78b57

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
@ 2024-11-06  1:38   ` Junio C Hamano
  2024-11-07  0:48     ` brian m. carlson
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2024-11-06  1:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Jeff King, brian m. carlson

Taylor Blau <me@ttaylorr.com> writes:

> -int cmd_hash_impl(int ac, const char **av, int algo)
> +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
>  {
>  	git_hash_ctx ctx;
>  	unsigned char hash[GIT_MAX_HEXSZ];
> @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
>  			die("OOPS");
>  	}
>  
> -	algop->init_fn(&ctx);
> +	if (unsafe)
> +		algop->unsafe_init_fn(&ctx);
> +	else
> +		algop->init_fn(&ctx);

It may be just me, and it would not matter all that much within the
context of the project because this is merely a test helper, but
giving a pair of init/unsafe_init methods to algop looks unnerving.
It gives an impression that every design of hash algorithm must come
with normal and unsafe variant, which is not what you want to say.

I would have expected that there are different algorighm instances,
and one of them would be "unsafe SHA-1", among "normal SHA-1" and
"SHA-256" (as the last one would not even have unsafe_init_fn
method), and the callers that want to measure the performance of
each algorithm would simply pick one of these instances and go
through the usual "init", "update", "final" flow, regardless of the
"unsafe" ness of the algorithm.

IOW, ...

>  	while (1) {
>  		ssize_t sz, this_sz;
> @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
>  		}
>  		if (this_sz == 0)
>  			break;
> -		algop->update_fn(&ctx, buffer, this_sz);
> +		if (unsafe)
> +			algop->unsafe_update_fn(&ctx, buffer, this_sz);
> +		else
> +			algop->update_fn(&ctx, buffer, this_sz);
>  	}
> -	algop->final_fn(hash, &ctx);
> +	if (unsafe)
> +		algop->unsafe_final_fn(hash, &ctx);
> +	else
> +		algop->final_fn(hash, &ctx);

... I didn't expect any of these "if (unsafe) .. else .." switches.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-06  1:38   ` Junio C Hamano
@ 2024-11-07  0:48     ` brian m. carlson
  2024-11-07  1:39       ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: brian m. carlson @ 2024-11-07  0:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Elijah Newren, Jeff King

[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]

On 2024-11-06 at 01:38:48, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
> > -int cmd_hash_impl(int ac, const char **av, int algo)
> > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
> >  {
> >  	git_hash_ctx ctx;
> >  	unsigned char hash[GIT_MAX_HEXSZ];
> > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  			die("OOPS");
> >  	}
> >  
> > -	algop->init_fn(&ctx);
> > +	if (unsafe)
> > +		algop->unsafe_init_fn(&ctx);
> > +	else
> > +		algop->init_fn(&ctx);
> 
> It may be just me, and it would not matter all that much within the
> context of the project because this is merely a test helper, but
> giving a pair of init/unsafe_init methods to algop looks unnerving.
> It gives an impression that every design of hash algorithm must come
> with normal and unsafe variant, which is not what you want to say.
> 
> I would have expected that there are different algorighm instances,
> and one of them would be "unsafe SHA-1", among "normal SHA-1" and
> "SHA-256" (as the last one would not even have unsafe_init_fn
> method), and the callers that want to measure the performance of
> each algorithm would simply pick one of these instances and go
> through the usual "init", "update", "final" flow, regardless of the
> "unsafe" ness of the algorithm.
> 
> IOW, ...
> 
> >  	while (1) {
> >  		ssize_t sz, this_sz;
> > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  		}
> >  		if (this_sz == 0)
> >  			break;
> > -		algop->update_fn(&ctx, buffer, this_sz);
> > +		if (unsafe)
> > +			algop->unsafe_update_fn(&ctx, buffer, this_sz);
> > +		else
> > +			algop->update_fn(&ctx, buffer, this_sz);
> >  	}
> > -	algop->final_fn(hash, &ctx);
> > +	if (unsafe)
> > +		algop->unsafe_final_fn(hash, &ctx);
> > +	else
> > +		algop->final_fn(hash, &ctx);
> 
> ... I didn't expect any of these "if (unsafe) .. else .." switches.

I don't think we can remove those, and here's why.  Certainly Taylor can
correct me if I'm off base, though.

In the normal case, our hash struct is a union, and different
implementations can have a different layout.  A SHA-1 implementation
will usually keep track of a 64-bit size, 5 32-bit words, and up to 64
bytes of data that might need to be processed.  Maybe SHA-1-DC, our safe
implementation, stores the size first, but our unsafe implementation
is OpenSSL and it stores the 5 hash words first, or whatever.

If we use the same update and final functions, we'll end up with
incorrect data because we'll have initialized the union contents with
data for one implementation but are trying to update or finalize it with
different data, which in the very best case will just produce broken
results, and might just cause a segfault (if one of the implementations
stores a pointer instead of storing the data in the union).

We could certainly adopt a different hash algorithm structure for safe
and unsafe code, but we have a lot of places that assume that there's
just one structure per algorithm.  It would require a substantial amount
of refactoring to remove those assumptions and deal with the fact that
we now have two SHA-1 implementations.  It would also be tricky to deal
with the fact that for SHA-1, we might use the safe or unsafe algorithm,
but for SHA-256, there's only one algorithm structure and we need to use
it for both.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07  0:48     ` brian m. carlson
@ 2024-11-07  1:39       ` Jeff King
  2024-11-07  1:49         ` brian m. carlson
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2024-11-07  1:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Taylor Blau, git, Elijah Newren

On Thu, Nov 07, 2024 at 12:48:26AM +0000, brian m. carlson wrote:

> > I would have expected that there are different algorighm instances,
> > and one of them would be "unsafe SHA-1", among "normal SHA-1" and
> > "SHA-256" (as the last one would not even have unsafe_init_fn
> > method), and the callers that want to measure the performance of
> > each algorithm would simply pick one of these instances and go
> > through the usual "init", "update", "final" flow, regardless of the
> > "unsafe" ness of the algorithm.
> [...]
> > ... I didn't expect any of these "if (unsafe) .. else .." switches.
> 
> I don't think we can remove those, and here's why.  Certainly Taylor can
> correct me if I'm off base, though.
> 
> In the normal case, our hash struct is a union, and different
> implementations can have a different layout.  A SHA-1 implementation
> will usually keep track of a 64-bit size, 5 32-bit words, and up to 64
> bytes of data that might need to be processed.  Maybe SHA-1-DC, our safe
> implementation, stores the size first, but our unsafe implementation
> is OpenSSL and it stores the 5 hash words first, or whatever.
> 
> If we use the same update and final functions, we'll end up with
> incorrect data because we'll have initialized the union contents with
> data for one implementation but are trying to update or finalize it with
> different data, which in the very best case will just produce broken
> results, and might just cause a segfault (if one of the implementations
> stores a pointer instead of storing the data in the union).

I don't think Junio is proposing mixing the functions on a single data
type. Which, as you note, would be a disaster. I think the idea is for a
separate git_hash_algo struct entirely, that can be slotted in as a
pointer (since git_hash_algo is already essentially a virtual type).

That gets weird as you say here:

> We could certainly adopt a different hash algorithm structure for safe
> and unsafe code, but we have a lot of places that assume that there's
> just one structure per algorithm.  It would require a substantial amount
> of refactoring to remove those assumptions and deal with the fact that
> we now have two SHA-1 implementations.  It would also be tricky to deal
> with the fact that for SHA-1, we might use the safe or unsafe algorithm,
> but for SHA-256, there's only one algorithm structure and we need to use
> it for both.

both because we have two different algos for "sha1", but also because
they are not _really_ independent. If the_hash_algo is sha1, then
whichever implementation a given piece of code is using, it must still
be one of the sha1 variants.

So I think you wouldn't want to allocate an enum or a slot in the
hash_algos array, because it is not really an independent algorithm.
But I think it _could_ work as a separate struct that the caller derives
from the main hash algorithm. For example, imagine that the
git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:

  struct git_hash_algo *unsafe_implementation;

with a matching accessor like:

  struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
  {
	/* if we have a faster "unsafe" implementation, use that */
	if (algo->unsafe_implementation)
		return algo->unsafe_implementation;
	/* otherwise just use the default one */
	return algo;
  }

And then csum-file.c, rather than calling:

  the_hash_algo->unsafe_init_fn(...);
  ...
  the_hash_algo->unsafe_final_fn(...);

would do this:

  struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
  algo->init_fn(...);
  ...
  algo->final_fn(...);

And likewise this test helper would have a single conditional at the
start:

  if (unsafe)
	algo = unsafe_hash_algo(algo);

and the rest of the code would just use that.

All that said, I do not think it buys us anything for "real" code. There
the decision between safe/unsafe is in the context of how we are using
it, and not based on some conditional. So while the code in this test
helper is ugly, I don't think we'd ever write anything like that for
real. So it may not be worth the effort to refactor into a more virtual
object-oriented way.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-05 19:05 [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Taylor Blau
  2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
  2024-11-05 19:05 ` [PATCH v2 2/2] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
@ 2024-11-07  1:47 ` Jeff King
  2024-11-07  2:05   ` brian m. carlson
  2024-11-07 21:36   ` Taylor Blau
  2 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2024-11-07  1:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson

On Tue, Nov 05, 2024 at 02:05:10PM -0500, Taylor Blau wrote:

> This series implements a new 'sha1-unsafe' test helper, similar to
> 't/helper/test-tool sha1'.
> 
> I have found such a helper to be really handy when debugging the new
> SHA1_UNSAFE build knobs, e.g., to easily compare the performance of the
> safe versus unsafe routines, different unsafe variants, etc.
> 
> The first patch prepares us by setting up the existing cmd_hash_impl()
> function to be able to conditionally use the unsafe variant. The final
> patch introduces a new 'sha1-unsafe' test helper which calls the new
> variant.

I think this is a useful thing to have, and I didn't see anything wrong
in the implementation. I did notice some oddities that existed before
your series:

  1. Why do we have "test-tool sha256" at all? Nobody in the test suite
     calls it. It feels like the whole test-sha1/sha256/hash split is
     overly complicated. A single "test-tool hash" seems like it would
     be simpler, and it could take an "--algo" parameter (and an
     "--unsafe" one). I guess in the end we end up with the same options
     ,but the proliferation of top-level test-tool commands seems ugly
     to me (likewise "sha1_is_sha1dc").

  2. You modified test-sha1.sh, but I've wondered if we should just
     delete that script. It is not ever invoked in the test suite AFAIK.
     If we want correctness tests, they should go into a real t[0-9]*.sh
     script (though one imagines we exercise the sha1 code quite a lot
     in the rest of the tests). And it starts with a weird ad-hoc
     performance test that doesn't really tell us much. A t/perf test
     would be better (if it is even worth measuring at all).

So I dunno. None of those are the fault of your series, but it is piling
on yet another test-tool command.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07  1:39       ` Jeff King
@ 2024-11-07  1:49         ` brian m. carlson
  2024-11-07  2:08           ` Jeff King
  2024-11-07 21:30           ` Taylor Blau
  0 siblings, 2 replies; 19+ messages in thread
From: brian m. carlson @ 2024-11-07  1:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Taylor Blau, git, Elijah Newren

[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]

On 2024-11-07 at 01:39:15, Jeff King wrote:
> So I think you wouldn't want to allocate an enum or a slot in the
> hash_algos array, because it is not really an independent algorithm.
> But I think it _could_ work as a separate struct that the caller derives
> from the main hash algorithm. For example, imagine that the
> git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:
> 
>   struct git_hash_algo *unsafe_implementation;
> 
> with a matching accessor like:
> 
>   struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
>   {
> 	/* if we have a faster "unsafe" implementation, use that */
> 	if (algo->unsafe_implementation)
> 		return algo->unsafe_implementation;
> 	/* otherwise just use the default one */
> 	return algo;
>   }
> 
> And then csum-file.c, rather than calling:
> 
>   the_hash_algo->unsafe_init_fn(...);
>   ...
>   the_hash_algo->unsafe_final_fn(...);
> 
> would do this:
> 
>   struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
>   algo->init_fn(...);
>   ...
>   algo->final_fn(...);
> 
> And likewise this test helper would have a single conditional at the
> start:
> 
>   if (unsafe)
> 	algo = unsafe_hash_algo(algo);
> 
> and the rest of the code would just use that.

Ah, yes, I think that approach would be simpler and nicer to work with
than a separate entry in the `hash_algos` array.  We do, however, have
some places that assume that a `struct git_hash_algo *` points into the
`hash_algos` array (notably `hash_algo_by_ptr`), so we'd have to adjust
for that, move the function pointers out into their own struct which
we'd use for `unsafe_hash_algo`, or be careful never to call the
relevant functions on our special `git_hash_algo` struct.

> All that said, I do not think it buys us anything for "real" code. There
> the decision between safe/unsafe is in the context of how we are using
> it, and not based on some conditional. So while the code in this test
> helper is ugly, I don't think we'd ever write anything like that for
> real. So it may not be worth the effort to refactor into a more virtual
> object-oriented way.

Yeah, I don't have a strong opinion one way or the other.  I think, with
the limitation I mentioned above, it would probably require a decent
amount of refactoring if we took a different approach, and I'm fine with
going with Taylor's current approach unless he wants to do that
refactoring (in which case, great).
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
@ 2024-11-07  2:05   ` brian m. carlson
  2024-11-07 21:33     ` Taylor Blau
  2024-11-08 17:22     ` Jeff King
  2024-11-07 21:36   ` Taylor Blau
  1 sibling, 2 replies; 19+ messages in thread
From: brian m. carlson @ 2024-11-07  2:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On 2024-11-07 at 01:47:37, Jeff King wrote:
> I think this is a useful thing to have, and I didn't see anything wrong
> in the implementation. I did notice some oddities that existed before
> your series:
> 
>   1. Why do we have "test-tool sha256" at all? Nobody in the test suite
>      calls it. It feels like the whole test-sha1/sha256/hash split is
>      overly complicated. A single "test-tool hash" seems like it would
>      be simpler, and it could take an "--algo" parameter (and an
>      "--unsafe" one). I guess in the end we end up with the same options
>      ,but the proliferation of top-level test-tool commands seems ugly
>      to me (likewise "sha1_is_sha1dc").

I think we do in `pack_trailer` in `t/lib-pack.sh`, but not in a
greppable way:

  # Compute and append pack trailer to "$1"
  pack_trailer () {
  	test-tool $(test_oid algo) -b <"$1" >trailer.tmp &&
  	cat trailer.tmp >>"$1" &&
  	rm -f trailer.tmp
  }

When you posed the question above, I wasn't sure why I added this
functionality: was it just to test my SHA-256 implementation adequately
or did it have some actual utility in the testsuite?  However, I knew if
it didn't appear straightforwardly in `git grep`, any uses would involve
`test_oid`, and boom, I was right.

I think a single helper with `--algo` and `--unsafe` parameters would
also be fine and would probably be a little more tidy, as you mention.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07  1:49         ` brian m. carlson
@ 2024-11-07  2:08           ` Jeff King
  2024-11-07  3:08             ` Junio C Hamano
  2024-11-07 21:30           ` Taylor Blau
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2024-11-07  2:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Taylor Blau, git, Elijah Newren

On Thu, Nov 07, 2024 at 01:49:35AM +0000, brian m. carlson wrote:

> Ah, yes, I think that approach would be simpler and nicer to work with
> than a separate entry in the `hash_algos` array.  We do, however, have
> some places that assume that a `struct git_hash_algo *` points into the
> `hash_algos` array (notably `hash_algo_by_ptr`), so we'd have to adjust
> for that, move the function pointers out into their own struct which
> we'd use for `unsafe_hash_algo`, or be careful never to call the
> relevant functions on our special `git_hash_algo` struct.

Yeah, I wondered if some code might be surprised by having a separate
hash algo. Another weird thing is that the sub-implementation algo
struct will have its own rawsz, hexsz, etc, even though those _must_ be
the same its parent.

If all of the virtual implementation functions were in a git_hash_impl
struct, then each git_hash_algo struct could have one embedded for the
main implementation (which in sha1's case would be a collision detecting
one), and an optional pointer to another unsafe _impl struct.

And then you get more type-safety, because you can never confuse the
_impl struct for a parent git_hash_algo.

The downside is that every single caller, even if it doesn't care about
the unsafe variant, needs to refer to the_hash_algo->impl.init_fn(),
etc, due to the extra layer of indirection. Probably not worth it.

> Yeah, I don't have a strong opinion one way or the other.  I think, with
> the limitation I mentioned above, it would probably require a decent
> amount of refactoring if we took a different approach, and I'm fine with
> going with Taylor's current approach unless he wants to do that
> refactoring (in which case, great).

Me too. If we were fixing something ugly or error-prone that we expected
to come up in real code, it might be worth it. But it's probably not for
trying to accommodate a one-off test helper.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07  2:08           ` Jeff King
@ 2024-11-07  3:08             ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2024-11-07  3:08 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Taylor Blau, git, Elijah Newren

Jeff King <peff@peff.net> writes:

> Me too. If we were fixing something ugly or error-prone that we expected
> to come up in real code, it might be worth it. But it's probably not for
> trying to accommodate a one-off test helper.

I 100% agree that it would not matter all that much within the
context of the project because this is merely a test helper.

It looked odd not to have sha1-unsafe as a first class citizen next
to sha1 and sha256, with an entry for it in the list enums and list
of hash algos.  If there is a good reason why adding the "unsafe"
variant as a first class citizen among algos, the approach posted
patch took is fine.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07  1:49         ` brian m. carlson
  2024-11-07  2:08           ` Jeff King
@ 2024-11-07 21:30           ` Taylor Blau
  2024-11-07 23:20             ` Taylor Blau
  2024-11-08 17:26             ` Jeff King
  1 sibling, 2 replies; 19+ messages in thread
From: Taylor Blau @ 2024-11-07 21:30 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Junio C Hamano, git, Elijah Newren

On Thu, Nov 07, 2024 at 01:49:35AM +0000, brian m. carlson wrote:
> On 2024-11-07 at 01:39:15, Jeff King wrote:
> > So I think you wouldn't want to allocate an enum or a slot in the
> > hash_algos array, because it is not really an independent algorithm.
> > But I think it _could_ work as a separate struct that the caller derives
> > from the main hash algorithm. For example, imagine that the
> > git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:
> >
> >   struct git_hash_algo *unsafe_implementation;
> >
> > with a matching accessor like:
> >
> >   struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
> >   {
> > 	/* if we have a faster "unsafe" implementation, use that */
> > 	if (algo->unsafe_implementation)
> > 		return algo->unsafe_implementation;
> > 	/* otherwise just use the default one */
> > 	return algo;
> >   }
> >
> > And then csum-file.c, rather than calling:
> >
> >   the_hash_algo->unsafe_init_fn(...);
> >   ...
> >   the_hash_algo->unsafe_final_fn(...);
> >
> > would do this:
> >
> >   struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
> >   algo->init_fn(...);
> >   ...
> >   algo->final_fn(...);
> >
> > And likewise this test helper would have a single conditional at the
> > start:
> >
> >   if (unsafe)
> > 	algo = unsafe_hash_algo(algo);
> >
> > and the rest of the code would just use that.
>
> Ah, yes, I think that approach would be simpler and nicer to work with
> than a separate entry in the `hash_algos` array.  We do, however, have
> some places that assume that a `struct git_hash_algo *` points into the
> `hash_algos` array (notably `hash_algo_by_ptr`), so we'd have to adjust
> for that, move the function pointers out into their own struct which
> we'd use for `unsafe_hash_algo`, or be careful never to call the
> relevant functions on our special `git_hash_algo` struct.

I agree that the approach suggested by Peff here is clean and would
button up the test-helper code nicely. It may be worth doing in the
future, but I agree that it doesn't seem entirely in scope here.

Part of the reason that I did not initially add a separate sha1-unsafe
struct is that while it avoids this awkwardness, it creates new
awkwardness when in a non-SHA-1 repository, where you have to keep
track of whether you want to use the_unsafe_hash_algo or the SHA-256
hash algo.

In that instance, it's not a matter of computing the same result two
different ways, but rather using the wrong hash function entirely. IOW,
the hashfile API would have to realize that when in SHA-256 mode, that
it should *not* use the separate (hypothetical) unsafe SHA-1 struct.

> > All that said, I do not think it buys us anything for "real" code. There
> > the decision between safe/unsafe is in the context of how we are using
> > it, and not based on some conditional. So while the code in this test
> > helper is ugly, I don't think we'd ever write anything like that for
> > real. So it may not be worth the effort to refactor into a more virtual
> > object-oriented way.
>
> Yeah, I don't have a strong opinion one way or the other.  I think, with
> the limitation I mentioned above, it would probably require a decent
> amount of refactoring if we took a different approach, and I'm fine with
> going with Taylor's current approach unless he wants to do that
> refactoring (in which case, great).

I think it does buy you something for real code, which is that you don't
have to remember to consistently call the unsafe_ variants of all of the
various function pointers.

For instance, if you do

    the_hash_algo->unsafe_init_fn(...);

early on, and then later on by mistake write:

    the_hash_algo->update_fn(...);

Then your code is broken and will (as brian said) either in the best
case produce wrong results, or likely segfault.

So in that sense it is worth doing as it avoids the caller from having
to keep track of what "mode" they are using the_hash_algo in. But let's
take it up later, I think.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07  2:05   ` brian m. carlson
@ 2024-11-07 21:33     ` Taylor Blau
  2024-11-08 17:23       ` Jeff King
  2024-11-08 17:22     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2024-11-07 21:33 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, git, Elijah Newren, Junio C Hamano

On Thu, Nov 07, 2024 at 02:05:26AM +0000, brian m. carlson wrote:
> On 2024-11-07 at 01:47:37, Jeff King wrote:
> > I think this is a useful thing to have, and I didn't see anything wrong
> > in the implementation. I did notice some oddities that existed before
> > your series:
> >
> >   1. Why do we have "test-tool sha256" at all? Nobody in the test suite
> >      calls it. It feels like the whole test-sha1/sha256/hash split is
> >      overly complicated. A single "test-tool hash" seems like it would
> >      be simpler, and it could take an "--algo" parameter (and an
> >      "--unsafe" one). I guess in the end we end up with the same options
> >      ,but the proliferation of top-level test-tool commands seems ugly
> >      to me (likewise "sha1_is_sha1dc").
>
> I think we do in `pack_trailer` in `t/lib-pack.sh`, but not in a
> greppable way:
>
>   # Compute and append pack trailer to "$1"
>   pack_trailer () {
>   	test-tool $(test_oid algo) -b <"$1" >trailer.tmp &&
>   	cat trailer.tmp >>"$1" &&
>   	rm -f trailer.tmp
>   }

Nice find. I think that it may be worth writing this as:

    case "$(test_oid algo)" in
    sha1)
        test-tool sha1 -b <"$1" >trailer.tmp
        ;;
    sha256)
        test-tool sha256 -b <"$1" >trailer.tmp
        ;;
    *)
        echo >&2 "unknown algorithm: $(test_oid algo)"
        exit 1
        ;;
    esac

To make it more greppable. Obviously the existing implementation is not
wrong, but I do find it remarkably hard to search for ;-).

> I think a single helper with `--algo` and `--unsafe` parameters would
> also be fine and would probably be a little more tidy, as you mention.

That would be nice too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
  2024-11-07  2:05   ` brian m. carlson
@ 2024-11-07 21:36   ` Taylor Blau
  2024-11-08 17:23     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2024-11-07 21:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson

On Wed, Nov 06, 2024 at 08:47:37PM -0500, Jeff King wrote:
>   2. You modified test-sha1.sh, but I've wondered if we should just
>      delete that script. It is not ever invoked in the test suite AFAIK.
>      If we want correctness tests, they should go into a real t[0-9]*.sh
>      script (though one imagines we exercise the sha1 code quite a lot
>      in the rest of the tests). And it starts with a weird ad-hoc
>      performance test that doesn't really tell us much. A t/perf test
>      would be better (if it is even worth measuring at all).

Yeah, I was sort of puzzled and thinking the same thing as I wrote the
patch.

I wouldn't be opposed to deleting it in the future, and certainly have
no strong feelings about keeping it around in the meantime. It just
seemed like the path of least resistance to bring it along for the
sha1-unsafe ride instead of deleting it outright.

> So I dunno. None of those are the fault of your series, but it is piling
> on yet another test-tool command.

Yeah, I think there was a fair amount of interesting discussion about
possible alternatives in this thread, which I am appreciative of.

I think that if nobody has strong feelings or issues with the current
implementation to add the sha1-unsafe test-tool, that we should do so
and take these patches.

In the future we can consider other things on top, like dropping the
test-sha1.sh script, having an unsafe pointer embedded within
the_hash_algo, or something else entirely. But those can be done on top,
or not at all, and I don't want to hold up this series for them.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07 21:30           ` Taylor Blau
@ 2024-11-07 23:20             ` Taylor Blau
  2024-11-08 17:26             ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2024-11-07 23:20 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Junio C Hamano, git, Elijah Newren

On Thu, Nov 07, 2024 at 04:30:37PM -0500, Taylor Blau wrote:
> > > All that said, I do not think it buys us anything for "real" code. There
> > > the decision between safe/unsafe is in the context of how we are using
> > > it, and not based on some conditional. So while the code in this test
> > > helper is ugly, I don't think we'd ever write anything like that for
> > > real. So it may not be worth the effort to refactor into a more virtual
> > > object-oriented way.
> >
> > Yeah, I don't have a strong opinion one way or the other.  I think, with
> > the limitation I mentioned above, it would probably require a decent
> > amount of refactoring if we took a different approach, and I'm fine with
> > going with Taylor's current approach unless he wants to do that
> > refactoring (in which case, great).
>
> I think it does buy you something for real code, which is that you don't
> have to remember to consistently call the unsafe_ variants of all of the
> various function pointers.
>
> For instance, if you do
>
>     the_hash_algo->unsafe_init_fn(...);
>
> early on, and then later on by mistake write:
>
>     the_hash_algo->update_fn(...);
>
> Then your code is broken and will (as brian said) either in the best
> case produce wrong results, or likely segfault.
>
> So in that sense it is worth doing as it avoids the caller from having
> to keep track of what "mode" they are using the_hash_algo in. But let's
> take it up later, I think.

The idea seemed too fun to play with, so I tried my hand at implementing
it and was quite pleased with the result.

The WIP-quality patches are available in my tree under the
wip/tb/unsafe-hash-algop[1] branch. The result, at least as it applies
to the test-tool modified here, is quite pleasing IMHO:

--- 8< ---
Subject: [PATCH] t/helper/test-hash.c: use unsafe_hash_algo()

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-hash.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/t/helper/test-hash.c b/t/helper/test-hash.c
index d0ee668df95..aa82638c621 100644
--- a/t/helper/test-hash.c
+++ b/t/helper/test-hash.c
@@ -9,6 +9,8 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
 	int binary = 0;
 	char *buffer;
 	const struct git_hash_algo *algop = &hash_algos[algo];
+	if (unsafe)
+		algop = unsafe_hash_algo(algop);

 	if (ac == 2) {
 		if (!strcmp(av[1], "-b"))
@@ -27,10 +29,7 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
 			die("OOPS");
 	}

-	if (unsafe)
-		algop->unsafe_init_fn(&ctx);
-	else
-		algop->init_fn(&ctx);
+	algop->init_fn(&ctx);

 	while (1) {
 		ssize_t sz, this_sz;
@@ -49,15 +48,9 @@ int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
 		}
 		if (this_sz == 0)
 			break;
-		if (unsafe)
-			algop->unsafe_update_fn(&ctx, buffer, this_sz);
-		else
-			algop->update_fn(&ctx, buffer, this_sz);
+		algop->update_fn(&ctx, buffer, this_sz);
 	}
-	if (unsafe)
-		algop->unsafe_final_fn(hash, &ctx);
-	else
-		algop->final_fn(hash, &ctx);
+	algop->final_fn(hash, &ctx);

 	if (binary)
 		fwrite(hash, 1, algop->rawsz, stdout);

base-commit: d8c1fc78b57e02a140b5c363caaa14c2dc2bb274
prerequisite-patch-id: 5df87a6ba8a596bc7834ce8b5a656a3c4f1a869f
prerequisite-patch-id: c3f346e57f98b067eef8adf1e20c70ac23f41c36
prerequisite-patch-id: c5d1ec4f5e9796c5c9232442a3fc3be79379b8c4
prerequisite-patch-id: bdc2d6cbc23c467b24f184a889a5242e5c9fe774
--
2.47.0.238.g87615aecf12
--- >8 ---

To be clear: I think that we should still take this series as-is, and
I'll send the additional half dozen or so patches on top as a new series
dependent on this one.

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/compare/tb/sha1-unsafe-helper...ttaylorr:git:wip/tb/unsafe-hash-algop

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07  2:05   ` brian m. carlson
  2024-11-07 21:33     ` Taylor Blau
@ 2024-11-08 17:22     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2024-11-08 17:22 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano

On Thu, Nov 07, 2024 at 02:05:26AM +0000, brian m. carlson wrote:

> >   1. Why do we have "test-tool sha256" at all? Nobody in the test suite
> >      calls it. It feels like the whole test-sha1/sha256/hash split is
> >      overly complicated. A single "test-tool hash" seems like it would
> >      be simpler, and it could take an "--algo" parameter (and an
> >      "--unsafe" one). I guess in the end we end up with the same options
> >      ,but the proliferation of top-level test-tool commands seems ugly
> >      to me (likewise "sha1_is_sha1dc").
> 
> I think we do in `pack_trailer` in `t/lib-pack.sh`, but not in a
> greppable way:
> 
>   # Compute and append pack trailer to "$1"
>   pack_trailer () {
>   	test-tool $(test_oid algo) -b <"$1" >trailer.tmp &&
>   	cat trailer.tmp >>"$1" &&
>   	rm -f trailer.tmp
>   }
> 
> When you posed the question above, I wasn't sure why I added this
> functionality: was it just to test my SHA-256 implementation adequately
> or did it have some actual utility in the testsuite?  However, I knew if
> it didn't appear straightforwardly in `git grep`, any uses would involve
> `test_oid`, and boom, I was right.

Ah, good catch. Or good recall, I suppose. ;)

It feels a tiny bit hacky that we have to specify the algo here. If
there were a plumbing tool to work with the trailing hash of a
csum-file, then naturally it would use the repo's algorithm instead of
having to specify it. And I have run into situations once or twice where
that might have been a useful debugging tool, rather than hacking
together shell tools like dd. But since we have already done that
hacking together in the test suite, and this is the only spot that has
needed it so far, it's probably worth letting sleeping dogs lie.

> I think a single helper with `--algo` and `--unsafe` parameters would
> also be fine and would probably be a little more tidy, as you mention.

Yup.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07 21:33     ` Taylor Blau
@ 2024-11-08 17:23       ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2024-11-08 17:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: brian m. carlson, git, Elijah Newren, Junio C Hamano

On Thu, Nov 07, 2024 at 04:33:56PM -0500, Taylor Blau wrote:

> >   # Compute and append pack trailer to "$1"
> >   pack_trailer () {
> >   	test-tool $(test_oid algo) -b <"$1" >trailer.tmp &&
> >   	cat trailer.tmp >>"$1" &&
> >   	rm -f trailer.tmp
> >   }
> 
> Nice find. I think that it may be worth writing this as:
> 
>     case "$(test_oid algo)" in
>     sha1)
>         test-tool sha1 -b <"$1" >trailer.tmp
>         ;;
>     sha256)
>         test-tool sha256 -b <"$1" >trailer.tmp
>         ;;
>     *)
>         echo >&2 "unknown algorithm: $(test_oid algo)"
>         exit 1
>         ;;
>     esac

If it were "test-tool hash --algo=$(test_oid algo)", then presumably
test-tool would naturally do the same switch and error message
internally.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper
  2024-11-07 21:36   ` Taylor Blau
@ 2024-11-08 17:23     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2024-11-08 17:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano, brian m. carlson

On Thu, Nov 07, 2024 at 04:36:04PM -0500, Taylor Blau wrote:

> > So I dunno. None of those are the fault of your series, but it is piling
> > on yet another test-tool command.
> 
> Yeah, I think there was a fair amount of interesting discussion about
> possible alternatives in this thread, which I am appreciative of.
> 
> I think that if nobody has strong feelings or issues with the current
> implementation to add the sha1-unsafe test-tool, that we should do so
> and take these patches.

I'm OK with that direction.

> In the future we can consider other things on top, like dropping the
> test-sha1.sh script, having an unsafe pointer embedded within
> the_hash_algo, or something else entirely. But those can be done on top,
> or not at all, and I don't want to hold up this series for them.

Sounds good.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
  2024-11-07 21:30           ` Taylor Blau
  2024-11-07 23:20             ` Taylor Blau
@ 2024-11-08 17:26             ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2024-11-08 17:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: brian m. carlson, Junio C Hamano, git, Elijah Newren

On Thu, Nov 07, 2024 at 04:30:37PM -0500, Taylor Blau wrote:

> > Yeah, I don't have a strong opinion one way or the other.  I think, with
> > the limitation I mentioned above, it would probably require a decent
> > amount of refactoring if we took a different approach, and I'm fine with
> > going with Taylor's current approach unless he wants to do that
> > refactoring (in which case, great).
> 
> I think it does buy you something for real code, which is that you don't
> have to remember to consistently call the unsafe_ variants of all of the
> various function pointers.
> 
> For instance, if you do
> 
>     the_hash_algo->unsafe_init_fn(...);
> 
> early on, and then later on by mistake write:
> 
>     the_hash_algo->update_fn(...);
> 
> Then your code is broken and will (as brian said) either in the best
> case produce wrong results, or likely segfault.

Yes, true. I sort of assume that all of those calls are happening within
one function (or at least a suite of related functions). Just because
there's an implicit context of "I am computing the hash for an object"
versus "I am computing a checksum".

And if we ever do move to splitting those further (to have crc32 or
whatever for the checksum), then having a git_hash_algo for that would
seem even weirder.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-11-08 17:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-05 19:05 [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Taylor Blau
2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
2024-11-06  1:38   ` Junio C Hamano
2024-11-07  0:48     ` brian m. carlson
2024-11-07  1:39       ` Jeff King
2024-11-07  1:49         ` brian m. carlson
2024-11-07  2:08           ` Jeff King
2024-11-07  3:08             ` Junio C Hamano
2024-11-07 21:30           ` Taylor Blau
2024-11-07 23:20             ` Taylor Blau
2024-11-08 17:26             ` Jeff King
2024-11-05 19:05 ` [PATCH v2 2/2] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
2024-11-07  2:05   ` brian m. carlson
2024-11-07 21:33     ` Taylor Blau
2024-11-08 17:23       ` Jeff King
2024-11-08 17:22     ` Jeff King
2024-11-07 21:36   ` Taylor Blau
2024-11-08 17:23     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).