git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode
Date: Thu, 7 Nov 2024 16:30:37 -0500	[thread overview]
Message-ID: <Zy0xfVqtOXhEVDQB@nand.local> (raw)
In-Reply-To: <Zywcr2lMM_Ij8suu@tapette.crustytoothpaste.net>

On Thu, Nov 07, 2024 at 01:49:35AM +0000, brian m. carlson wrote:
> On 2024-11-07 at 01:39:15, Jeff King wrote:
> > So I think you wouldn't want to allocate an enum or a slot in the
> > hash_algos array, because it is not really an independent algorithm.
> > But I think it _could_ work as a separate struct that the caller derives
> > from the main hash algorithm. For example, imagine that the
> > git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:
> >
> >   struct git_hash_algo *unsafe_implementation;
> >
> > with a matching accessor like:
> >
> >   struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
> >   {
> > 	/* if we have a faster "unsafe" implementation, use that */
> > 	if (algo->unsafe_implementation)
> > 		return algo->unsafe_implementation;
> > 	/* otherwise just use the default one */
> > 	return algo;
> >   }
> >
> > And then csum-file.c, rather than calling:
> >
> >   the_hash_algo->unsafe_init_fn(...);
> >   ...
> >   the_hash_algo->unsafe_final_fn(...);
> >
> > would do this:
> >
> >   struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
> >   algo->init_fn(...);
> >   ...
> >   algo->final_fn(...);
> >
> > And likewise this test helper would have a single conditional at the
> > start:
> >
> >   if (unsafe)
> > 	algo = unsafe_hash_algo(algo);
> >
> > and the rest of the code would just use that.
>
> Ah, yes, I think that approach would be simpler and nicer to work with
> than a separate entry in the `hash_algos` array.  We do, however, have
> some places that assume that a `struct git_hash_algo *` points into the
> `hash_algos` array (notably `hash_algo_by_ptr`), so we'd have to adjust
> for that, move the function pointers out into their own struct which
> we'd use for `unsafe_hash_algo`, or be careful never to call the
> relevant functions on our special `git_hash_algo` struct.

I agree that the approach suggested by Peff here is clean and would
button up the test-helper code nicely. It may be worth doing in the
future, but I agree that it doesn't seem entirely in scope here.

Part of the reason that I did not initially add a separate sha1-unsafe
struct is that while it avoids this awkwardness, it creates new
awkwardness when in a non-SHA-1 repository, where you have to keep
track of whether you want to use the_unsafe_hash_algo or the SHA-256
hash algo.

In that instance, it's not a matter of computing the same result two
different ways, but rather using the wrong hash function entirely. IOW,
the hashfile API would have to realize that when in SHA-256 mode, that
it should *not* use the separate (hypothetical) unsafe SHA-1 struct.

> > All that said, I do not think it buys us anything for "real" code. There
> > the decision between safe/unsafe is in the context of how we are using
> > it, and not based on some conditional. So while the code in this test
> > helper is ugly, I don't think we'd ever write anything like that for
> > real. So it may not be worth the effort to refactor into a more virtual
> > object-oriented way.
>
> Yeah, I don't have a strong opinion one way or the other.  I think, with
> the limitation I mentioned above, it would probably require a decent
> amount of refactoring if we took a different approach, and I'm fine with
> going with Taylor's current approach unless he wants to do that
> refactoring (in which case, great).

I think it does buy you something for real code, which is that you don't
have to remember to consistently call the unsafe_ variants of all of the
various function pointers.

For instance, if you do

    the_hash_algo->unsafe_init_fn(...);

early on, and then later on by mistake write:

    the_hash_algo->update_fn(...);

Then your code is broken and will (as brian said) either in the best
case produce wrong results, or likely segfault.

So in that sense it is worth doing as it avoids the caller from having
to keep track of what "mode" they are using the_hash_algo in. But let's
take it up later, I think.

Thanks,
Taylor

  parent reply	other threads:[~2024-11-07 21:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-05 19:05 [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Taylor Blau
2024-11-05 19:05 ` [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode Taylor Blau
2024-11-06  1:38   ` Junio C Hamano
2024-11-07  0:48     ` brian m. carlson
2024-11-07  1:39       ` Jeff King
2024-11-07  1:49         ` brian m. carlson
2024-11-07  2:08           ` Jeff King
2024-11-07  3:08             ` Junio C Hamano
2024-11-07 21:30           ` Taylor Blau [this message]
2024-11-07 23:20             ` Taylor Blau
2024-11-08 17:26             ` Jeff King
2024-11-05 19:05 ` [PATCH v2 2/2] t/helper/test-tool: implement sha1-unsafe helper Taylor Blau
2024-11-07  1:47 ` [PATCH v2 0/2] t/helper/test-tool: implement 'sha1-unsafe' helper Jeff King
2024-11-07  2:05   ` brian m. carlson
2024-11-07 21:33     ` Taylor Blau
2024-11-08 17:23       ` Jeff King
2024-11-08 17:22     ` Jeff King
2024-11-07 21:36   ` Taylor Blau
2024-11-08 17:23     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zy0xfVqtOXhEVDQB@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).