All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brandon Williams <bmwill@google.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
	Stefan Beller <sbeller@google.com>
Subject: Re: [PATCH v2 2/4] Add structure representing hash algorithm
Date: Mon, 30 Oct 2017 16:36:15 -0700	[thread overview]
Message-ID: <20171030233615.GA94048@google.com> (raw)
In-Reply-To: <20171028181239.59458-3-sandals@crustytoothpaste.net>

On 10/28, brian m. carlson wrote:
> Since in the future we want to support an additional hash algorithm, add
> a structure that represents a hash algorithm and all the data that must
> go along with it.  Add a constant to allow easy enumeration of hash
> algorithms.  Implement function typedefs to create an abstract API that
> can be used by any hash algorithm, and wrappers for the existing SHA1
> functions that conform to this API.
> 
> Expose a value for hex size as well as binary size.  While one will
> always be twice the other, the two values are both used extremely
> commonly throughout the codebase and providing both leads to improved
> readability.
> 
> Don't include an entry in the hash algorithm structure for the null
> object ID.  As this value is all zeros, any suitably sized all-zero
> object ID can be used, and there's no need to store a given one on a
> per-hash basis.
> 
> The current hash function transition plan envisions a time when we will
> accept input from the user that might be in SHA-1 or in the NewHash
> format.  Since we cannot know which the user has provided, add a
> constant representing the unknown algorithm to allow us to indicate that
> we must look the correct value up.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> I believe I did get the format_id constant for SHA-1 right, but
> sanity-checking would be appreciated.  We don't want another Referer
> mess.
> 
>  cache.h     | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  sha1_file.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)
> 
> diff --git a/cache.h b/cache.h
> index 6440e2bf21..9e9eb08f05 100644
> --- a/cache.h
> +++ b/cache.h

Maybe it would be good to place this interface in its own header file
that way we can avoid cluttering cache.h with more stuff?

> @@ -77,6 +77,61 @@ struct object_id {
>  	unsigned char hash[GIT_MAX_RAWSZ];
>  };
>  
> +/*
> + * Note that these constants are suitable for indexing the hash_algos array and
> + * comparing against each other, but are otherwise arbitrary, so they should not
> + * be exposed to the user or serialized to disk.  To know whether a
> + * git_hash_algo struct points to some usable hash function, test the format_id
> + * field for being non-zero.  Use the name field for user-visible situations and
> + * the format_id field for fixed-length fields on disk.
> + */
> +/* An unknown hash function. */
> +#define GIT_HASH_UNKNOWN 0
> +/* SHA-1 */
> +#define GIT_HASH_SHA1 1
> +/* Number of algorithms supported (including unknown). */
> +#define GIT_HASH_NALGOS (GIT_HASH_SHA1 + 1)
> +
> +typedef void (*git_hash_init_fn)(void *ctx);
> +typedef void (*git_hash_update_fn)(void *ctx, const void *in, size_t len);
> +typedef void (*git_hash_final_fn)(unsigned char *hash, void *ctx);
> +
> +struct git_hash_algo {
> +	/*
> +	 * The name of the algorithm, as appears in the config file and in
> +	 * messages.
> +	 */
> +	const char *name;
> +
> +	/* A four-byte version identifier, used in pack indices. */
> +	uint32_t format_id;
> +
> +	/* The size of a hash context (e.g. git_SHA_CTX). */
> +	size_t ctxsz;
> +
> +	/* The length of the hash in binary. */
> +	size_t rawsz;
> +
> +	/* The length of the hash in hex characters. */
> +	size_t hexsz;
> +
> +	/* The hash initialization function. */
> +	git_hash_init_fn init_fn;
> +
> +	/* The hash update function. */
> +	git_hash_update_fn update_fn;
> +
> +	/* The hash finalization function. */
> +	git_hash_final_fn final_fn;
> +
> +	/* The OID of the empty tree. */
> +	const struct object_id *empty_tree;
> +
> +	/* The OID of the empty blob. */
> +	const struct object_id *empty_blob;
> +};
> +extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
> +
>  #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
>  #define DTYPE(de)	((de)->d_type)
>  #else
> diff --git a/sha1_file.c b/sha1_file.c
> index 10c3a0083d..77b320126a 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -39,6 +39,49 @@ const struct object_id empty_blob_oid = {
>  	EMPTY_BLOB_SHA1_BIN_LITERAL
>  };
>  
> +static inline void git_hash_sha1_init(void *ctx)
> +{
> +	git_SHA1_Init((git_SHA_CTX *)ctx);
> +}
> +
> +static inline void git_hash_sha1_update(void *ctx, const void *data, size_t len)
> +{
> +	git_SHA1_Update((git_SHA_CTX *)ctx, data, len);
> +}
> +
> +static inline void git_hash_sha1_final(unsigned char *hash, void *ctx)
> +{
> +	git_SHA1_Final(hash, (git_SHA_CTX *)ctx);
> +}
> +
> +const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
> +	{
> +		NULL,
> +		0x00000000,
> +		0,
> +		0,
> +		0,
> +		NULL,
> +		NULL,
> +		NULL,
> +		NULL,
> +		NULL,
> +	},
> +	{
> +		"sha-1",
> +		/* "sha1", big-endian */
> +		0x73686131,
> +		sizeof(git_SHA_CTX),
> +		GIT_SHA1_RAWSZ,
> +		GIT_SHA1_HEXSZ,
> +		git_hash_sha1_init,
> +		git_hash_sha1_update,
> +		git_hash_sha1_final,
> +		&empty_tree_oid,
> +		&empty_blob_oid,
> +	},
> +};
> +
>  /*
>   * This is meant to hold a *small* number of objects that you would
>   * want read_sha1_file() to be able to return, but yet you do not want
> -- 
> 2.15.0.rc2
> 

-- 
Brandon Williams

  parent reply	other threads:[~2017-10-30 23:36 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-28 18:12 [PATCH v2 0/4] Hash Abstraction brian m. carlson
2017-10-28 18:12 ` [PATCH v2 1/4] setup: expose enumerated repo info brian m. carlson
2017-10-30 16:08   ` Stefan Beller
2017-10-28 18:12 ` [PATCH v2 2/4] Add structure representing hash algorithm brian m. carlson
2017-10-29  1:36   ` Eric Sunshine
2017-10-29 17:00     ` brian m. carlson
2017-10-30 16:14   ` Stefan Beller
2017-10-30 23:36   ` Brandon Williams [this message]
2017-11-01  1:35     ` brian m. carlson
2017-10-28 18:12 ` [PATCH v2 3/4] Integrate hash algorithm support with repo setup brian m. carlson
2017-10-29  1:44   ` Eric Sunshine
2017-10-29 17:57     ` brian m. carlson
2017-10-29 19:02       ` Eric Sunshine
2017-10-29 19:33         ` brian m. carlson
2017-10-30  2:13           ` Junio C Hamano
2017-10-30  2:54             ` brian m. carlson
2017-10-30 16:27   ` Stefan Beller
2017-10-28 18:12 ` [PATCH v2 4/4] Switch empty tree and blob lookups to use hash abstraction brian m. carlson
2017-10-30 16:45 ` [PATCH v2 0/4] Hash Abstraction Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171030233615.GA94048@google.com \
    --to=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.