git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-07 23:23 brian m. carlson
@ 2015-03-07 23:23 ` brian m. carlson
       [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
  0 siblings, 1 reply; 27+ messages in thread
From: brian m. carlson @ 2015-03-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Andreas Schwab

Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values).  This leads to lots of hardcoded numbers throughout the
codebase.  It also leads to confusion about the purposes of a buffer.

Introduce a structure for object IDs.  This allows us to obtain the benefits
of compile-time checking for misuse.  The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 object.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/object.h b/object.h
index 6416247..f99b502 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+/* The length in bytes and in hex digits of an object name (SHA-1 value). */
+#define GIT_SHA1_RAWSZ 20
+#define GIT_SHA1_HEXSZ 40
+
+struct object_id {
+	unsigned char sha1[GIT_SHA1_RAWSZ];
+};
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
       [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
@ 2015-03-11 22:08     ` brian m. carlson
  2015-03-12  0:26       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: brian m. carlson @ 2015-03-11 22:08 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: git, Michael Haggerty, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

On Tue, Mar 10, 2015 at 07:38:42PM -0700, Kyle J. McKay wrote:
>GIT_SHA1_HEXSZ will always be exactly 2 * GIT_SHA1_RAWSZ, right?  In
>fact, if it's not things will almost certainly break, yes?
>
>Does it make more sense then to reflect this requirement by using:
>
>  #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
>
>instead?

Yes.  I'll make that change in the next version.

>I don't see anything wrong with this.  However, in part 02/10 the
>utility functions all use "oid" in their names, so I'm thinking that
>it may make more sense to just go with:
>
>struct object_id {
>	unsigned char oid[GIT_SHA1_RAWSZ];
>};
>
>to match?

Michael Haggerty recommended that I call the structure element sha1
instead of oid in case we want to turn this into a union if we decide to
go the additional hash route.

I think it can also improve readability if we use "oid" only for the
instances of the struct itself, especially since it makes it more
obvious what code has been converted already.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-11 22:08     ` brian m. carlson
@ 2015-03-12  0:26       ` Junio C Hamano
  2015-03-12  9:34         ` brian m. carlson
  2015-03-12 10:28         ` Michael Haggerty
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2015-03-12  0:26 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Kyle J. McKay, git, Michael Haggerty, Andreas Schwab

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Michael Haggerty recommended that I call the structure element sha1
> instead of oid in case we want to turn this into a union if we decide to
> go the additional hash route.

I'd advise against it.

As I wrote in $gmane/265337 in response to Michael:

    > 4. We continue to support working with SHA-1s declared to be (unsigned
    > char *) in some performance-critical code, even as we migrate most other
    > code to using SHA-1s embedded within a (struct object_id). This will
    > cost some duplication of code. To accept this approach, we would need an
    > idea of *how much* code duplication would be needed. E.g., how many
    > functions will need both (unsigned char *) versions and (struct
    > object_id *) versions?

    ...

    I do not know what kind of code duplication you are worried about,
    though.  If a callee needs "unsigned char *", the caller that has a
    "struct object_id *o" should pass o->hash to the callee.

And that would break the abstraction effort if you start calling the
field with a name that is specific to the underlying hash function.
The caller has to change o->sha1 to o->sha256 instead of keeping
that as o->oid and letting the callee handle the implementation
details when calling

        if (!hashcmp(o1->oid, o2->oid))
                ; /* they are the same */
        else
                ; /* they are different */

The only folks that need to _know_ what hash function is used or
how long the field is are the ones that have raw bytes of the hash
obtained from files (e.g. from the index) and they would do
something like this to implement a function that checks the file
records an object name that is expected by the caller:

        void check_oid(int fd, struct object_id *expected)
        {
                unsigned char object_name[GIT_HASH_RAWSZ];

                ...
                read(fd, object_name, GIT_HASH_RAWSZ);
                if (hashcmp(object_name, expected->oid))
                        die("object name mismatch???");
        }

or when they do know they are using a specific function, do:

        void compute_object_name(struct object_id *id,
                                unsignd char*data,
                                size_t len)
        {
                git_SHA_CTX c;
                unsigned char *sha1 = &(id->oid);

                /* if we are paranoid... */
                assert(sizeof(id->oid) >= 20);

                ...
                git_SHA1_Init(&c);
                git_SHA1_Update(&c, data, len);
                ...
                git_SHA1_Final(sha1, &c);
        }

Even the latter would not be helped if the field to store the hash
were named id->sha1 very much, I would think.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12  0:26       ` Junio C Hamano
@ 2015-03-12  9:34         ` brian m. carlson
  2015-03-12 10:28         ` Michael Haggerty
  1 sibling, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-12  9:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle J. McKay, git, Michael Haggerty, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On Wed, Mar 11, 2015 at 05:26:56PM -0700, Junio C Hamano wrote:
>"brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> Michael Haggerty recommended that I call the structure element sha1
>> instead of oid in case we want to turn this into a union if we decide to
>> go the additional hash route.
>
>I'd advise against it.

Yeah, after re-reading this, I think "hash" is the best choice for the
underlying element name.  That way, it's different from "oid", which
would be the instances of the struct itself, and it's sufficiently
different from unconverted places in the code, which don't typically use
that term.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12  0:26       ` Junio C Hamano
  2015-03-12  9:34         ` brian m. carlson
@ 2015-03-12 10:28         ` Michael Haggerty
  2015-03-12 10:46           ` brian m. carlson
  1 sibling, 1 reply; 27+ messages in thread
From: Michael Haggerty @ 2015-03-12 10:28 UTC (permalink / raw)
  To: Junio C Hamano, brian m. carlson; +Cc: Kyle J. McKay, git, Andreas Schwab

On 03/12/2015 01:26 AM, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
>> Michael Haggerty recommended that I call the structure element sha1
>> instead of oid in case we want to turn this into a union if we decide to
>> go the additional hash route.
> 
> I'd advise against it.
> 
> As I wrote in $gmane/265337 in response to Michael:
> 
>     > 4. We continue to support working with SHA-1s declared to be (unsigned
>     > char *) in some performance-critical code, even as we migrate most other
>     > code to using SHA-1s embedded within a (struct object_id). This will
>     > cost some duplication of code. To accept this approach, we would need an
>     > idea of *how much* code duplication would be needed. E.g., how many
>     > functions will need both (unsigned char *) versions and (struct
>     > object_id *) versions?
> 
>     ...
> 
>     I do not know what kind of code duplication you are worried about,
>     though.  If a callee needs "unsigned char *", the caller that has a
>     "struct object_id *o" should pass o->hash to the callee.
> 
> And that would break the abstraction effort if you start calling the
> field with a name that is specific to the underlying hash function.
> The caller has to change o->sha1 to o->sha256 instead of keeping
> that as o->oid and letting the callee handle the implementation
> details when calling
> 
>         if (!hashcmp(o1->oid, o2->oid))
>                 ; /* they are the same */
>         else
>                 ; /* they are different */
> [...]

Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
SHA-256s (or whatever) in the "oid" field, which would be dimensioned
large enough for either one (with, say, SHA-1s padded with zeros).

I was imagining that this would evolve into a union (or maybe struct) of
different hash types, like

    struct object_id {
        unsigned char hash_type;
        union {
            unsigned char sha1[GIT_SHA1_RAWSZ];
            unsigned char sha256[GIT_SHA256_RAWSZ];
        } hash;
    };

BTW in either case, any hopes of mapping object_id objects directly on
top of buffer memory would disappear.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 10:28         ` Michael Haggerty
@ 2015-03-12 10:46           ` brian m. carlson
  2015-03-12 11:16             ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: brian m. carlson @ 2015-03-12 10:46 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kyle J. McKay, git, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2405 bytes --]

On Thu, Mar 12, 2015 at 11:28:10AM +0100, Michael Haggerty wrote:
>On 03/12/2015 01:26 AM, Junio C Hamano wrote:
>> And that would break the abstraction effort if you start calling the
>> field with a name that is specific to the underlying hash function.
>> The caller has to change o->sha1 to o->sha256 instead of keeping
>> that as o->oid and letting the callee handle the implementation
>> details when calling
>>
>>         if (!hashcmp(o1->oid, o2->oid))
>>                 ; /* they are the same */
>>         else
>>                 ; /* they are different */
>> [...]
>
>Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
>SHA-256s (or whatever) in the "oid" field, which would be dimensioned
>large enough for either one (with, say, SHA-1s padded with zeros).
>
>I was imagining that this would evolve into a union (or maybe struct) of
>different hash types, like
>
>    struct object_id {
>        unsigned char hash_type;
>        union {
>            unsigned char sha1[GIT_SHA1_RAWSZ];
>            unsigned char sha256[GIT_SHA256_RAWSZ];
>        } hash;
>    };
>
>BTW in either case, any hopes of mapping object_id objects directly on
>top of buffer memory would disappear.

What I think might be more beneficial is to make the hash function
specific to a particular repository, and therefore you could maintain
something like this:

  struct object_id {
      unsigned char hash[GIT_MAX_RAWSZ];
  };

and make hash_type (or hash_length) a global[0].  I don't think it's
very worthwhile to try to mix two different hash functions in the same
repository, so we could still map directly onto buffer memory if we
decide that's portable enough.  I expect the cases where we need to do
that will be relatively limited.

Regardless, it seems that this solution has the most support (including
Junio's) and it's more self-documenting than my current set of patches,
so I'm going to go with it for now.  It should be easy to change if the
consensus goes back the other way.

[0] I personally think globals are a bit gross, but they don't seem to
have the problems that they would if git were a shared library.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 10:46           ` brian m. carlson
@ 2015-03-12 11:16             ` Duy Nguyen
  2015-03-12 18:24               ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2015-03-12 11:16 UTC (permalink / raw)
  To: brian m. carlson, Michael Haggerty, Junio C Hamano, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Thu, Mar 12, 2015 at 5:46 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Thu, Mar 12, 2015 at 11:28:10AM +0100, Michael Haggerty wrote:
>>
>> On 03/12/2015 01:26 AM, Junio C Hamano wrote:
>>>
>>> And that would break the abstraction effort if you start calling the
>>> field with a name that is specific to the underlying hash function.
>>> The caller has to change o->sha1 to o->sha256 instead of keeping
>>> that as o->oid and letting the callee handle the implementation
>>> details when calling
>>>
>>>         if (!hashcmp(o1->oid, o2->oid))
>>>                 ; /* they are the same */
>>>         else
>>>                 ; /* they are different */
>>> [...]
>>
>>
>> Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
>> SHA-256s (or whatever) in the "oid" field, which would be dimensioned
>> large enough for either one (with, say, SHA-1s padded with zeros).
>>
>> I was imagining that this would evolve into a union (or maybe struct) of
>> different hash types, like
>>
>>    struct object_id {
>>        unsigned char hash_type;
>>        union {
>>            unsigned char sha1[GIT_SHA1_RAWSZ];
>>            unsigned char sha256[GIT_SHA256_RAWSZ];
>>        } hash;
>>    };
>>
>> BTW in either case, any hopes of mapping object_id objects directly on
>> top of buffer memory would disappear.
>
>
> What I think might be more beneficial is to make the hash function
> specific to a particular repository, and therefore you could maintain
> something like this:
>
>  struct object_id {
>      unsigned char hash[GIT_MAX_RAWSZ];
>  };
>
> and make hash_type (or hash_length) a global[0].  I don't think it's
> very worthwhile to try to mix two different hash functions in the same
> repository,

This may or may not fall into the "mix different hash functions"
category. In pack files version 4, trees are encoded to point to other
trees or blobs by a (pack, offset) tuple. It would be great if the new
object_id could support carrying this kind of object id around because
it could help reduce object lookup cost a lot. (pack, offset) can be
converted back to SHA-1 so no info is lost and hashcmp() can compare
(pack, tuple) against an SHA-1 just fine.

> so we could still map directly onto buffer memory if we
> decide that's portable enough.  I expect the cases where we need to do
> that will be relatively limited.
>
> Regardless, it seems that this solution has the most support (including
> Junio's) and it's more self-documenting than my current set of patches,
> so I'm going to go with it for now.  It should be easy to change if the
> consensus goes back the other way.
>
> [0] I personally think globals are a bit gross, but they don't seem to
> have the problems that they would if git were a shared library.
>
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 11:16             ` Duy Nguyen
@ 2015-03-12 18:24               ` Junio C Hamano
  2015-03-12 18:44                 ` Junio C Hamano
  2015-03-13  0:58                 ` Duy Nguyen
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2015-03-12 18:24 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

> This may or may not fall into the "mix different hash functions"
> category. In pack files version 4, trees are encoded to point to other
> trees or blobs by a (pack, offset) tuple. It would be great if the new
> object_id could support carrying this kind of object id around because
> it could help reduce object lookup cost a lot. (pack, offset) can be
> converted back to SHA-1 so no info is lost and hashcmp() can compare
> (pack, tuple) against an SHA-1 just fine.

You mean "if it came in <pack, offset> format, convert it down to
<sha1> until the last second that it is needed (e.g. need to put
that in a tree object in order to compute the object name of the
containing tree object)"?

After converting an object name originally represented as <pack,
offset>, if we are doing the "union in struct" thing, to <sha1>
representation, you would have to look it up from .idx in order to
read the contents the usual way.  If that happens often enough, then
it may not be worth adding complexity to the code to carry the
<pack, offset> pair around.

Unless you fix that "union in struct" assumption, that is.

To me, <pack, offset> information smells to belong more to a "struct
object" (or its subclass) as an optional annotation---when a caller
is asked to parse_object(), you would bypass the sha1_read_file()
that goes and looks the object name up from the list of pack .idx
and instead go there straight using that annotation.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 18:24               ` Junio C Hamano
@ 2015-03-12 18:44                 ` Junio C Hamano
  2015-03-13  0:58                 ` Duy Nguyen
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2015-03-12 18:44 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> This may or may not fall into the "mix different hash functions"
>> category. In pack files version 4, trees are encoded to point to other
>> trees or blobs by a (pack, offset) tuple. It would be great if the new
>> object_id could support carrying this kind of object id around because
>> it could help reduce object lookup cost a lot. (pack, offset) can be
>> converted back to SHA-1 so no info is lost and hashcmp() can compare
>> (pack, tuple) against an SHA-1 just fine.
>
> You mean "if it came in <pack, offset> format, convert it down to
> <sha1> until the last second that it is needed (e.g. need to put
> that in a tree object in order to compute the object name of the
> containing tree object)"?

Sorry, obviously I meant "do not convert until the last second".

> To me, <pack, offset> information smells to belong more to a "struct
> object" (or its subclass) as an optional annotation---when a caller
> is asked to parse_object(), you would bypass the sha1_read_file()
> that goes and looks the object name up from the list of pack .idx
> and instead go there straight using that annotation.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-12 18:24               ` Junio C Hamano
  2015-03-12 18:44                 ` Junio C Hamano
@ 2015-03-13  0:58                 ` Duy Nguyen
  2015-03-13  6:03                   ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2015-03-13  0:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Fri, Mar 13, 2015 at 1:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> This may or may not fall into the "mix different hash functions"
>> category. In pack files version 4, trees are encoded to point to other
>> trees or blobs by a (pack, offset) tuple. It would be great if the new
>> object_id could support carrying this kind of object id around because
>> it could help reduce object lookup cost a lot. (pack, offset) can be
>> converted back to SHA-1 so no info is lost and hashcmp() can compare
>> (pack, tuple) against an SHA-1 just fine.
>
> You mean "if it came in <pack, offset> format, convert it down to
> <sha1> until the last second that it is needed (e.g. need to put
> that in a tree object in order to compute the object name of the
> containing tree object)"?

I picked my words poorly. It should be <pack, the index in pack>
instead of the _byte_ offset. This index ranges from 0 to the number
of objects minus one. This is essentially what pack v4 stores in
places where pack v2 would store SHA-1, which makes me make the
connection: both are different ways of identifying an object.

> After converting an object name originally represented as <pack,
> offset>, if we are doing the "union in struct" thing, to <sha1>
> representation, you would have to look it up from .idx in order to
> read the contents the usual way.  If that happens often enough, then
> it may not be worth adding complexity to the code to carry the
> <pack, offset> pair around.

I'd keep it in <pack, index> for as long as possible (or until higher
layer decides that it's not worth keeping in this representation
anymore). I can only see two use cases where actual SHA-1 is involved:
sorting by SHA-1 (rarely done, probably only in index-pack and
pack-objects) and output to the screen (or producing v2 packs).

> Unless you fix that "union in struct" assumption, that is.
>
> To me, <pack, offset> information smells to belong more to a "struct
> object" (or its subclass) as an optional annotation---when a caller
> is asked to parse_object(), you would bypass the sha1_read_file()
> that goes and looks the object name up from the list of pack .idx
> and instead go there straight using that annotation.

For pack v4, commits and trees can be encoded this way. parse_object()
could help the commit case (maybe, maybe not, I haven't looked at
commit walkers), not recursive tree walking where we now pass SHA-1
around to user callback and all. Carrying optional annotation around
would impact the code in many places.  Also, storing this info in
struct object seems conflict with v4 goal of reducing "struct object"
lookup cost.  Maybe I'm missing something here.

Then again, we don't know if pack v4 will get merged in the end (I do
hope it will). And we have an option of making specialized commit/tree
walkers that are aware of pack v4 and only use them in hot places to
reduce impact to the rest of the code base. If hash[GIT_MAX_RAWSZ]
looks like a good enough solution, we can go with that and worry about
pack v4 later when/if it comes.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13  0:58                 ` Duy Nguyen
@ 2015-03-13  6:03                   ` Junio C Hamano
  2015-03-14 11:49                     ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2015-03-13  6:03 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

>> You mean "if it came in <pack, offset> format, convert it down to
>> <sha1> until the last second that it is needed (e.g. need to put
>> that in a tree object in order to compute the object name of the
>> containing tree object)"?
>
> I picked my words poorly. It should be <pack, the index in pack>
> instead of the _byte_ offset.

Thanks for a clarification, but I do not think it affects the main
point of the discussion very much.  If we use "union in struct",
where we can store either an SHA-1 hash or some other identifying
information for the object, but not both, then at some point you
would need to convert <pack, nth> to <sha-1> in a codepath that
needs the sha-1 hash value (e.g. if the object A, that is known to
you as <pack, nth>, is placed in a tree object B, you need the
object name of A in <sha-1> representation in order to compute the
object name of the tree object B.  You can choose to keep it in
<pack, nth> form in "struct object_id { union }" and look up the
<sha-1> from the pack index every time, or you can choose to give
up the <pack, nth> form and upgrade the "struct object_id" to store
<sha-1> at that point.

If you keep both <pack, nth> *and* <sha-1> in "struct object_id" at
the same time, you can choose whichever is convenient, but it would
bloat everything in core.  Not just it bloats "struct object" and
its subclasses, the in-core index entries, which is what I meant
by ...

>> Unless you fix that "union in struct" assumption, that is.

... this.

>> To me, <pack, offset> information smells to belong more to a "struct
>> object" (or its subclass) as an optional annotation---when a caller
>> is asked to parse_object(), you would bypass the sha1_read_file()
>> that goes and looks the object name up from the list of pack .idx
>> and instead go there straight using that annotation.
>
> For pack v4, commits and trees can be encoded this way.

Even if your in-pack representation of a commit object allowed to
store the tree pointer in <pack, nth> format, its object name must
be computed as if you have the commit object in the traditional
format and computed the hash of that (together with the standard
"<type> <size>\0" header), and at that point, you need the contained
object's name in <sha-1> format (imagine how you would implement the
commit-tree command).  Hence, I do not think the v4 encoding changes
the discussion very much.  I see the primary value of v4 encoding is
to shorten the length of various fields take on-disk and in-pack.
If it were <pack, offset>, it could be argued that it would also be
faster to get to the packed data in the packfile, and going from
<pack, nth> to the .idx file and then going to the location in the
data in the packfile would be faster than going from <sha-1> to a
particular pack and its in-pack offset, with the difference of cost
around log(n)*m where n is the number of objects in a pack and m is
the total number of packs in the repository.

It is true that <nth> format (force that the referred-to object
lives in the same pack as the referrer) can help speed up
interpretation of extended SHA-1 expression, e.g. "v1.0^0:t", which
can read v1.0 tag in v4 format, find the <nth> info for the commit
pointed by the tag and get to that data in the pack, find the <nth>
info for the top-tree recorded in that commit and directly get to
the data of that tree, and then find the entry for "t", which will
give the object name for that subtree again in <nth> format, and at
that point you can find the <sha-1> of that final object, without
having to know any object names of the intermediate objects
(i.e. you must start from <sha-1> of the tag you obtain from the
refs API, but you didn't use the object name of the commit and its
top-level tree).  So for such a codepath, I would say it would be
sufficient to use a "union in struct" people have been envisioning
and convert <pack, nth> to <sha-1> when the latter form becomes
necessary for the first time for the object.

Anyway, wouldn't this be all academic?  I do not see how you would
keep the object name in the <pack, nth> format in-core, as the
obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
to a different hash, I cannot see how such a table to ensure the
singleton-ness of in-core objects can be keyed sometimes by <hash>
and by <pack, nth> in some other time.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 00/10] Use a structure for object IDs.
@ 2015-03-13 23:39 brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 01/10] Define " brian m. carlson
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

This is a patch series to convert some of the relevant uses of unsigned
char [20] to struct object_id.

The goal of this series to improve type-checking in the codebase and to
make it easier to move to a different hash function if the project
decides to do that.

There should be no functional change from this patch series.

I've tried to adopt most of the suggestions where possible without major
reworking.

Changes since v1:
* Call the struct member "hash".
* Convert some additional magic numbers to be based off the constants
  GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ.  Introduce variables where this
  makes sense.
* Define GIT_SHA1_HEXSZ in terms of GIT_SHA1_RAWSZ.
* Move the definition of struct object_id to cache.h.  No new #includes
  are required as a result.
* Use better names for arguments to oidcmp.
* Convert one or two structs that were missed last time.

You can see this series on the object-id branch (may be rebased) at

  https://github.com/bk2204/git.git
  https://git.crustytoothpaste.net/git/bmc/git.git

Both repositories are pushed at the same time.

brian m. carlson (10):
  Define a structure for object IDs.
  Define utility functions for object IDs.
  bisect.c: convert leaf functions to use struct object_id
  archive.c: convert to use struct object_id
  zip: use GIT_SHA1_HEXSZ for trailers
  bulk-checkin.c: convert to use struct object_id
  diff: convert struct combine_diff_path to object_id
  commit: convert parts to struct object_id
  patch-id: convert to use struct object_id
  apply: convert threeway_stage to object_id

 archive-zip.c      |  4 ++--
 archive.c          | 22 ++++++++++-----------
 bisect.c           | 40 +++++++++++++++++++-------------------
 builtin/apply.c    | 14 +++++++-------
 builtin/patch-id.c | 34 ++++++++++++++++-----------------
 bulk-checkin.c     | 12 ++++++------
 cache.h            | 40 ++++++++++++++++++++++++++++++++++----
 combine-diff.c     | 56 +++++++++++++++++++++++++++---------------------------
 commit.c           | 56 +++++++++++++++++++++++++++++-------------------------
 commit.h           |  4 ++--
 diff-lib.c         | 10 +++++-----
 diff.h             |  5 +++--
 hex.c              | 16 +++++++++++++---
 log-tree.c         |  2 +-
 send-pack.c        |  2 +-
 shallow.c          |  8 ++++----
 tree-diff.c        | 10 +++++-----
 upload-pack.c      |  2 +-
 18 files changed, 192 insertions(+), 145 deletions(-)

-- 
2.2.1.209.g41e5f3a

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values).  This leads to lots of hardcoded numbers throughout the
codebase.  It also leads to confusion about the purposes of a buffer.

Introduce a structure for object IDs.  This allows us to obtain the benefits
of compile-time checking for misuse.  The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/cache.h b/cache.h
index 761c570..6582c35 100644
--- a/cache.h
+++ b/cache.h
@@ -43,6 +43,14 @@ int git_deflate_end_gently(git_zstream *);
 int git_deflate(git_zstream *, int flush);
 unsigned long git_deflate_bound(git_zstream *, unsigned long);
 
+/* The length in bytes and in hex digits of an object name (SHA-1 value). */
+#define GIT_SHA1_RAWSZ 20
+#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+
+struct object_id {
+	unsigned char hash[GIT_SHA1_RAWSZ];
+};
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 02/10] Define utility functions for object IDs.
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 01/10] Define " brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

There are several utility functions (hashcmp and friends) that are used
for comparing object IDs (SHA-1 values).  Using these functions, which
take pointers to unsigned char, with struct object_id requires tiresome
access to the sha1 member, which bloats code and violates the desired
encapsulation.  Provide wrappers around these functions for struct
object_id for neater, more maintainable code.  Use the new constants to
avoid the hard-coded 20s and 40s throughout the original functions.

These functions simply call the underlying pointer-to-unsigned-char
versions to ensure that any performance improvements will be passed
through to the new functions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 32 ++++++++++++++++++++++++++++----
 hex.c   | 16 +++++++++++++---
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 6582c35..95206a3 100644
--- a/cache.h
+++ b/cache.h
@@ -718,13 +718,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
 extern const char *find_unique_abbrev(const unsigned char *sha1, int);
-extern const unsigned char null_sha1[20];
+extern const unsigned char null_sha1[GIT_SHA1_RAWSZ];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
 	int i;
 
-	for (i = 0; i < 20; i++, sha1++, sha2++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++, sha1++, sha2++) {
 		if (*sha1 != *sha2)
 			return *sha1 - *sha2;
 	}
@@ -732,20 +732,42 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 	return 0;
 }
 
+static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
+{
+	return hashcmp(oid1->hash, oid2->hash);
+}
+
 static inline int is_null_sha1(const unsigned char *sha1)
 {
 	return !hashcmp(sha1, null_sha1);
 }
 
+static inline int is_null_oid(const struct object_id *oid)
+{
+	return !hashcmp(oid->hash, null_sha1);
+}
+
 static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 {
-	memcpy(sha_dst, sha_src, 20);
+	memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
 }
+
+static inline void oidcpy(struct object_id *dst, const struct object_id *src)
+{
+	hashcpy(dst->hash, src->hash);
+}
+
 static inline void hashclr(unsigned char *hash)
 {
-	memset(hash, 0, 20);
+	memset(hash, 0, GIT_SHA1_RAWSZ);
 }
 
+static inline void oidclr(struct object_id *oid)
+{
+	hashclr(oid->hash);
+}
+
+
 #define EMPTY_TREE_SHA1_HEX \
 	"4b825dc642cb6eb9a060e54bf8d69288fbee4904"
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
@@ -952,8 +974,10 @@ extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *);
  * null-terminated string.
  */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
+extern int get_oid_hex(const char *hex, struct object_id *sha1);
 
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
 extern int read_ref_full(const char *refname, int resolve_flags,
 			 unsigned char *sha1, int *flags);
 extern int read_ref(const char *refname, unsigned char *sha1);
diff --git a/hex.c b/hex.c
index cfd9d72..899b74a 100644
--- a/hex.c
+++ b/hex.c
@@ -38,7 +38,7 @@ const signed char hexval_table[256] = {
 int get_sha1_hex(const char *hex, unsigned char *sha1)
 {
 	int i;
-	for (i = 0; i < 20; i++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
 		unsigned int val;
 		/*
 		 * hex[1]=='\0' is caught when val is checked below,
@@ -56,15 +56,20 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
 	return 0;
 }
 
+int get_oid_hex(const char *hex, struct object_id *oid)
+{
+	return get_sha1_hex(hex, oid->hash);
+}
+
 char *sha1_to_hex(const unsigned char *sha1)
 {
 	static int bufno;
-	static char hexbuffer[4][41];
+	static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
 	static const char hex[] = "0123456789abcdef";
 	char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
 	int i;
 
-	for (i = 0; i < 20; i++) {
+	for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
 		unsigned int val = *sha1++;
 		*buf++ = hex[val >> 4];
 		*buf++ = hex[val & 0xf];
@@ -73,3 +78,8 @@ char *sha1_to_hex(const unsigned char *sha1)
 
 	return buffer;
 }
+
+char *oid_to_hex(const struct object_id *oid)
+{
+	return sha1_to_hex(oid->hash);
+}
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 01/10] Define " brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Convert some constants to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bisect.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/bisect.c b/bisect.c
index 8c6d843..10f5e57 100644
--- a/bisect.c
+++ b/bisect.c
@@ -15,7 +15,7 @@
 static struct sha1_array good_revs;
 static struct sha1_array skipped_revs;
 
-static unsigned char *current_bad_sha1;
+static struct object_id *current_bad_oid;
 
 static const char *argv_checkout[] = {"checkout", "-q", NULL, "--", NULL};
 static const char *argv_show_branch[] = {"show-branch", NULL, NULL};
@@ -404,8 +404,8 @@ static int register_ref(const char *refname, const unsigned char *sha1,
 			int flags, void *cb_data)
 {
 	if (!strcmp(refname, "bad")) {
-		current_bad_sha1 = xmalloc(20);
-		hashcpy(current_bad_sha1, sha1);
+		current_bad_oid = xmalloc(sizeof(*current_bad_oid));
+		hashcpy(current_bad_oid->hash, sha1);
 	} else if (starts_with(refname, "good-")) {
 		sha1_array_append(&good_revs, sha1);
 	} else if (starts_with(refname, "skip-")) {
@@ -564,7 +564,7 @@ static struct commit_list *skip_away(struct commit_list *list, int count)
 
 	for (i = 0; cur; cur = cur->next, i++) {
 		if (i == index) {
-			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
+			if (hashcmp(cur->item->object.sha1, current_bad_oid->hash))
 				return cur;
 			if (previous)
 				return previous;
@@ -607,7 +607,7 @@ static void bisect_rev_setup(struct rev_info *revs, const char *prefix,
 
 	/* rev_argv.argv[0] will be ignored by setup_revisions */
 	argv_array_push(&rev_argv, "bisect_rev_setup");
-	argv_array_pushf(&rev_argv, bad_format, sha1_to_hex(current_bad_sha1));
+	argv_array_pushf(&rev_argv, bad_format, oid_to_hex(current_bad_oid));
 	for (i = 0; i < good_revs.nr; i++)
 		argv_array_pushf(&rev_argv, good_format,
 				 sha1_to_hex(good_revs.sha1[i]));
@@ -628,7 +628,7 @@ static void bisect_common(struct rev_info *revs)
 }
 
 static void exit_if_skipped_commits(struct commit_list *tried,
-				    const unsigned char *bad)
+				    const struct object_id *bad)
 {
 	if (!tried)
 		return;
@@ -637,12 +637,12 @@ static void exit_if_skipped_commits(struct commit_list *tried,
 	       "The first bad commit could be any of:\n");
 	print_commit_list(tried, "%s\n", "%s\n");
 	if (bad)
-		printf("%s\n", sha1_to_hex(bad));
+		printf("%s\n", oid_to_hex(bad));
 	printf("We cannot bisect more!\n");
 	exit(2);
 }
 
-static int is_expected_rev(const unsigned char *sha1)
+static int is_expected_rev(const struct object_id *oid)
 {
 	const char *filename = git_path("BISECT_EXPECTED_REV");
 	struct stat st;
@@ -658,7 +658,7 @@ static int is_expected_rev(const unsigned char *sha1)
 		return 0;
 
 	if (strbuf_getline(&str, fp, '\n') != EOF)
-		res = !strcmp(str.buf, sha1_to_hex(sha1));
+		res = !strcmp(str.buf, oid_to_hex(oid));
 
 	strbuf_release(&str);
 	fclose(fp);
@@ -719,7 +719,7 @@ static struct commit **get_bad_and_good_commits(int *rev_nr)
 	struct commit **rev = xmalloc(len * sizeof(*rev));
 	int i, n = 0;
 
-	rev[n++] = get_commit_reference(current_bad_sha1);
+	rev[n++] = get_commit_reference(current_bad_oid->hash);
 	for (i = 0; i < good_revs.nr; i++)
 		rev[n++] = get_commit_reference(good_revs.sha1[i]);
 	*rev_nr = n;
@@ -729,8 +729,8 @@ static struct commit **get_bad_and_good_commits(int *rev_nr)
 
 static void handle_bad_merge_base(void)
 {
-	if (is_expected_rev(current_bad_sha1)) {
-		char *bad_hex = sha1_to_hex(current_bad_sha1);
+	if (is_expected_rev(current_bad_oid)) {
+		char *bad_hex = oid_to_hex(current_bad_oid);
 		char *good_hex = join_sha1_array_hex(&good_revs, ' ');
 
 		fprintf(stderr, "The merge base %s is bad.\n"
@@ -750,7 +750,7 @@ static void handle_bad_merge_base(void)
 static void handle_skipped_merge_base(const unsigned char *mb)
 {
 	char *mb_hex = sha1_to_hex(mb);
-	char *bad_hex = sha1_to_hex(current_bad_sha1);
+	char *bad_hex = sha1_to_hex(current_bad_oid->hash);
 	char *good_hex = join_sha1_array_hex(&good_revs, ' ');
 
 	warning("the merge base between %s and [%s] "
@@ -781,7 +781,7 @@ static void check_merge_bases(int no_checkout)
 
 	for (; result; result = result->next) {
 		const unsigned char *mb = result->item->object.sha1;
-		if (!hashcmp(mb, current_bad_sha1)) {
+		if (!hashcmp(mb, current_bad_oid->hash)) {
 			handle_bad_merge_base();
 		} else if (0 <= sha1_array_lookup(&good_revs, mb)) {
 			continue;
@@ -838,7 +838,7 @@ static void check_good_are_ancestors_of_bad(const char *prefix, int no_checkout)
 	struct stat st;
 	int fd;
 
-	if (!current_bad_sha1)
+	if (!current_bad_oid)
 		die("a bad revision is needed");
 
 	/* Check if file BISECT_ANCESTORS_OK exists. */
@@ -903,7 +903,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	struct commit_list *tried;
 	int reaches = 0, all = 0, nr, steps;
 	const unsigned char *bisect_rev;
-	char bisect_rev_hex[41];
+	char bisect_rev_hex[GIT_SHA1_HEXSZ + 1];
 
 	if (read_bisect_refs())
 		die("reading bisect refs failed");
@@ -927,7 +927,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 		exit_if_skipped_commits(tried, NULL);
 
 		printf("%s was both good and bad\n",
-		       sha1_to_hex(current_bad_sha1));
+		       oid_to_hex(current_bad_oid));
 		exit(1);
 	}
 
@@ -938,10 +938,10 @@ int bisect_next_all(const char *prefix, int no_checkout)
 	}
 
 	bisect_rev = revs.commits->item->object.sha1;
-	memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), 41);
+	memcpy(bisect_rev_hex, sha1_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
 
-	if (!hashcmp(bisect_rev, current_bad_sha1)) {
-		exit_if_skipped_commits(tried, current_bad_sha1);
+	if (!hashcmp(bisect_rev, current_bad_oid->hash)) {
+		exit_if_skipped_commits(tried, current_bad_oid);
 		printf("%s is the first bad commit\n", bisect_rev_hex);
 		show_diff_tree(prefix, revs.commits->item);
 		/* This means the bisection process succeeded. */
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 04/10] archive.c: convert to use struct object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (2 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/archive.c b/archive.c
index 96057ed..d37c41d 100644
--- a/archive.c
+++ b/archive.c
@@ -101,7 +101,7 @@ static void setup_archive_check(struct git_attr_check *check)
 
 struct directory {
 	struct directory *up;
-	unsigned char sha1[20];
+	struct object_id oid;
 	int baselen, len;
 	unsigned mode;
 	int stage;
@@ -177,7 +177,7 @@ static void queue_directory(const unsigned char *sha1,
 	d->stage   = stage;
 	c->bottom  = d;
 	d->len = sprintf(d->path, "%.*s%s/", (int)base->len, base->buf, filename);
-	hashcpy(d->sha1, sha1);
+	hashcpy(d->oid.hash, sha1);
 }
 
 static int write_directory(struct archiver_context *c)
@@ -191,7 +191,7 @@ static int write_directory(struct archiver_context *c)
 	d->path[d->len - 1] = '\0'; /* no trailing slash */
 	ret =
 		write_directory(c) ||
-		write_archive_entry(d->sha1, d->path, d->baselen,
+		write_archive_entry(d->oid.hash, d->path, d->baselen,
 				    d->path + d->baselen, d->mode,
 				    d->stage, c) != READ_TREE_RECURSIVE;
 	free(d);
@@ -354,7 +354,7 @@ static void parse_treeish_arg(const char **argv,
 	time_t archive_time;
 	struct tree *tree;
 	const struct commit *commit;
-	unsigned char sha1[20];
+	struct object_id oid;
 
 	/* Remotes are only allowed to fetch actual refs */
 	if (remote && !remote_allow_unreachable) {
@@ -362,15 +362,15 @@ static void parse_treeish_arg(const char **argv,
 		const char *colon = strchrnul(name, ':');
 		int refnamelen = colon - name;
 
-		if (!dwim_ref(name, refnamelen, sha1, &ref))
+		if (!dwim_ref(name, refnamelen, oid.hash, &ref))
 			die("no such ref: %.*s", refnamelen, name);
 		free(ref);
 	}
 
-	if (get_sha1(name, sha1))
+	if (get_sha1(name, oid.hash))
 		die("Not a valid object name");
 
-	commit = lookup_commit_reference_gently(sha1, 1);
+	commit = lookup_commit_reference_gently(oid.hash, 1);
 	if (commit) {
 		commit_sha1 = commit->object.sha1;
 		archive_time = commit->date;
@@ -379,21 +379,21 @@ static void parse_treeish_arg(const char **argv,
 		archive_time = time(NULL);
 	}
 
-	tree = parse_tree_indirect(sha1);
+	tree = parse_tree_indirect(oid.hash);
 	if (tree == NULL)
 		die("not a tree object");
 
 	if (prefix) {
-		unsigned char tree_sha1[20];
+		struct object_id tree_oid;
 		unsigned int mode;
 		int err;
 
 		err = get_tree_entry(tree->object.sha1, prefix,
-				     tree_sha1, &mode);
+				     tree_oid.hash, &mode);
 		if (err || !S_ISDIR(mode))
 			die("current working directory is untracked");
 
-		tree = parse_tree_indirect(tree_sha1);
+		tree = parse_tree_indirect(tree_oid.hash);
 	}
 	ar_args->tree = tree;
 	ar_args->commit_sha1 = commit_sha1;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (3 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 archive-zip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 4bde019..b669e50 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -427,12 +427,12 @@ static void write_zip_trailer(const unsigned char *sha1)
 	copy_le16(trailer.entries, zip_dir_entries);
 	copy_le32(trailer.size, zip_dir_offset);
 	copy_le32(trailer.offset, zip_offset);
-	copy_le16(trailer.comment_length, sha1 ? 40 : 0);
+	copy_le16(trailer.comment_length, sha1 ? GIT_SHA1_HEXSZ : 0);
 
 	write_or_die(1, zip_dir, zip_dir_offset);
 	write_or_die(1, &trailer, ZIP_DIR_TRAILER_SIZE);
 	if (sha1)
-		write_or_die(1, sha1_to_hex(sha1), 40);
+		write_or_die(1, sha1_to_hex(sha1), GIT_SHA1_HEXSZ);
 }
 
 static void dos_time(time_t *time, int *dos_date, int *dos_time)
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (4 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bulk-checkin.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 0c4b8a7..c80e503 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -24,7 +24,7 @@ static struct bulk_checkin_state {
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct strbuf packname = STRBUF_INIT;
 	int i;
 
@@ -36,11 +36,11 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		sha1close(state->f, sha1, CSUM_FSYNC);
+		sha1close(state->f, oid.hash, CSUM_FSYNC);
 	} else {
-		int fd = sha1close(state->f, sha1, 0);
-		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
-					 state->nr_written, sha1,
+		int fd = sha1close(state->f, oid.hash, 0);
+		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
+					 state->nr_written, oid.hash,
 					 state->offset);
 		close(fd);
 	}
@@ -48,7 +48,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 	strbuf_addf(&packname, "%s/pack/pack-", get_object_directory());
 	finish_tmp_packfile(&packname, state->pack_tmp_name,
 			    state->written, state->nr_written,
-			    &state->pack_idx_opts, sha1);
+			    &state->pack_idx_opts, oid.hash);
 	for (i = 0; i < state->nr_written; i++)
 		free(state->written[i]);
 
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (5 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Also, convert a constant to GIT_SHA1_HEXSZ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 combine-diff.c | 56 ++++++++++++++++++++++++++++----------------------------
 diff-lib.c     | 10 +++++-----
 diff.h         |  5 +++--
 tree-diff.c    | 10 +++++-----
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 91edce5..8eb7278 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -44,9 +44,9 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 			memset(p->parent, 0,
 			       sizeof(p->parent[0]) * num_parent);
 
-			hashcpy(p->sha1, q->queue[i]->two->sha1);
+			hashcpy(p->oid.hash, q->queue[i]->two->sha1);
 			p->mode = q->queue[i]->two->mode;
-			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
+			hashcpy(p->parent[n].oid.hash, q->queue[i]->one->sha1);
 			p->parent[n].mode = q->queue[i]->one->mode;
 			p->parent[n].status = q->queue[i]->status;
 			*tail = p;
@@ -77,7 +77,7 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
 			continue;
 		}
 
-		hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
+		hashcpy(p->parent[n].oid.hash, q->queue[i]->one->sha1);
 		p->parent[n].mode = q->queue[i]->one->mode;
 		p->parent[n].status = q->queue[i]->status;
 
@@ -284,7 +284,7 @@ static struct lline *coalesce_lines(struct lline *base, int *lenbase,
 	return base;
 }
 
-static char *grab_blob(const unsigned char *sha1, unsigned int mode,
+static char *grab_blob(const struct object_id *oid, unsigned int mode,
 		       unsigned long *size, struct userdiff_driver *textconv,
 		       const char *path)
 {
@@ -294,20 +294,20 @@ static char *grab_blob(const unsigned char *sha1, unsigned int mode,
 	if (S_ISGITLINK(mode)) {
 		blob = xmalloc(100);
 		*size = snprintf(blob, 100,
-				 "Subproject commit %s\n", sha1_to_hex(sha1));
-	} else if (is_null_sha1(sha1)) {
+				 "Subproject commit %s\n", oid_to_hex(oid));
+	} else if (is_null_oid(oid)) {
 		/* deleted blob */
 		*size = 0;
 		return xcalloc(1, 1);
 	} else if (textconv) {
 		struct diff_filespec *df = alloc_filespec(path);
-		fill_filespec(df, sha1, 1, mode);
+		fill_filespec(df, oid->hash, 1, mode);
 		*size = fill_textconv(textconv, df, &blob);
 		free_filespec(df);
 	} else {
-		blob = read_sha1_file(sha1, &type, size);
+		blob = read_sha1_file(oid->hash, &type, size);
 		if (type != OBJ_BLOB)
-			die("object '%s' is not a blob!", sha1_to_hex(sha1));
+			die("object '%s' is not a blob!", oid_to_hex(oid));
 	}
 	return blob;
 }
@@ -389,7 +389,7 @@ static void consume_line(void *state_, char *line, unsigned long len)
 	}
 }
 
-static void combine_diff(const unsigned char *parent, unsigned int mode,
+static void combine_diff(const struct object_id *parent, unsigned int mode,
 			 mmfile_t *result_file,
 			 struct sline *sline, unsigned int cnt, int n,
 			 int num_parent, int result_deleted,
@@ -897,7 +897,7 @@ static void show_combined_header(struct combine_diff_path *elem,
 				 int show_file_header)
 {
 	struct diff_options *opt = &rev->diffopt;
-	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? 40 : DEFAULT_ABBREV;
+	int abbrev = DIFF_OPT_TST(opt, FULL_INDEX) ? GIT_SHA1_HEXSZ : DEFAULT_ABBREV;
 	const char *a_prefix = opt->a_prefix ? opt->a_prefix : "a/";
 	const char *b_prefix = opt->b_prefix ? opt->b_prefix : "b/";
 	const char *c_meta = diff_get_color_opt(opt, DIFF_METAINFO);
@@ -914,11 +914,11 @@ static void show_combined_header(struct combine_diff_path *elem,
 			 "", elem->path, line_prefix, c_meta, c_reset);
 	printf("%s%sindex ", line_prefix, c_meta);
 	for (i = 0; i < num_parent; i++) {
-		abb = find_unique_abbrev(elem->parent[i].sha1,
+		abb = find_unique_abbrev(elem->parent[i].oid.hash,
 					 abbrev);
 		printf("%s%s", i ? "," : "", abb);
 	}
-	abb = find_unique_abbrev(elem->sha1, abbrev);
+	abb = find_unique_abbrev(elem->oid.hash, abbrev);
 	printf("..%s%s\n", abb, c_reset);
 
 	if (mode_differs) {
@@ -991,7 +991,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 
 	/* Read the result of merge first */
 	if (!working_tree_file)
-		result = grab_blob(elem->sha1, elem->mode, &result_size,
+		result = grab_blob(&elem->oid, elem->mode, &result_size,
 				   textconv, elem->path);
 	else {
 		/* Used by diff-tree to read from the working tree */
@@ -1013,12 +1013,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			result = strbuf_detach(&buf, NULL);
 			elem->mode = canon_mode(st.st_mode);
 		} else if (S_ISDIR(st.st_mode)) {
-			unsigned char sha1[20];
-			if (resolve_gitlink_ref(elem->path, "HEAD", sha1) < 0)
-				result = grab_blob(elem->sha1, elem->mode,
+			struct object_id oid;
+			if (resolve_gitlink_ref(elem->path, "HEAD", oid.hash) < 0)
+				result = grab_blob(&elem->oid, elem->mode,
 						   &result_size, NULL, NULL);
 			else
-				result = grab_blob(sha1, elem->mode,
+				result = grab_blob(&oid, elem->mode,
 						   &result_size, NULL, NULL);
 		} else if (textconv) {
 			struct diff_filespec *df = alloc_filespec(elem->path);
@@ -1090,7 +1090,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 		for (i = 0; !is_binary && i < num_parent; i++) {
 			char *buf;
 			unsigned long size;
-			buf = grab_blob(elem->parent[i].sha1,
+			buf = grab_blob(&elem->parent[i].oid,
 					elem->parent[i].mode,
 					&size, NULL, NULL);
 			if (buffer_is_binary(buf, size))
@@ -1139,14 +1139,14 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 	for (i = 0; i < num_parent; i++) {
 		int j;
 		for (j = 0; j < i; j++) {
-			if (!hashcmp(elem->parent[i].sha1,
-				     elem->parent[j].sha1)) {
+			if (!oidcmp(&elem->parent[i].oid,
+				     &elem->parent[j].oid)) {
 				reuse_combine_diff(sline, cnt, i, j);
 				break;
 			}
 		}
 		if (i <= j)
-			combine_diff(elem->parent[i].sha1,
+			combine_diff(&elem->parent[i].oid,
 				     elem->parent[i].mode,
 				     &result_file, sline,
 				     cnt, i, num_parent, result_deleted,
@@ -1206,9 +1206,9 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
 
 		/* Show sha1's */
 		for (i = 0; i < num_parent; i++)
-			printf(" %s", diff_unique_abbrev(p->parent[i].sha1,
+			printf(" %s", diff_unique_abbrev(p->parent[i].oid.hash,
 							 opt->abbrev));
-		printf(" %s ", diff_unique_abbrev(p->sha1, opt->abbrev));
+		printf(" %s ", diff_unique_abbrev(p->oid.hash, opt->abbrev));
 	}
 
 	if (opt->output_format & (DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS)) {
@@ -1271,16 +1271,16 @@ static struct diff_filepair *combined_pair(struct combine_diff_path *p,
 	for (i = 0; i < num_parent; i++) {
 		pair->one[i].path = p->path;
 		pair->one[i].mode = p->parent[i].mode;
-		hashcpy(pair->one[i].sha1, p->parent[i].sha1);
-		pair->one[i].sha1_valid = !is_null_sha1(p->parent[i].sha1);
+		hashcpy(pair->one[i].sha1, p->parent[i].oid.hash);
+		pair->one[i].sha1_valid = !is_null_oid(&p->parent[i].oid);
 		pair->one[i].has_more_entries = 1;
 	}
 	pair->one[num_parent - 1].has_more_entries = 0;
 
 	pair->two->path = p->path;
 	pair->two->mode = p->mode;
-	hashcpy(pair->two->sha1, p->sha1);
-	pair->two->sha1_valid = !is_null_sha1(p->sha1);
+	hashcpy(pair->two->sha1, p->oid.hash);
+	pair->two->sha1_valid = !is_null_oid(&p->oid);
 	return pair;
 }
 
diff --git a/diff-lib.c b/diff-lib.c
index a85c497..241a843 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -125,7 +125,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			dpath->next = NULL;
 			memcpy(dpath->path, ce->name, path_len);
 			dpath->path[path_len] = '\0';
-			hashclr(dpath->sha1);
+			oidclr(&dpath->oid);
 			memset(&(dpath->parent[0]), 0,
 			       sizeof(struct combine_diff_parent)*5);
 
@@ -155,7 +155,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				if (2 <= stage) {
 					int mode = nce->ce_mode;
 					num_compare_stages++;
-					hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
+					hashcpy(dpath->parent[stage-2].oid.hash, nce->sha1);
 					dpath->parent[stage-2].mode = ce_mode_from_stat(nce, mode);
 					dpath->parent[stage-2].status =
 						DIFF_STATUS_MODIFIED;
@@ -339,14 +339,14 @@ static int show_modified(struct rev_info *revs,
 		memcpy(p->path, new->name, pathlen);
 		p->path[pathlen] = 0;
 		p->mode = mode;
-		hashclr(p->sha1);
+		oidclr(&p->oid);
 		memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
 		p->parent[0].status = DIFF_STATUS_MODIFIED;
 		p->parent[0].mode = new->ce_mode;
-		hashcpy(p->parent[0].sha1, new->sha1);
+		hashcpy(p->parent[0].oid.hash, new->sha1);
 		p->parent[1].status = DIFF_STATUS_MODIFIED;
 		p->parent[1].mode = old->ce_mode;
-		hashcpy(p->parent[1].sha1, old->sha1);
+		hashcpy(p->parent[1].oid.hash, old->sha1);
 		show_combined_diff(p, 2, revs->dense_combined_merges, revs);
 		free(p);
 		return 0;
diff --git a/diff.h b/diff.h
index b4a624d..f6fdf49 100644
--- a/diff.h
+++ b/diff.h
@@ -6,6 +6,7 @@
 
 #include "tree-walk.h"
 #include "pathspec.h"
+#include "object.h"
 
 struct rev_info;
 struct diff_options;
@@ -207,11 +208,11 @@ struct combine_diff_path {
 	struct combine_diff_path *next;
 	char *path;
 	unsigned int mode;
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct combine_diff_parent {
 		char status;
 		unsigned int mode;
-		unsigned char sha1[20];
+		struct object_id oid;
 	} parent[FLEX_ARRAY];
 };
 #define combine_diff_path_size(n, l) \
diff --git a/tree-diff.c b/tree-diff.c
index e7b378c..290a1da 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -64,7 +64,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
 	if (p->mode && p0->mode) {
-		opt->change(opt, p0->mode, p->mode, p0->sha1, p->sha1,
+		opt->change(opt, p0->mode, p->mode, p0->oid.hash, p->oid.hash,
 			1, 1, p->path, 0, 0);
 	}
 	else {
@@ -74,11 +74,11 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
 
 		if (p->mode) {
 			addremove = '+';
-			sha1 = p->sha1;
+			sha1 = p->oid.hash;
 			mode = p->mode;
 		} else {
 			addremove = '-';
-			sha1 = p0->sha1;
+			sha1 = p0->oid.hash;
 			mode = p0->mode;
 		}
 
@@ -151,7 +151,7 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
 	memcpy(p->path + base->len, path, pathlen);
 	p->path[len] = 0;
 	p->mode = mode;
-	hashcpy(p->sha1, sha1 ? sha1 : null_sha1);
+	hashcpy(p->oid.hash, sha1 ? sha1 : null_sha1);
 
 	return p;
 }
@@ -238,7 +238,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 			}
 
 			p->parent[i].mode = mode_i;
-			hashcpy(p->parent[i].sha1, sha1_i ? sha1_i : null_sha1);
+			hashcpy(p->parent[i].oid.hash, sha1_i ? sha1_i : null_sha1);
 		}
 
 		keep = 1;
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 08/10] commit: convert parts to struct object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (6 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Convert struct commit_graft and necessary local parts of commit.c.
Also, convert several constants based on the hex length of an SHA-1 to
use GIT_SHA1_HEXSZ, and move several magic constants into variables for
readability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 commit.c      | 56 ++++++++++++++++++++++++++++++--------------------------
 commit.h      |  4 ++--
 log-tree.c    |  2 +-
 send-pack.c   |  2 +-
 shallow.c     |  8 ++++----
 upload-pack.c |  2 +-
 6 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/commit.c b/commit.c
index a8c7577..2d9de80 100644
--- a/commit.c
+++ b/commit.c
@@ -55,12 +55,12 @@ struct commit *lookup_commit(const unsigned char *sha1)
 
 struct commit *lookup_commit_reference_by_name(const char *name)
 {
-	unsigned char sha1[20];
+	struct object_id oid;
 	struct commit *commit;
 
-	if (get_sha1_committish(name, sha1))
+	if (get_sha1_committish(name, oid.hash))
 		return NULL;
-	commit = lookup_commit_reference(sha1);
+	commit = lookup_commit_reference(oid.hash);
 	if (parse_commit(commit))
 		return NULL;
 	return commit;
@@ -99,7 +99,7 @@ static int commit_graft_alloc, commit_graft_nr;
 static const unsigned char *commit_graft_sha1_access(size_t index, void *table)
 {
 	struct commit_graft **commit_graft_table = table;
-	return commit_graft_table[index]->sha1;
+	return commit_graft_table[index]->oid.hash;
 }
 
 static int commit_graft_pos(const unsigned char *sha1)
@@ -110,7 +110,7 @@ static int commit_graft_pos(const unsigned char *sha1)
 
 int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 {
-	int pos = commit_graft_pos(graft->sha1);
+	int pos = commit_graft_pos(graft->oid.hash);
 
 	if (0 <= pos) {
 		if (ignore_dups)
@@ -138,22 +138,23 @@ struct commit_graft *read_graft_line(char *buf, int len)
 	/* The format is just "Commit Parent1 Parent2 ...\n" */
 	int i;
 	struct commit_graft *graft = NULL;
+	const int entry_size = GIT_SHA1_HEXSZ + 1;
 
 	while (len && isspace(buf[len-1]))
 		buf[--len] = '\0';
 	if (buf[0] == '#' || buf[0] == '\0')
 		return NULL;
-	if ((len + 1) % 41)
+	if ((len + 1) % entry_size)
 		goto bad_graft_data;
-	i = (len + 1) / 41 - 1;
-	graft = xmalloc(sizeof(*graft) + 20 * i);
+	i = (len + 1) / entry_size - 1;
+	graft = xmalloc(sizeof(*graft) + GIT_SHA1_RAWSZ * i);
 	graft->nr_parent = i;
-	if (get_sha1_hex(buf, graft->sha1))
+	if (get_oid_hex(buf, &graft->oid))
 		goto bad_graft_data;
-	for (i = 40; i < len; i += 41) {
+	for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
 		if (buf[i] != ' ')
 			goto bad_graft_data;
-		if (get_sha1_hex(buf + i + 1, graft->parent[i/41]))
+		if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
 			goto bad_graft_data;
 	}
 	return graft;
@@ -302,39 +303,42 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 {
 	const char *tail = buffer;
 	const char *bufptr = buffer;
-	unsigned char parent[20];
+	struct object_id parent;
 	struct commit_list **pptr;
 	struct commit_graft *graft;
+	const int tree_entry_len = GIT_SHA1_HEXSZ + 5;
+	const int parent_entry_len = GIT_SHA1_HEXSZ + 7;
 
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
 	tail += size;
-	if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
+	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
+			bufptr[tree_entry_len] != '\n')
 		return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
-	if (get_sha1_hex(bufptr + 5, parent) < 0)
+	if (get_sha1_hex(bufptr + 5, parent.hash) < 0)
 		return error("bad tree pointer in commit %s",
 			     sha1_to_hex(item->object.sha1));
-	item->tree = lookup_tree(parent);
-	bufptr += 46; /* "tree " + "hex sha1" + "\n" */
+	item->tree = lookup_tree(parent.hash);
+	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
 	graft = lookup_commit_graft(item->object.sha1);
-	while (bufptr + 48 < tail && !memcmp(bufptr, "parent ", 7)) {
+	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
 		struct commit *new_parent;
 
-		if (tail <= bufptr + 48 ||
-		    get_sha1_hex(bufptr + 7, parent) ||
-		    bufptr[47] != '\n')
+		if (tail <= bufptr + parent_entry_len + 1 ||
+		    get_sha1_hex(bufptr + 7, parent.hash) ||
+		    bufptr[parent_entry_len] != '\n')
 			return error("bad parents in commit %s", sha1_to_hex(item->object.sha1));
-		bufptr += 48;
+		bufptr += parent_entry_len + 1;
 		/*
 		 * The clone is shallow if nr_parent < 0, and we must
 		 * not traverse its real parents even when we unhide them.
 		 */
 		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
 			continue;
-		new_parent = lookup_commit(parent);
+		new_parent = lookup_commit(parent.hash);
 		if (new_parent)
 			pptr = &commit_list_insert(new_parent, pptr)->next;
 	}
@@ -342,7 +346,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 		int i;
 		struct commit *new_parent;
 		for (i = 0; i < graft->nr_parent; i++) {
-			new_parent = lookup_commit(graft->parent[i]);
+			new_parent = lookup_commit(graft->parent[i].hash);
 			if (!new_parent)
 				continue;
 			pptr = &commit_list_insert(new_parent, pptr)->next;
@@ -1580,10 +1584,10 @@ struct commit *get_merge_parent(const char *name)
 {
 	struct object *obj;
 	struct commit *commit;
-	unsigned char sha1[20];
-	if (get_sha1(name, sha1))
+	struct object_id oid;
+	if (get_sha1(name, oid.hash))
 		return NULL;
-	obj = parse_object(sha1);
+	obj = parse_object(oid.hash);
 	commit = (struct commit *)peel_to_type(name, 0, obj, OBJ_COMMIT);
 	if (commit && !commit->util) {
 		struct merge_remote_desc *desc;
diff --git a/commit.h b/commit.h
index 9f189cb..ed3a1d5 100644
--- a/commit.h
+++ b/commit.h
@@ -226,9 +226,9 @@ enum rev_sort_order {
 void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
 
 struct commit_graft {
-	unsigned char sha1[20];
+	struct object_id oid;
 	int nr_parent; /* < 0 if shallow commit */
-	unsigned char parent[FLEX_ARRAY][20]; /* more */
+	struct object_id parent[FLEX_ARRAY]; /* more */
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
diff --git a/log-tree.c b/log-tree.c
index 7f0890e..51cc695 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -137,7 +137,7 @@ static int add_ref_decoration(const char *refname, const unsigned char *sha1, in
 
 static int add_graft_decoration(const struct commit_graft *graft, void *cb_data)
 {
-	struct commit *commit = lookup_commit(graft->sha1);
+	struct commit *commit = lookup_commit(graft->oid.hash);
 	if (!commit)
 		return 0;
 	add_name_decoration(DECORATION_GRAFTED, "grafted", &commit->object);
diff --git a/send-pack.c b/send-pack.c
index 9d2b0c5..b2fd767 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -182,7 +182,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	struct strbuf *sb = cb;
 	if (graft->nr_parent == -1)
-		packet_buf_write(sb, "shallow %s\n", sha1_to_hex(graft->sha1));
+		packet_buf_write(sb, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/shallow.c b/shallow.c
index d8bf40a..d08d264 100644
--- a/shallow.c
+++ b/shallow.c
@@ -31,7 +31,7 @@ int register_shallow(const unsigned char *sha1)
 		xmalloc(sizeof(struct commit_graft));
 	struct commit *commit = lookup_commit(sha1);
 
-	hashcpy(graft->sha1, sha1);
+	hashcpy(graft->oid.hash, sha1);
 	graft->nr_parent = -1;
 	if (commit && commit->object.parsed)
 		commit->parents = NULL;
@@ -159,11 +159,11 @@ struct write_shallow_data {
 static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 {
 	struct write_shallow_data *data = cb_data;
-	const char *hex = sha1_to_hex(graft->sha1);
+	const char *hex = oid_to_hex(&graft->oid);
 	if (graft->nr_parent != -1)
 		return 0;
 	if (data->flags & SEEN_ONLY) {
-		struct commit *c = lookup_commit(graft->sha1);
+		struct commit *c = lookup_commit(graft->oid.hash);
 		if (!c || !(c->object.flags & SEEN)) {
 			if (data->flags & VERBOSE)
 				printf("Removing %s from .git/shallow\n",
@@ -282,7 +282,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	int fd = *(int *)cb;
 	if (graft->nr_parent == -1)
-		packet_write(fd, "shallow %s\n", sha1_to_hex(graft->sha1));
+		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index b531a32..0566ce0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -74,7 +74,7 @@ static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
 {
 	FILE *fp = cb_data;
 	if (graft->nr_parent == -1)
-		fprintf(fp, "--shallow %s\n", sha1_to_hex(graft->sha1));
+		fprintf(fp, "--shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 09/10] patch-id: convert to use struct object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (7 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-13 23:39 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
  2015-03-14 14:16 ` [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Convert some magic numbers to the new GIT_SHA1 constants.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/patch-id.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 77db873..ba34dac 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -1,14 +1,14 @@
 #include "builtin.h"
 
-static void flush_current_id(int patchlen, unsigned char *id, unsigned char *result)
+static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
 	char name[50];
 
 	if (!patchlen)
 		return;
 
-	memcpy(name, sha1_to_hex(id), 41);
-	printf("%s %s\n", sha1_to_hex(result), name);
+	memcpy(name, oid_to_hex(id), GIT_SHA1_HEXSZ + 1);
+	printf("%s %s\n", oid_to_hex(result), name);
 }
 
 static int remove_space(char *line)
@@ -53,23 +53,23 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 	return 1;
 }
 
-static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx)
+static void flush_one_hunk(struct object_id *result, git_SHA_CTX *ctx)
 {
-	unsigned char hash[20];
+	unsigned char hash[GIT_SHA1_RAWSZ];
 	unsigned short carry = 0;
 	int i;
 
 	git_SHA1_Final(hash, ctx);
 	git_SHA1_Init(ctx);
 	/* 20-byte sum, with carry */
-	for (i = 0; i < 20; ++i) {
-		carry += result[i] + hash[i];
-		result[i] = carry;
+	for (i = 0; i < GIT_SHA1_RAWSZ; ++i) {
+		carry += result->hash[i] + hash[i];
+		result->hash[i] = carry;
 		carry >>= 8;
 	}
 }
 
-static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
+static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 			   struct strbuf *line_buf, int stable)
 {
 	int patchlen = 0, found_next = 0;
@@ -77,7 +77,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 	git_SHA_CTX ctx;
 
 	git_SHA1_Init(&ctx);
-	hashclr(result);
+	oidclr(result);
 
 	while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) {
 		char *line = line_buf->buf;
@@ -93,7 +93,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 		else if (!memcmp(line, "\\ ", 2) && 12 < strlen(line))
 			continue;
 
-		if (!get_sha1_hex(p, next_sha1)) {
+		if (!get_oid_hex(p, next_oid)) {
 			found_next = 1;
 			break;
 		}
@@ -143,7 +143,7 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 	}
 
 	if (!found_next)
-		hashclr(next_sha1);
+		oidclr(next_oid);
 
 	flush_one_hunk(result, &ctx);
 
@@ -152,15 +152,15 @@ static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
 
 static void generate_id_list(int stable)
 {
-	unsigned char sha1[20], n[20], result[20];
+	struct object_id oid, n, result;
 	int patchlen;
 	struct strbuf line_buf = STRBUF_INIT;
 
-	hashclr(sha1);
+	oidclr(&oid);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(n, result, &line_buf, stable);
-		flush_current_id(patchlen, sha1, result);
-		hashcpy(sha1, n);
+		patchlen = get_one_patchid(&n, &result, &line_buf, stable);
+		flush_current_id(patchlen, &oid, &result);
+		oidcpy(&oid, &n);
 	}
 	strbuf_release(&line_buf);
 }
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 10/10] apply: convert threeway_stage to object_id
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (8 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
@ 2015-03-13 23:39 ` brian m. carlson
  2015-03-14 14:16 ` [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-13 23:39 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/apply.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 65b97ee..c2c8f39 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -208,7 +208,7 @@ struct patch {
 	struct patch *next;
 
 	/* three-way fallback result */
-	unsigned char threeway_stage[3][20];
+	struct object_id threeway_stage[3];
 };
 
 static void free_fragment_list(struct fragment *list)
@@ -3424,11 +3424,11 @@ static int try_threeway(struct image *image, struct patch *patch,
 	if (status) {
 		patch->conflicted_threeway = 1;
 		if (patch->is_new)
-			hashclr(patch->threeway_stage[0]);
+			oidclr(&patch->threeway_stage[0]);
 		else
-			hashcpy(patch->threeway_stage[0], pre_sha1);
-		hashcpy(patch->threeway_stage[1], our_sha1);
-		hashcpy(patch->threeway_stage[2], post_sha1);
+			hashcpy(patch->threeway_stage[0].hash, pre_sha1);
+		hashcpy(patch->threeway_stage[1].hash, our_sha1);
+		hashcpy(patch->threeway_stage[2].hash, post_sha1);
 		fprintf(stderr, "Applied patch to '%s' with conflicts.\n", patch->new_name);
 	} else {
 		fprintf(stderr, "Applied patch to '%s' cleanly.\n", patch->new_name);
@@ -4184,14 +4184,14 @@ static void add_conflicted_stages_file(struct patch *patch)
 
 	remove_file_from_cache(patch->new_name);
 	for (stage = 1; stage < 4; stage++) {
-		if (is_null_sha1(patch->threeway_stage[stage - 1]))
+		if (is_null_oid(&patch->threeway_stage[stage - 1]))
 			continue;
 		ce = xcalloc(1, ce_size);
 		memcpy(ce->name, patch->new_name, namelen);
 		ce->ce_mode = create_ce_mode(mode);
 		ce->ce_flags = create_ce_flags(stage);
 		ce->ce_namelen = namelen;
-		hashcpy(ce->sha1, patch->threeway_stage[stage - 1]);
+		hashcpy(ce->sha1, patch->threeway_stage[stage - 1].hash);
 		if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0)
 			die(_("unable to add cache entry for %s"), patch->new_name);
 	}
-- 
2.2.1.209.g41e5f3a

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-13  6:03                   ` Junio C Hamano
@ 2015-03-14 11:49                     ` Duy Nguyen
  2015-03-14 22:19                       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2015-03-14 11:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Fri, Mar 13, 2015 at 1:03 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> To me, <pack, offset> information smells to belong more to a "struct
>>> object" (or its subclass) as an optional annotation---when a caller
>>> is asked to parse_object(), you would bypass the sha1_read_file()
>>> that goes and looks the object name up from the list of pack .idx
>>> and instead go there straight using that annotation.
>>
>> For pack v4, commits and trees can be encoded this way.
>
> Even if your in-pack representation of a commit object allowed to
> store the tree pointer in <pack, nth> format, its object name must
> be computed as if you have the commit object in the traditional
> format and computed the hash of that (together with the standard
> "<type> <size>\0" header), and at that point, you need the contained
> object's name in <sha-1> format (imagine how you would implement the
> commit-tree command).

True. But commit-tree is not often executed as, say, rev-list, where
the hash is used as 20-byte key to some pieces somewhere (pack data,
struct object *). If we keep <pack, nth> in object_id (the union
approach), v4-aware code has the chance to go on a fast path.

> Hence, I do not think the v4 encoding changes
> the discussion very much.  I see the primary value of v4 encoding is
> to shorten the length of various fields take on-disk and in-pack.
> If it were <pack, offset>, it could be argued that it would also be
> faster to get to the packed data in the packfile, and going from
> <pack, nth> to the .idx file and then going to the location in the
> data in the packfile would be faster than going from <sha-1> to a
> particular pack and its in-pack offset, with the difference of cost
> around log(n)*m where n is the number of objects in a pack and m is
> the total number of packs in the repository.
>
> It is true that <nth> format (force that the referred-to object
> lives in the same pack as the referrer) can help speed up
> interpretation of extended SHA-1 expression, e.g. "v1.0^0:t", which
> can read v1.0 tag in v4 format, find the <nth> info for the commit
> pointed by the tag and get to that data in the pack, find the <nth>
> info for the top-tree recorded in that commit and directly get to
> the data of that tree, and then find the entry for "t", which will
> give the object name for that subtree again in <nth> format, and at
> that point you can find the <sha-1> of that final object, without
> having to know any object names of the intermediate objects
> (i.e. you must start from <sha-1> of the tag you obtain from the
> refs API, but you didn't use the object name of the commit and its
> top-level tree).  So for such a codepath, I would say it would be
> sufficient to use a "union in struct" people have been envisioning
> and convert <pack, nth> to <sha-1> when the latter form becomes
> necessary for the first time for the object.

It's the point: faster access (to either pack data, or "struct object
*"). Using a separate union from  struct object_id (iow, object_id
only contains SHA-1) works. But imagine to independent code islands
can perform this type of fast access. In order to pass this union from
one island to another, we need to update pretty much every function
call between them. The change to using object_id has a similar impact.
If we go with "union in object_id", we may be able to avoid the second
mass change to make use of <pack, nth>.

> Anyway, wouldn't this be all academic?  I do not see how you would
> keep the object name in the <pack, nth> format in-core, as the
> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
> to a different hash, I cannot see how such a table to ensure the
> singleton-ness of in-core objects can be keyed sometimes by <hash>
> and by <pack, nth> in some other time.

I'm implementing something to see how much we gain by avoiding object
lookup. The current approach is having "struct object ** obj" in
"struct packed_git", indexed by "nth". So when you have <pack, nth>
and pack->obj[nth] is valid, you'll get to "struct object *" without
hashing. If pack->obj[nth] is NULL, hashing and looking up are
required the first time using SHA-1, then the result is cached in
pack->obj[]. So no, it's not academic :)
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/10] Use a structure for object IDs.
  2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
                   ` (9 preceding siblings ...)
  2015-03-13 23:39 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
@ 2015-03-14 14:16 ` brian m. carlson
  10 siblings, 0 replies; 27+ messages in thread
From: brian m. carlson @ 2015-03-14 14:16 UTC (permalink / raw)
  To: git
  Cc: Andreas Schwab, Kyle J. McKay, Nguyen Thai Ngoc Duy,
	Junio C Hamano, Johannes Sixt, David Kastrup, James Denholm

[-- Attachment #1: Type: text/plain, Size: 411 bytes --]

On Fri, Mar 13, 2015 at 11:39:26PM +0000, brian m. carlson wrote:
>
>Changes since v1:

I just realized that I sent this out as v2 instead of v3.  It really is
v3.  My apologies for the error.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-14 11:49                     ` Duy Nguyen
@ 2015-03-14 22:19                       ` Junio C Hamano
  2015-03-15  0:17                         ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2015-03-14 22:19 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

>> Anyway, wouldn't this be all academic?  I do not see how you would
>> keep the object name in the <pack, nth> format in-core, as the
>> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
>> to a different hash, I cannot see how such a table to ensure the
>> singleton-ness of in-core objects can be keyed sometimes by <hash>
>> and by <pack, nth> in some other time.
>
> I'm implementing something to see how much we gain by avoiding object
> lookup. The current approach is having "struct object ** obj" in
> "struct packed_git", indexed by "nth". So when you have <pack, nth>
> and pack->obj[nth] is valid, you'll get to "struct object *" without
> hashing.

But do you realize that the hashtable serves two purposes?  Grab the
object from its name is one thing, and the other one I am not seeing
how you will make it work with "sometimes <sha-1> sometimes <pack,nth>"
is to ensure that we will have only one in-core copy for the same object.
We even walk the hashtable when we want to drop the flag bits from
all in-core objects, so even if you instanciated an in-core object
without going through the object name layer, the hashtable needs to
have a pointer to such a pointer, no?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-14 22:19                       ` Junio C Hamano
@ 2015-03-15  0:17                         ` Duy Nguyen
  2015-03-15  2:32                           ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2015-03-15  0:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

On Sun, Mar 15, 2015 at 5:19 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>>> Anyway, wouldn't this be all academic?  I do not see how you would
>>> keep the object name in the <pack, nth> format in-core, as the
>>> obj_hash[] is a hashtable keyed by <sha-1>, and even when we switch
>>> to a different hash, I cannot see how such a table to ensure the
>>> singleton-ness of in-core objects can be keyed sometimes by <hash>
>>> and by <pack, nth> in some other time.
>>
>> I'm implementing something to see how much we gain by avoiding object
>> lookup. The current approach is having "struct object ** obj" in
>> "struct packed_git", indexed by "nth". So when you have <pack, nth>
>> and pack->obj[nth] is valid, you'll get to "struct object *" without
>> hashing.
>
> But do you realize that the hashtable serves two purposes?  Grab the
> object from its name is one thing, and the other one I am not seeing
> how you will make it work with "sometimes <sha-1> sometimes <pack,nth>"
> is to ensure that we will have only one in-core copy for the same object.
> We even walk the hashtable when we want to drop the flag bits from
> all in-core objects, so even if you instanciated an in-core object
> without going through the object name layer, the hashtable needs to
> have a pointer to such a pointer, no?

Notice that the first time pack->obj[] is filled using
lookup_object(). So yes, the hash table has all the pointers that
pack->obj[] has. If somebody wants to remove an object out of hash
table, then all pack->obj[] are invalidated, but I don't think we ever
delete objects from hash table.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/10] Define a structure for object IDs.
  2015-03-15  0:17                         ` Duy Nguyen
@ 2015-03-15  2:32                           ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2015-03-15  2:32 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: brian m. carlson, Michael Haggerty, Kyle J. McKay,
	Git Mailing List, Andreas Schwab

Duy Nguyen <pclouds@gmail.com> writes:

> Notice that the first time pack->obj[] is filled using
> lookup_object(). So yes, the hash table has all the pointers that
> pack->obj[] has.

Are we talking about the same thing?

By "the hash table", I mean **obj_hash that is a hashtable that uses
<sha-1> form of identifier as the key.  So "the hash table has all
the pointers" sounds like all the objects you instantiate from a v4
packfile needs their object name known in the <sha-1> form anyway
when it gets instantiated (or more importantly, when you realize
there is another copy of it already in the **obj_hash hashtable, you
may have to free or refrain from creating it in the first place).

As long as we use **obj_hash hashtable as "all the objects this
process cares about in the world" and "struct object_id.hash" as the
key into it, I do not think <pack,nth> representation belongs to
that layer.

I do think <pack,nth> representation has its use as a short-cut
mechanism to reach an indirectly referenced object directly without
instantiating intermediate objects when dealing with an extended
SHA-1 expression (cf. "v1.0^0:t" example in a few messages
upthread).  I just think it does not belong to "struct object_id"
that are used to refer to in-core objects.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-03-15  2:35 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-13 23:39 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
2015-03-13 23:39 ` [PATCH v2 01/10] Define " brian m. carlson
2015-03-13 23:39 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
2015-03-13 23:39 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
2015-03-13 23:39 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
2015-03-13 23:39 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
2015-03-13 23:39 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
2015-03-13 23:39 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
2015-03-13 23:39 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
2015-03-13 23:39 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
2015-03-13 23:39 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
2015-03-14 14:16 ` [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
  -- strict thread matches above, loose matches on Subject: below --
2015-03-07 23:23 brian m. carlson
2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
     [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
2015-03-11 22:08     ` brian m. carlson
2015-03-12  0:26       ` Junio C Hamano
2015-03-12  9:34         ` brian m. carlson
2015-03-12 10:28         ` Michael Haggerty
2015-03-12 10:46           ` brian m. carlson
2015-03-12 11:16             ` Duy Nguyen
2015-03-12 18:24               ` Junio C Hamano
2015-03-12 18:44                 ` Junio C Hamano
2015-03-13  0:58                 ` Duy Nguyen
2015-03-13  6:03                   ` Junio C Hamano
2015-03-14 11:49                     ` Duy Nguyen
2015-03-14 22:19                       ` Junio C Hamano
2015-03-15  0:17                         ` Duy Nguyen
2015-03-15  2:32                           ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).