All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	git@vger.kernel.org, Andreas Schwab <schwab@linux-m68k.org>,
	"Kyle J. McKay" <mackyle@gmail.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Johannes Sixt <j6t@kdbg.org>, David Kastrup <dak@gnu.org>,
	James Denholm <nod.helm@gmail.com>
Subject: Re: [PATCH v2 00/10] Use a structure for object IDs.
Date: Wed, 11 Mar 2015 13:35:06 -0700	[thread overview]
Message-ID: <xmqqk2yn5l39.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <5500689A.5090101@alum.mit.edu> (Michael Haggerty's message of "Wed, 11 Mar 2015 17:08:58 +0100")

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I think this is a really interesting project and I hope that it works out.

Count me in ;-)

> In my opinion, the biggest risk (aside from the sheer amount of work
> required) is the issue that was brought up on the mailing list when you
> submitted v1 [1]: Converting an arbitrary (unsigned char *) to a (struct
> object_id *) is not allowed, because the alignment requirements of the
> latter might be stricter than those of the former. This means that if,
> for example, we are reading some data from disk or from the network, and
> we expect the 20 bytes starting with byte number 17 to be a SHA-1 in
> binary format, we used to be able to do
>
>     unsigned char *sha1 = buf + 17;
>
> and use sha1 like any other SHA-1, without the need for any copying. But
> we can't do the analogous
>
>     struct object_id *oid = (struct object_id *)(buf + 17);
>
> because the alignment is not necessarily correct. So in a pure "struct
> object_id" world, I think we would be forced to change such code to
>
>     struct object_id oid;
>     hashcpy(oid.sha1, buf + 17);
>
> This uses additional memory and introduces copying overhead. Also, the
> lifetime of oid might exceed the scope of a function, so oid might have
> to be allocated on the heap and later freed. This adds more
> computational overhead, more memory overhead, and more programming
> effort to get it all right.

Because we use abstraction to reduce burden on programmers, the last
point is really what defeats this approach.

I wonder if there is any way to make the new "oid wrapped in a
struct" result in identical binary---that would be a reasonable
criterion to judge the goodness of a change line this.  Any
difference could be (1) compiler being (a) stupid and not taking
optimization opportunity it notices for a bare byte array but not
for a byte array in struct or (b) clever and taking optimization
opportunity the other way around, or (2) breakage in the conversion
causing new bugs, perhaps coming from the "alignment" worries you
cited above.  (1-a) may or may not be an acceptable price to pay for
a cleaner abstraction, but that depends on the extent of damage.
(1-b) may be unlikely but if there is such a gain, that's nice ;-).
And (2) is obviously a show-stopper X-<.

> So as much as I like the idea of wrapping SHA-1s in objects, I think it
> would be a good idea to first make sure that we can all agree on a plan
> for dealing with situations like this. I can think of the following
> possibilities:

These all look sensible to me.

> 4. We continue to support working with SHA-1s declared to be (unsigned
> char *) in some performance-critical code, even as we migrate most other
> code to using SHA-1s embedded within a (struct object_id). This will
> cost some duplication of code. To accept this approach, we would need an
> idea of *how much* code duplication would be needed. E.g., how many
> functions will need both (unsigned char *) versions and (struct
> object_id *) versions?

Ideally, only the ones that knows the underlying hash function is
SHA-1 (i.e. anybody who mentions git_SHA_CTX), moves bytes from/to
in-core object name field and raw range of bytes (e.g. sha1hash());
everybody else like hashcpy() and hashcmp() should be able to do its
thing only on structs and a generic-looking constant that denotes
how many bytes there are in the hash (or even sizeof(struct oid)).

I do not know what kind of code duplication you are worried about,
though.  If a callee needs "unsigned char *", the caller that has a
"struct object_id *o" should pass o->hash to the callee.  We should
not add another variant of the same callee that takes a pointer to
"struct object_id"---I think that leads to insanty, e.g.

    int hashcmp_o_o(struct object_id *, struct object_id *);
    int hashcmp_o_b(struct object_id *, unsigned char *);
    int hashcmp_b_o(unsigned char *, struct object_id *);
    int hashcmp_b_b(unsigned char *, unsigned char *);

And please do not suggest switching to C++; all it would do to
overload these into a single name is to make the callers harder to
read.

  reply	other threads:[~2015-03-11 20:35 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-07 23:23 [PATCH v2 00/10] Use a structure for object IDs brian m. carlson
2015-03-07 23:23 ` [PATCH v2 01/10] Define " brian m. carlson
     [not found]   ` <CEA07500-9F47-4B24-AD5D-1423A601A4DD@gmail.com>
2015-03-11 22:08     ` brian m. carlson
2015-03-12  0:26       ` Junio C Hamano
2015-03-12  9:34         ` brian m. carlson
2015-03-12 10:28         ` Michael Haggerty
2015-03-12 10:46           ` brian m. carlson
2015-03-12 11:16             ` Duy Nguyen
2015-03-12 18:24               ` Junio C Hamano
2015-03-12 18:44                 ` Junio C Hamano
2015-03-13  0:58                 ` Duy Nguyen
2015-03-13  6:03                   ` Junio C Hamano
2015-03-14 11:49                     ` Duy Nguyen
2015-03-14 22:19                       ` Junio C Hamano
2015-03-15  0:17                         ` Duy Nguyen
2015-03-15  2:32                           ` Junio C Hamano
2015-03-07 23:23 ` [PATCH v2 02/10] Define utility functions " brian m. carlson
2015-03-08  9:57   ` Duy Nguyen
2015-03-08 14:48     ` brian m. carlson
2015-03-11 12:44   ` Michael Haggerty
2015-03-07 23:23 ` [PATCH v2 03/10] bisect.c: convert leaf functions to use struct object_id brian m. carlson
2015-03-07 23:23 ` [PATCH v2 04/10] archive.c: convert " brian m. carlson
2015-03-11 14:20   ` Michael Haggerty
2015-03-11 22:12     ` brian m. carlson
2015-03-07 23:24 ` [PATCH v2 05/10] zip: use GIT_SHA1_HEXSZ for trailers brian m. carlson
2015-03-07 23:24 ` [PATCH v2 06/10] bulk-checkin.c: convert to use struct object_id brian m. carlson
2015-03-07 23:24 ` [PATCH v2 07/10] diff: convert struct combine_diff_path to object_id brian m. carlson
2015-03-07 23:24 ` [PATCH v2 08/10] commit: convert parts to struct object_id brian m. carlson
2015-03-11 14:46   ` Michael Haggerty
2015-03-07 23:24 ` [PATCH v2 09/10] patch-id: convert to use " brian m. carlson
2015-03-07 23:24 ` [PATCH v2 10/10] apply: convert threeway_stage to object_id brian m. carlson
2015-03-08  7:43 ` [PATCH v2 00/10] Use a structure for object IDs Junio C Hamano
2015-03-11  2:38 ` Kyle J. McKay
2015-03-11 16:08 ` Michael Haggerty
2015-03-11 20:35   ` Junio C Hamano [this message]
2015-03-13 22:45     ` brian m. carlson
  -- strict thread matches above, loose matches on Subject: below --
2015-03-13 23:39 brian m. carlson
2015-03-14 14:16 ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqk2yn5l39.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=j6t@kdbg.org \
    --cc=mackyle@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=nod.helm@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=schwab@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.