[RFC] adding support for md5

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] adding support for md5
@ 2006-08-18  6:01 David Rientjes
  2006-08-18  9:59 ` Nguyễn Thái Ngọc Duy
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: David Rientjes @ 2006-08-18  6:01 UTC (permalink / raw)
  To: git

I'd like to solicit some comments about implementing support for md5 as a 
hash function that could be determined at runtime by the user during a 
project init-db.  md5, which I implemented as a configurable option in my 
own tree, is a 128-bit hash that is slightly more recognized than sha1.  
Likewise, it is also available in openssl/md5.h just as sha1 is available 
through a library in openssl/sha1.h.  My patch to move the hash name 
comparison was a step in this direction in isolating many of the 
particulars of hash-specific dependencies.

		David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
@ 2006-08-18  9:59 ` Nguyễn Thái Ngọc Duy
  2006-08-18 10:21 ` Johannes Schindelin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2006-08-18  9:59 UTC (permalink / raw)
  To: David Rientjes; +Cc: git

On 8/18/06, David Rientjes <rientjes@google.com> wrote:
> I'd like to solicit some comments about implementing support for md5 as a
> hash function that could be determined at runtime by the user during a
> project init-db.  md5, which I implemented as a configurable option in my
> own tree, is a 128-bit hash that is slightly more recognized than sha1.
> Likewise, it is also available in openssl/md5.h just as sha1 is available
> through a library in openssl/sha1.h.  My patch to move the hash name
> comparison was a step in this direction in isolating many of the
> particulars of hash-specific dependencies.
Just curious, but why md5? Is there any benefit using md5 over sha1?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
  2006-08-18  9:59 ` Nguyễn Thái Ngọc Duy
@ 2006-08-18 10:21 ` Johannes Schindelin
  2006-08-18 12:31   ` Petr Baudis
  2006-08-18 20:35   ` David Rientjes
  2006-08-18 10:52 ` Trekie
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Johannes Schindelin @ 2006-08-18 10:21 UTC (permalink / raw)
  To: David Rientjes; +Cc: git

Hi,

On Thu, 17 Aug 2006, David Rientjes wrote:

> I'd like to solicit some comments about implementing support for md5 as a 
> hash function that could be determined at runtime by the user during a 
> project init-db.

Make it a config variable, too, right?

It would be interesting to have other hashes, for three reasons:

1. they could be faster to calculate,
2. they could reduce clashes, and related to that,
3. it is possible that some day SHA1 is broken, i.e. that there is an 
   algorithm to generate a different text for a given hash.

As for 2 and 3, it seems MD5 is equivalent, since another sort of attacks 
was already successful on both SHA1 and MD5: generating two different 
texts with the same hash.

So, 1 could be a good reason to have another hash. IIRC SHA1 is about 25% 
slower than MD5, so it could be worth it.

However, you should know that there is _no way_ to use both hashes on the 
same project. Yes, you could rewrite the history, trying to convert also 
the hashes in the commit objects, but people actually started relying on 
naming commits with the short-SHA1.

I think it would be a nice thing to play through (for example, to find 
out how much impact the hash calculation has on the overall performance 
of git), but I doubt it will ever come to real use.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
  2006-08-18  9:59 ` Nguyễn Thái Ngọc Duy
  2006-08-18 10:21 ` Johannes Schindelin
@ 2006-08-18 10:52 ` Trekie
  2006-08-18 10:56   ` Johannes Schindelin
  2006-08-18 21:52 ` Jon Smirl
  2006-08-19 20:50 ` Linus Torvalds
  4 siblings, 1 reply; 22+ messages in thread
From: Trekie @ 2006-08-18 10:52 UTC (permalink / raw)
  To: David Rientjes; +Cc: git

Hi,

I'd like to point out that while finding collision for SHA1 according to
the Wikipedia needs 2^63 operations and AFAIK no collision has been
found yet, finding collision for MD5 can be achieved in a minute and
less (see http://cryptography.hyperlink.cz/2006/tunnels.pdf for details).

David Brodsky

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 10:52 ` Trekie
@ 2006-08-18 10:56   ` Johannes Schindelin
  2006-08-18 11:27     ` Trekie
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2006-08-18 10:56 UTC (permalink / raw)
  To: Trekie; +Cc: David Rientjes, git

Hi,

On Fri, 18 Aug 2006, Trekie wrote:

> I'd like to point out that while finding collision for SHA1 according to
> the Wikipedia needs 2^63 operations and AFAIK no collision has been
> found yet, finding collision for MD5 can be achieved in a minute and
> less (see http://cryptography.hyperlink.cz/2006/tunnels.pdf for details).

SHA1 has been broken (collisions have been found):

http://www.schneier.com/blog/archives/2005/02/sha1_broken.html

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 10:56   ` Johannes Schindelin
@ 2006-08-18 11:27     ` Trekie
  2006-08-18 11:37       ` Johannes Schindelin
  0 siblings, 1 reply; 22+ messages in thread
From: Trekie @ 2006-08-18 11:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> SHA1 has been broken (collisions have been found):
> 
> http://www.schneier.com/blog/archives/2005/02/sha1_broken.html

I don't think you're right. That blog just says, that Wang can find

"collisions in the the full SHA-1 in 2**69 hash operations, much less
than the brute-force attack of 2**80 operations based on the hash length."

That doesn't mean any collision has been found. In academic
cryptography, any attack that has less computational complexity than the
expected time needed for brute force is considered a break.

In a document (http://www.rsasecurity.com/rsalabs/node.asp?id=2927) that
has been released 6 months after that blog post is said a collision can
be found in 2^63 operations.

Well, if someone use the fastest computer today
(http://www.top500.org/system/7747) to get a collision it would take a
day to found one.

The point is why use MD5 if anyone can compute a collision?

David Brodsky

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 11:27     ` Trekie
@ 2006-08-18 11:37       ` Johannes Schindelin
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2006-08-18 11:37 UTC (permalink / raw)
  To: Trekie; +Cc: git

Hi,

On Fri, 18 Aug 2006, Trekie wrote:

> Johannes Schindelin wrote:
> > SHA1 has been broken (collisions have been found):
> > 
> > http://www.schneier.com/blog/archives/2005/02/sha1_broken.html
> 
> I don't think you're right. That blog just says, that Wang can find
> 
> "collisions in the the full SHA-1 in 2**69 hash operations, much less
> than the brute-force attack of 2**80 operations based on the hash length."

True. I have not heard of a collision either.

> The point is why use MD5 if anyone can compute a collision?

It does not suffice to generate collisions to make a hash unusable for our 
purposes: you would have to find a way to produce another text for a 
_given_ hash. Plus, this text would not only have to look meaningful, but 
compile. And preferrably introduce a back door.

Granted, once people find out how to generate another text, they can try 
to "optimize" some block between "/*" and "*/", so that the hash stays the 
same. But AFAICT none of the breaks of SHA1 or MD5 point into such a 
direction. Yet.

But _even if_ somebody succeeds in all that, that somebody has to convince 
_you_ to pull. And if you already have that object (the "good" version), 
it will not get overwritten.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 10:21 ` Johannes Schindelin
@ 2006-08-18 12:31   ` Petr Baudis
  2006-08-18 20:35   ` David Rientjes
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Baudis @ 2006-08-18 12:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Rientjes, git

Hi,

Dear diary, on Fri, Aug 18, 2006 at 12:21:11PM CEST, I got a letter
where Johannes Schindelin <Johannes.Schindelin@gmx.de> said that...
> However, you should know that there is _no way_ to use both hashes on the 
> same project. Yes, you could rewrite the history, trying to convert also 
> the hashes in the commit objects, but people actually started relying on 
> naming commits with the short-SHA1.

I don't really like having IDs ambiguous in this sense - having the same
type of IDs in all git-tracked projects has some cute benefits which are
of the kind that you don't know ahead that you will need them: joining
history of two distinct projects in a merge and theoretical possibility
of having subprojects where the main project references an exact
tree/commit of the sub project.

If we are ever going to implement support for multiple hashes, the hash
type should at least be part of the object id, in textual representation
as e.g. the first letter. This can still lead to convergence issues and
duplicate objects, but it enables smooth transition without rewriting
the history and it is much less confusing than just switching to a
different function.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 10:21 ` Johannes Schindelin
  2006-08-18 12:31   ` Petr Baudis
@ 2006-08-18 20:35   ` David Rientjes
  1 sibling, 0 replies; 22+ messages in thread
From: David Rientjes @ 2006-08-18 20:35 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Fri, 18 Aug 2006, Johannes Schindelin wrote:

> Make it a config variable, too, right?
> 

Sure.  The default hash function can be a config variable so that all 
projects started with init-db will default to a specific hash.  Other 
projects may still be started with something like init-db -md5.

> 1. they could be faster to calculate,
> 2. they could reduce clashes, and related to that,
> 3. it is possible that some day SHA1 is broken, i.e. that there is an 
>    algorithm to generate a different text for a given hash.
> 
> As for 2 and 3, it seems MD5 is equivalent, since another sort of attacks 
> was already successful on both SHA1 and MD5: generating two different 
> texts with the same hash.
> 

Correct; performance was my main motivation.  sha1 is obviously the 
securest algorithm among the two choices, but there are more steps 
involved in the hash than md5 (sha1 uses 80 and md5 uses 64) and sha1 is 
160-bit compared to the 128-bit md5.  One paper I read from the 
Information Technology Journal stated that sha1 is 25% slower than md5 
precisely for these reasons.

> However, you should know that there is _no way_ to use both hashes on the 
> same project. Yes, you could rewrite the history, trying to convert also 
> the hashes in the commit objects, but people actually started relying on 
> naming commits with the short-SHA1.
> 

I don't foresee changing a hash on a project (and thus rewriting the 
history) to be something that anybody would want to do.  As I said in the 
email that started this thread, it would be configurable at runtime on 
init-db.

> I think it would be a nice thing to play through (for example, to find 
> out how much impact the hash calculation has on the overall performance 
> of git), but I doubt it will ever come to real use.
> 

Again, when working with an enormous amount of data, this could be a 
considerable speedup.  A terabyte is _big_.

		David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
                   ` (2 preceding siblings ...)
  2006-08-18 10:52 ` Trekie
@ 2006-08-18 21:52 ` Jon Smirl
  2006-08-19  2:35   ` Johannes Schindelin
  2006-08-19 20:50 ` Linus Torvalds
  4 siblings, 1 reply; 22+ messages in thread
From: Jon Smirl @ 2006-08-18 21:52 UTC (permalink / raw)
  To: David Rientjes; +Cc: git

On 8/18/06, David Rientjes <rientjes@google.com> wrote:
> I'd like to solicit some comments about implementing support for md5 as a
> hash function that could be determined at runtime by the user during a
> project init-db.  md5, which I implemented as a configurable option in my
> own tree, is a 128-bit hash that is slightly more recognized than sha1.
> Likewise, it is also available in openssl/md5.h just as sha1 is available
> through a library in openssl/sha1.h.  My patch to move the hash name
> comparison was a step in this direction in isolating many of the
> particulars of hash-specific dependencies.

If I have two repositories each with 100M objects in them and I merge
them, what is the probability of a object id collision with MD5 (128b)
versus SHA1 (160b)?

This is not that far fetched. I know of at least four repositories
with over 1M objects in them today.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18 21:52 ` Jon Smirl
@ 2006-08-19  2:35   ` Johannes Schindelin
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2006-08-19  2:35 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git

Hi,

On Fri, 18 Aug 2006, Jon Smirl wrote:

> If I have two repositories each with 100M objects in them and I merge 
> them, what is the probability of a object id collision with MD5 (128b) 
> versus SHA1 (160b)?

Assuming a uniform distribution of the hashes over our data, this is the 
birthday problem:

http://mathworld.wolfram.com/BirthdayProblem.html

(In short, given a number of days in the year, how many people do I need 
to pick randomly until at least two of them have the same birthday?)

In our case, we want to know how many objects we need in order to probably 
have a clash in 2^128 (approx. 3.4e38) and 2^160 (approx. 1.5e48) hashes, 
respectively.

Mathworld tells us that a good approximation of the probability is

p = 1 - (1-n/(2d))^(n-1)

where n is the number of objects, and d is the total number of hashes. If 
you have 100M = 1e5 objects, you probably want the probability of a clash 
below 1/1e5 = 1e-5, so let's take 1e-10. Assuming n is way lower than d, 
we can approximate

p = 1 - (1 - (n - 1 over 1) * n/(2d)) = n(n-1)/2d

and therefore (approximately)

n = sqrt(2pd)

which amounts to 2.6e14 in the case of a 128-bit hash, and 1.7e19 in the 
case of a 160-bit hash, both well beyond your 100M objects. BTW the 
addressable space of a 64-bit processor is about 1.9e19.

If you want to know the probability of a clash, you can use the same 
approximation:

For 100M objects: p = 1.5e-59 for 128-bit, and p = 3.3e-69 for 160-bit. 
This is so low as to be incomprehensible.

Remember that all these approximations are really crude, so do not rely on 
the precise numbers. But they'll give you good ballpark figures (if I did 
not make a mistake...).

Hth,
Dscho

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
@ 2006-08-19  3:19 linux
  2006-08-19 22:30 ` Petr Baudis
  0 siblings, 1 reply; 22+ messages in thread
From: linux @ 2006-08-19  3:19 UTC (permalink / raw)
  To: rientjes; +Cc: git

This is a Very Dumb Idea.

I normally try to be polite, but this concept is particularly
deserving of scorn.

The idea that more choice is a good thing is sometimes seductive, but when
it comes to standards, that's not a good idea.  It's like the famous joke
told about the dumb inhabitants of your favorite ethnic region: they're
going to try driving on the other side of the road.  Starting next month,
all the cars will drive on the other side.  If that goes well, the month
after they'll add trucks and buses.

If some disaster arose that required Git to change hash functions,
it would be possible to build a conversion utility, although all the
signatures in tag objects would break; you'd have to regenerate them, too.

But this is an all-or-nothing change.  Trying to support *multiple*
simultaneous hash functions is a mess.

Git depends very fundamentally on identical objects having identical
hashes.  The in-memory merge depends on it.  The network protocol
depends on it.  Various kludges can be imagined to handle blob objects
with multiple names, but tree objects quickly become unworkable.

Let's break down the solutions.  There are basically four classes,
depending on 
1) whether objects are stored in the database indexed by both hashes,
   or just one, and
2) Whether pointers to objects include both hashes, or just one.

If you include both hashes everywhere, then you've just built
a larger hash function that's the concatenation of SHA-1 and MD5,
and while it works sanely, it just makes the object IDs even
bigger, and there are obviously no speed benefts.  But this is
the most reasonable alternative.

If pointers include both hashes, but objects are indexed by only one,
then to find an object by pointer requires two lookups, and you still
need to hash every blob twice when committing get the values to
put in the tree objects.  So obviosuly no faster than the first option.

Okay, so pointers are only one hash.  If they were always the same hash,
the second hash would be utterly pointless, so we're assuming the
database contains a mix.

If objects are indexed by both hashes, then you can hash new blobs once
to check to see if they're already in the database, but if they're not,
you have to hash again with the another algorithm.

On the other hand, if you index by only one, then every object being
checked to see if it's already in the database needs to be hashed twice
so it can be looked up twice.  What were the claimed speed gains?

But more to the point, any system which stores one arbitrary hash as
a pointer makes the the tree object created to describe a directory no
longer unique, which results in its hash being fundamentally non-unique,
which cascades all the way up to the commit object.

So you can get silly things like the need for a merge commit to
record the merge of trees that are actually identical.

Just a big mess.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
                   ` (3 preceding siblings ...)
  2006-08-18 21:52 ` Jon Smirl
@ 2006-08-19 20:50 ` Linus Torvalds
  2006-08-21 20:44   ` Chris Wedgwood
  4 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2006-08-19 20:50 UTC (permalink / raw)
  To: David Rientjes; +Cc: git

On Thu, 17 Aug 2006, David Rientjes wrote:
>
> I'd like to solicit some comments about implementing support for md5 as a 
> hash function that could be determined at runtime by the user during a 
> project init-db.

I would _strongly_ suggest against this. At least not md5. 

I can see the point of configurable hashes, but it would be for a stronger 
hash than sha1, not for a (much) weaker one.

md5 is not only shorter, it's known to be broken, and there are attacks 
out there that generate documents with the same md5 checksum quickly and 
undetectably (ie depending on what the "document format" is, you might 
actually not _see_ the corruption).

There's a real-life example of this (just google for "same md5") with a 
postscript file, which when printed out still looks "valid".

In contrast, sha1 is still considered "hard", in that while you can 
obviously always brute-force _any_ hash, the sha1 brute-forcing attack is 
considered to be impractical and nobody has at least shown any realistic 
version of the above postscript kind of hack.

In my fairly limited performance analysis, I've actually been surprised by 
the fact that the hashing has never really shown up as a major issue in 
any of my profiles. All the _real_ performance issues have been related to 
memory usage, and things like the hash lookup (ie "memcmp()" was pretty 
high on the list - just from comparing object names during lookup).

We've also had compression issues (initial check-in) and obviously the 
delta selection used to be a _huge_ time-waster until the pack info reuse 
code went in. But I don't think we've ever had a load that was really 
hashing-limited.

So considering that md5 isn't _that_ much faster to compute (let's say 
that it's ~30% slower), the biggest advantage of md5 would likely be just 
the fact that 16 bytes is smaller than 20 bytes, and thus commit objects 
and tree objects in particular could be smaller. But you'd be better off 
just using the first 16 bytes of the sha1 than the md5 hash, if that was 
the main goal.

So yes, maybe we'll want to make the hash choice a setup-time option, but 
if we ever do, I don't think we should make md5 even a choice. It's just 
not a very good hash, and no new program should start using it. 

			Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-19  3:19 linux
@ 2006-08-19 22:30 ` Petr Baudis
  0 siblings, 0 replies; 22+ messages in thread
From: Petr Baudis @ 2006-08-19 22:30 UTC (permalink / raw)
  To: linux; +Cc: rientjes, git

Pre Scriptum: I do not advocate supporting many hashing functions just
for the heck of it, but I don't like the "full sweep" approach _when_
(not "in case") we _need_ to change the hash function. After all,
currently we have to operate with the fact that the next one will likely
fall victim of attacks in time as well and this is going to be
periodical operation.

Dear diary, on Sat, Aug 19, 2006 at 05:19:31AM CEST, I got a letter
where linux@horizon.com said that...
> So you can get silly things like the need for a merge commit to
> record the merge of trees that are actually identical.

You can create an ordering of hashes based on their strength and then
rehash your objects to max() of hash types used in the operation. So, if
you're about to merge md5(x) and sha1(y), you first recompute the first
id as sha1(x). Of course "recomputing" needs to be a wee bit more
complex: you need to first substitute tree id for a common hash as well
(and recursively do the same in the trees); slow, yes, but you can
cache it.

It gets worse when you are about to recompute ancestry, so you need to
set up the hash usage transition so that you don't ever have to
- such an event is sufficiently rare that each branch keeper can declare
a flag day "before now I merged only md5 commits and from now on I
will merge only sha1 commits". The tool can help to enforce this.
This strategy will not help in 100% of the cases (md5git-based tool
takes sha1git-created commit and re-creates it based on a snapshot and
the precise same metadata) but I guess it's good enough.

It's good that this is just theory so far, though. I'd say let it be for
now (at least from the "emergency hash switch" standpoint) and do it
when it's needed; you will have to have people upgrade their Git at that
point anyway and the code would bitrot just sitting there and adding
so-far unnecessary complexity. We have worse problems at our hands right
now anyway.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-19 20:50 ` Linus Torvalds
@ 2006-08-21 20:44   ` Chris Wedgwood
  2006-08-22  6:18     ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wedgwood @ 2006-08-21 20:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Rientjes, git

On Sat, Aug 19, 2006 at 01:50:32PM -0700, Linus Torvalds wrote:

> I can see the point of configurable hashes, but it would be for a
> stronger hash than sha1, not for a (much) weaker one.

Why any configuration option at all?  What in practice does it really
buy?

If someone (eventually) wants to do something malicious (which right
now requires some effort and would probably not go undetected) there
are probably easier ways to achieve this (like posting a patch with a
non-obvious subtle side-effect).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-21 20:44   ` Chris Wedgwood
@ 2006-08-22  6:18     ` Junio C Hamano
  2006-08-23  4:14       ` Shawn Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-08-22  6:18 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: git

Chris Wedgwood <cw@f00f.org> writes:

> On Sat, Aug 19, 2006 at 01:50:32PM -0700, Linus Torvalds wrote:
>
>> I can see the point of configurable hashes, but it would be for a
>> stronger hash than sha1, not for a (much) weaker one.
>
> Why any configuration option at all?  What in practice does it really
> buy?

I personally am not interested in making this configurable at
all.  The hashcmp() change on the other hand to abstract out 20
was a good preparation, if we ever want to switch to longer
hashes we would know where to look.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-22  6:18     ` Junio C Hamano
@ 2006-08-23  4:14       ` Shawn Pearce
  2006-08-23  4:46         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn Pearce @ 2006-08-23  4:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> I personally am not interested in making this configurable at
> all.  The hashcmp() change on the other hand to abstract out 20
> was a good preparation, if we ever want to switch to longer
> hashes we would know where to look.

What about all of those memcpy(a, b, 20)'s?  :-)

I can see us wanting to support say SHA-128 or SHA-256 in a few
years.  Especially as processors get faster and better attacks are
developed against SHA-1 such that its no longer really the best
trade-off hash function available.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-23  4:14       ` Shawn Pearce
@ 2006-08-23  4:46         ` Junio C Hamano
  2006-08-23  6:49           ` Shawn Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-08-23  4:46 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> I personally am not interested in making this configurable at
>> all.  The hashcmp() change on the other hand to abstract out 20
>> was a good preparation, if we ever want to switch to longer
>> hashes we would know where to look.
>
> What about all of those memcpy(a, b, 20)'s?  :-)

Surely.  If you are inclined to, go wild.

> I can see us wanting to support say SHA-128 or SHA-256 in a few
> years.  Especially as processors get faster and better attacks are
> developed against SHA-1 such that its no longer really the best
> trade-off hash function available.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-23  4:46         ` Junio C Hamano
@ 2006-08-23  6:49           ` Shawn Pearce
  2006-08-24  7:36             ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn Pearce @ 2006-08-23  6:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > Junio C Hamano <junkio@cox.net> wrote:
> >> I personally am not interested in making this configurable at
> >> all.  The hashcmp() change on the other hand to abstract out 20
> >> was a good preparation, if we ever want to switch to longer
> >> hashes we would know where to look.
> >
> > What about all of those memcpy(a, b, 20)'s?  :-)
> 
> Surely.  If you are inclined to, go wild.

Like this?  :-)

-->--
Convert memcpy(a,b,20) to hashcpy(a,b).

This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 blame.c                  |    4 ++--
 builtin-diff.c           |    4 ++--
 builtin-pack-objects.c   |    6 +++---
 builtin-read-tree.c      |    2 +-
 builtin-unpack-objects.c |    4 ++--
 builtin-update-index.c   |    4 ++--
 builtin-write-tree.c     |    4 ++--
 cache-tree.c             |    6 +++---
 cache.h                  |    4 ++++
 combine-diff.c           |    6 +++---
 connect.c                |    6 +++---
 convert-objects.c        |    6 +++---
 csum-file.c              |    2 +-
 diff.c                   |    4 ++--
 fetch-pack.c             |    2 +-
 fetch.c                  |    2 +-
 fsck-objects.c           |    2 +-
 http-fetch.c             |    2 +-
 http-push.c              |    6 +++---
 index-pack.c             |    4 ++--
 merge-recursive.c        |   20 ++++++++++----------
 mktree.c                 |    4 ++--
 object.c                 |    2 +-
 patch-id.c               |    2 +-
 receive-pack.c           |    4 ++--
 revision.c               |    2 +-
 send-pack.c              |    4 ++--
 sha1_file.c              |   18 +++++++++---------
 sha1_name.c              |   22 +++++++++++-----------
 tree-walk.c              |    4 ++--
 tree.c                   |    2 +-
 unpack-trees.c           |    2 +-
 32 files changed, 85 insertions(+), 81 deletions(-)

diff --git a/blame.c b/blame.c
index c253b9c..5a8af72 100644
--- a/blame.c
+++ b/blame.c
@@ -176,7 +176,7 @@ static int get_blob_sha1(struct tree *t,
 	if (i == 20)
 		return -1;
 
-	memcpy(sha1, blob_sha1, 20);
+	hashcpy(sha1, blob_sha1);
 	return 0;
 }
 
@@ -191,7 +191,7 @@ static int get_blob_sha1_internal(const 
 	    strcmp(blame_file + baselen, pathname))
 		return -1;
 
-	memcpy(blob_sha1, sha1, 20);
+	hashcpy(blob_sha1, sha1);
 	return -1;
 }
 
diff --git a/builtin-diff.c b/builtin-diff.c
index 874f773..a659020 100644
--- a/builtin-diff.c
+++ b/builtin-diff.c
@@ -192,7 +192,7 @@ static int builtin_diff_combined(struct 
 	parent = xmalloc(ents * sizeof(*parent));
 	/* Again, the revs are all reverse */
 	for (i = 0; i < ents; i++)
-		memcpy(parent + i, ent[ents - 1 - i].item->sha1, 20);
+		hashcpy((unsigned char*)parent + i, ent[ents - 1 - i].item->sha1);
 	diff_tree_combined(parent[0], parent + 1, ents - 1,
 			   revs->dense_combined_merges, revs);
 	return 0;
@@ -290,7 +290,7 @@ int cmd_diff(int argc, const char **argv
 		if (obj->type == OBJ_BLOB) {
 			if (2 <= blobs)
 				die("more than two blobs given: '%s'", name);
-			memcpy(blob[blobs].sha1, obj->sha1, 20);
+			hashcpy(blob[blobs].sha1, obj->sha1);
 			blob[blobs].name = name;
 			blobs++;
 			continue;
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index f19f0d6..46f524d 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -534,7 +534,7 @@ static int add_object_entry(const unsign
 	entry = objects + idx;
 	nr_objects = idx + 1;
 	memset(entry, 0, sizeof(*entry));
-	memcpy(entry->sha1, sha1, 20);
+	hashcpy(entry->sha1, sha1);
 	entry->hash = hash;
 
 	if (object_ix_hashsz * 3 <= nr_objects * 4)
@@ -649,7 +649,7 @@ static struct pbase_tree_cache *pbase_tr
 		free(ent->tree_data);
 		nent = ent;
 	}
-	memcpy(nent->sha1, sha1, 20);
+	hashcpy(nent->sha1, sha1);
 	nent->tree_data = data;
 	nent->tree_size = size;
 	nent->ref = 1;
@@ -799,7 +799,7 @@ static void add_preferred_base(unsigned 
 	it->next = pbase_tree;
 	pbase_tree = it;
 
-	memcpy(it->pcache.sha1, tree_sha1, 20);
+	hashcpy(it->pcache.sha1, tree_sha1);
 	it->pcache.tree_data = data;
 	it->pcache.tree_size = size;
 }
diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index 53087fa..c1867d2 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -53,7 +53,7 @@ static void prime_cache_tree_rec(struct 
 	struct name_entry entry;
 	int cnt;
 
-	memcpy(it->sha1, tree->object.sha1, 20);
+	hashcpy(it->sha1, tree->object.sha1);
 	desc.buf = tree->buffer;
 	desc.size = tree->size;
 	cnt = 0;
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index f0ae5c9..ca0ebc2 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -95,7 +95,7 @@ static void add_delta_to_list(unsigned c
 {
 	struct delta_info *info = xmalloc(sizeof(*info));
 
-	memcpy(info->base_sha1, base_sha1, 20);
+	hashcpy(info->base_sha1, base_sha1);
 	info->size = size;
 	info->delta = delta;
 	info->next = delta_list;
@@ -173,7 +173,7 @@ static int unpack_delta_entry(unsigned l
 	unsigned char base_sha1[20];
 	int result;
 
-	memcpy(base_sha1, fill(20), 20);
+	hashcpy(base_sha1, fill(20));
 	use(20);
 
 	delta_data = get_data(delta_size);
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 5dd91af..8675126 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -142,7 +142,7 @@ static int add_cacheinfo(unsigned int mo
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);
 
-	memcpy(ce->sha1, sha1, 20);
+	hashcpy(ce->sha1, sha1);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(len, stage);
 	ce->ce_mode = create_ce_mode(mode);
@@ -333,7 +333,7 @@ static struct cache_entry *read_one_ent(
 	size = cache_entry_size(namelen);
 	ce = xcalloc(1, size);
 
-	memcpy(ce->sha1, sha1, 20);
+	hashcpy(ce->sha1, sha1);
 	memcpy(ce->name, path, namelen);
 	ce->ce_flags = create_ce_flags(namelen, stage);
 	ce->ce_mode = create_ce_mode(mode);
diff --git a/builtin-write-tree.c b/builtin-write-tree.c
index ca06149..50670dc 100644
--- a/builtin-write-tree.c
+++ b/builtin-write-tree.c
@@ -50,10 +50,10 @@ int write_tree(unsigned char *sha1, int 
 	if (prefix) {
 		struct cache_tree *subtree =
 			cache_tree_find(active_cache_tree, prefix);
-		memcpy(sha1, subtree->sha1, 20);
+		hashcpy(sha1, subtree->sha1);
 	}
 	else
-		memcpy(sha1, active_cache_tree->sha1, 20);
+		hashcpy(sha1, active_cache_tree->sha1);
 
 	rollback_lock_file(lock_file);
 
diff --git a/cache-tree.c b/cache-tree.c
index d9f7e1e..323c68a 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -335,7 +335,7 @@ static int update_one(struct cache_tree 
 		offset += sprintf(buffer + offset,
 				  "%o %.*s", mode, entlen, path + baselen);
 		buffer[offset++] = 0;
-		memcpy(buffer + offset, sha1, 20);
+		hashcpy((unsigned char*)buffer + offset, sha1);
 		offset += 20;
 
 #if DEBUG
@@ -412,7 +412,7 @@ #if DEBUG
 #endif
 
 	if (0 <= it->entry_count) {
-		memcpy(buffer + *offset, it->sha1, 20);
+		hashcpy((unsigned char*)buffer + *offset, it->sha1);
 		*offset += 20;
 	}
 	for (i = 0; i < it->subtree_nr; i++) {
@@ -478,7 +478,7 @@ static struct cache_tree *read_one(const
 	if (0 <= it->entry_count) {
 		if (size < 20)
 			goto free_return;
-		memcpy(it->sha1, buf, 20);
+		hashcpy(it->sha1, (unsigned char*)buf);
 		buf += 20;
 		size -= 20;
 	}
diff --git a/cache.h b/cache.h
index cd2ad90..df642a1 100644
--- a/cache.h
+++ b/cache.h
@@ -222,6 +222,10 @@ static inline int hashcmp(const unsigned
 {
 	return memcmp(sha1, sha2, 20);
 }
+static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
+{
+	memcpy(sha_dst, sha_src, 20);
+}
 
 int git_mkstemp(char *path, size_t n, const char *template);
 
diff --git a/combine-diff.c b/combine-diff.c
index 0682acd..466899d 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -31,9 +31,9 @@ static struct combine_diff_path *interse
 			memset(p->parent, 0,
 			       sizeof(p->parent[0]) * num_parent);
 
-			memcpy(p->sha1, q->queue[i]->two->sha1, 20);
+			hashcpy(p->sha1, q->queue[i]->two->sha1);
 			p->mode = q->queue[i]->two->mode;
-			memcpy(p->parent[n].sha1, q->queue[i]->one->sha1, 20);
+			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
 			p->parent[n].mode = q->queue[i]->one->mode;
 			p->parent[n].status = q->queue[i]->status;
 			*tail = p;
@@ -927,6 +927,6 @@ void diff_tree_combined_merge(const unsi
 	for (parents = commit->parents, num_parent = 0;
 	     parents;
 	     parents = parents->next, num_parent++)
-		memcpy(parent + num_parent, parents->item->object.sha1, 20);
+		hashcpy((unsigned char*)parent + num_parent, parents->item->object.sha1);
 	diff_tree_combined(sha1, parent, num_parent, dense, rev);
 }
diff --git a/connect.c b/connect.c
index 7a6a73f..e501ccc 100644
--- a/connect.c
+++ b/connect.c
@@ -77,7 +77,7 @@ struct ref **get_remote_heads(int in, st
 		if (nr_match && !path_match(name, nr_match, match))
 			continue;
 		ref = xcalloc(1, sizeof(*ref) + len - 40);
-		memcpy(ref->old_sha1, old_sha1, 20);
+		hashcpy(ref->old_sha1, old_sha1);
 		memcpy(ref->name, buffer + 41, len - 40);
 		*list = ref;
 		list = &ref->next;
@@ -208,7 +208,7 @@ static struct ref *try_explicit_object_n
 	len = strlen(name) + 1;
 	ref = xcalloc(1, sizeof(*ref) + len);
 	memcpy(ref->name, name, len);
-	memcpy(ref->new_sha1, sha1, 20);
+	hashcpy(ref->new_sha1, sha1);
 	return ref;
 }
 
@@ -318,7 +318,7 @@ int match_refs(struct ref *src, struct r
 			int len = strlen(src->name) + 1;
 			dst_peer = xcalloc(1, sizeof(*dst_peer) + len);
 			memcpy(dst_peer->name, src->name, len);
-			memcpy(dst_peer->new_sha1, src->new_sha1, 20);
+			hashcpy(dst_peer->new_sha1, src->new_sha1);
 			link_dst_tail(dst_peer, dst_tail);
 		}
 		dst_peer->peer_ref = src;
diff --git a/convert-objects.c b/convert-objects.c
index 4e7ff75..631678b 100644
--- a/convert-objects.c
+++ b/convert-objects.c
@@ -23,7 +23,7 @@ static struct entry * convert_entry(unsi
 static struct entry *insert_new(unsigned char *sha1, int pos)
 {
 	struct entry *new = xcalloc(1, sizeof(struct entry));
-	memcpy(new->old_sha1, sha1, 20);
+	hashcpy(new->old_sha1, sha1);
 	memmove(convert + pos + 1, convert + pos, (nr_convert - pos) * sizeof(struct entry *));
 	convert[pos] = new;
 	nr_convert++;
@@ -54,7 +54,7 @@ static struct entry *lookup_entry(unsign
 static void convert_binary_sha1(void *buffer)
 {
 	struct entry *entry = convert_entry(buffer);
-	memcpy(buffer, entry->new_sha1, 20);
+	hashcpy(buffer, entry->new_sha1);
 }
 
 static void convert_ascii_sha1(void *buffer)
@@ -104,7 +104,7 @@ static int write_subdirectory(void *buff
 		if (!slash) {
 			newlen += sprintf(new + newlen, "%o %s", mode, path);
 			new[newlen++] = '\0';
-			memcpy(new + newlen, (char *) buffer + len - 20, 20);
+			hashcpy((unsigned char*)new + newlen, (unsigned char *) buffer + len - 20);
 			newlen += 20;
 
 			used += len;
diff --git a/csum-file.c b/csum-file.c
index e227889..b7174c6 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -38,7 +38,7 @@ int sha1close(struct sha1file *f, unsign
 	}
 	SHA1_Final(f->buffer, &f->ctx);
 	if (result)
-		memcpy(result, f->buffer, 20);
+		hashcpy(result, f->buffer);
 	if (update)
 		sha1flush(f, 20);
 	if (close(f->fd))
diff --git a/diff.c b/diff.c
index ddf2dea..852c175 100644
--- a/diff.c
+++ b/diff.c
@@ -1107,7 +1107,7 @@ void fill_filespec(struct diff_filespec 
 {
 	if (mode) {
 		spec->mode = canon_mode(mode);
-		memcpy(spec->sha1, sha1, 20);
+		hashcpy(spec->sha1, sha1);
 		spec->sha1_valid = !is_null_sha1(sha1);
 	}
 }
@@ -1200,7 +1200,7 @@ static struct sha1_size_cache *locate_si
 			sizeof(*sha1_size_cache));
 	e = xmalloc(sizeof(struct sha1_size_cache));
 	sha1_size_cache[first] = e;
-	memcpy(e->sha1, sha1, 20);
+	hashcpy(e->sha1, sha1);
 	e->size = size;
 	return e;
 }
diff --git a/fetch-pack.c b/fetch-pack.c
index e18c148..377fede 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -404,7 +404,7 @@ static int everything_local(struct ref *
 			continue;
 		}
 
-		memcpy(ref->new_sha1, local, 20);
+		hashcpy(ref->new_sha1, local);
 		if (!verbose)
 			continue;
 		fprintf(stderr,
diff --git a/fetch.c b/fetch.c
index aeb6bf2..ef60b04 100644
--- a/fetch.c
+++ b/fetch.c
@@ -84,7 +84,7 @@ static int process_commit(struct commit 
 	if (commit->object.flags & COMPLETE)
 		return 0;
 
-	memcpy(current_commit_sha1, commit->object.sha1, 20);
+	hashcpy(current_commit_sha1, commit->object.sha1);
 
 	pull_say("walk %s\n", sha1_to_hex(commit->object.sha1));
 
diff --git a/fsck-objects.c b/fsck-objects.c
index 31e00d8..ae0ec8d 100644
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -356,7 +356,7 @@ static void add_sha1_list(unsigned char 
 	int nr;
 
 	entry->ino = ino;
-	memcpy(entry->sha1, sha1, 20);
+	hashcpy(entry->sha1, sha1);
 	nr = sha1_list.nr;
 	if (nr == MAX_SHA1_ENTRIES) {
 		fsck_sha1_list();
diff --git a/http-fetch.c b/http-fetch.c
index d1f74b4..7619b33 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -393,7 +393,7 @@ void prefetch(unsigned char *sha1)
 	char *filename = sha1_file_name(sha1);
 
 	newreq = xmalloc(sizeof(*newreq));
-	memcpy(newreq->sha1, sha1, 20);
+	hashcpy(newreq->sha1, sha1);
 	newreq->repo = alt;
 	newreq->url = NULL;
 	newreq->local = -1;
diff --git a/http-push.c b/http-push.c
index 4849779..ebfcc73 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1874,7 +1874,7 @@ static int one_local_ref(const char *ref
 	struct ref *ref;
 	int len = strlen(refname) + 1;
 	ref = xcalloc(1, sizeof(*ref) + len);
-	memcpy(ref->new_sha1, sha1, 20);
+	hashcpy(ref->new_sha1, sha1);
 	memcpy(ref->name, refname, len);
 	*local_tail = ref;
 	local_tail = &ref->next;
@@ -1909,7 +1909,7 @@ static void one_remote_ref(char *refname
 	}
 
 	ref = xcalloc(1, sizeof(*ref) + len);
-	memcpy(ref->old_sha1, remote_sha1, 20);
+	hashcpy(ref->old_sha1, remote_sha1);
 	memcpy(ref->name, refname, len);
 	*remote_tail = ref;
 	remote_tail = &ref->next;
@@ -2445,7 +2445,7 @@ int main(int argc, char **argv)
 				continue;
 			}
 		}
-		memcpy(ref->new_sha1, ref->peer_ref->new_sha1, 20);
+		hashcpy(ref->new_sha1, ref->peer_ref->new_sha1);
 		if (is_zero_sha1(ref->new_sha1)) {
 			error("cannot happen anymore");
 			rc = -3;
diff --git a/index-pack.c b/index-pack.c
index 96ea687..80bc6cb 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -161,7 +161,7 @@ static void *unpack_raw_entry(unsigned l
 	case OBJ_DELTA:
 		if (pos + 20 >= pack_limit)
 			bad_object(offset, "object extends past end of pack");
-		memcpy(delta_base, pack_base + pos, 20);
+		hashcpy(delta_base, pack_base + pos);
 		pos += 20;
 		/* fallthru */
 	case OBJ_COMMIT:
@@ -304,7 +304,7 @@ static void parse_pack_objects(void)
 		if (obj->type == OBJ_DELTA) {
 			struct delta_entry *delta = &deltas[nr_deltas++];
 			delta->obj = obj;
-			memcpy(delta->base_sha1, base_sha1, 20);
+			hashcpy(delta->base_sha1, base_sha1);
 		} else
 			sha1_object(data, data_size, obj->type, obj->sha1);
 		free(data);
diff --git a/merge-recursive.c b/merge-recursive.c
index 048cca1..8a2f697 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -158,7 +158,7 @@ static struct cache_entry *make_cache_en
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);
 
-	memcpy(ce->sha1, sha1, 20);
+	hashcpy(ce->sha1, sha1);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(len, stage);
 	ce->ce_mode = create_ce_mode(mode);
@@ -355,7 +355,7 @@ static struct path_list *get_unmerged(vo
 		}
 		e = item->util;
 		e->stages[ce_stage(ce)].mode = ntohl(ce->ce_mode);
-		memcpy(e->stages[ce_stage(ce)].sha, ce->sha1, 20);
+		hashcpy(e->stages[ce_stage(ce)].sha, ce->sha1);
 	}
 
 	return unmerged;
@@ -636,10 +636,10 @@ static struct merge_file_info merge_file
 		result.clean = 0;
 		if (S_ISREG(a->mode)) {
 			result.mode = a->mode;
-			memcpy(result.sha, a->sha1, 20);
+			hashcpy(result.sha, a->sha1);
 		} else {
 			result.mode = b->mode;
-			memcpy(result.sha, b->sha1, 20);
+			hashcpy(result.sha, b->sha1);
 		}
 	} else {
 		if (!sha_eq(a->sha1, o->sha1) && !sha_eq(b->sha1, o->sha1))
@@ -648,9 +648,9 @@ static struct merge_file_info merge_file
 		result.mode = a->mode == o->mode ? b->mode: a->mode;
 
 		if (sha_eq(a->sha1, o->sha1))
-			memcpy(result.sha, b->sha1, 20);
+			hashcpy(result.sha, b->sha1);
 		else if (sha_eq(b->sha1, o->sha1))
-			memcpy(result.sha, a->sha1, 20);
+			hashcpy(result.sha, a->sha1);
 		else if (S_ISREG(a->mode)) {
 			int code = 1, fd;
 			struct stat st;
@@ -699,7 +699,7 @@ static struct merge_file_info merge_file
 			if (!(S_ISLNK(a->mode) || S_ISLNK(b->mode)))
 				die("cannot merge modes?");
 
-			memcpy(result.sha, a->sha1, 20);
+			hashcpy(result.sha, a->sha1);
 
 			if (!sha_eq(a->sha1, b->sha1))
 				result.clean = 0;
@@ -1096,11 +1096,11 @@ static int process_entry(const char *pat
 
 		output("Auto-merging %s", path);
 		o.path = a.path = b.path = (char *)path;
-		memcpy(o.sha1, o_sha, 20);
+		hashcpy(o.sha1, o_sha);
 		o.mode = o_mode;
-		memcpy(a.sha1, a_sha, 20);
+		hashcpy(a.sha1, a_sha);
 		a.mode = a_mode;
-		memcpy(b.sha1, b_sha, 20);
+		hashcpy(b.sha1, b_sha);
 		b.mode = b_mode;
 
 		mfi = merge_file(&o, &a, &b,
diff --git a/mktree.c b/mktree.c
index 9324138..56205d1 100644
--- a/mktree.c
+++ b/mktree.c
@@ -30,7 +30,7 @@ static void append_to_tree(unsigned mode
 	ent = entries[used++] = xmalloc(sizeof(**entries) + len + 1);
 	ent->mode = mode;
 	ent->len = len;
-	memcpy(ent->sha1, sha1, 20);
+	hashcpy(ent->sha1, sha1);
 	memcpy(ent->name, path, len+1);
 }
 
@@ -64,7 +64,7 @@ static void write_tree(unsigned char *sh
 		offset += sprintf(buffer + offset, "%o ", ent->mode);
 		offset += sprintf(buffer + offset, "%s", ent->name);
 		buffer[offset++] = 0;
-		memcpy(buffer + offset, ent->sha1, 20);
+		hashcpy((unsigned char*)buffer + offset, ent->sha1);
 		offset += 20;
 	}
 	write_sha1_file(buffer, offset, tree_type, sha1);
diff --git a/object.c b/object.c
index fdcfff7..60bf16b 100644
--- a/object.c
+++ b/object.c
@@ -91,7 +91,7 @@ void created_object(const unsigned char 
 	obj->used = 0;
 	obj->type = OBJ_NONE;
 	obj->flags = 0;
-	memcpy(obj->sha1, sha1, 20);
+	hashcpy(obj->sha1, sha1);
 
 	if (obj_hash_size - 1 <= nr_objs * 2)
 		grow_object_hash();
diff --git a/patch-id.c b/patch-id.c
index 3b4c80f..086d2d9 100644
--- a/patch-id.c
+++ b/patch-id.c
@@ -47,7 +47,7 @@ static void generate_id_list(void)
 
 		if (!get_sha1_hex(p, n)) {
 			flush_current_id(patchlen, sha1, &ctx);
-			memcpy(sha1, n, 20);
+			hashcpy(sha1, n);
 			patchlen = 0;
 			continue;
 		}
diff --git a/receive-pack.c b/receive-pack.c
index 81e9190..2015316 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -247,8 +247,8 @@ static void read_head_info(void)
 				report_status = 1;
 		}
 		cmd = xmalloc(sizeof(struct command) + len - 80);
-		memcpy(cmd->old_sha1, old_sha1, 20);
-		memcpy(cmd->new_sha1, new_sha1, 20);
+		hashcpy(cmd->old_sha1, old_sha1);
+		hashcpy(cmd->new_sha1, new_sha1);
 		memcpy(cmd->ref_name, line + 82, len - 81);
 		cmd->error_string = "n/a (unpacker error)";
 		cmd->next = NULL;
diff --git a/revision.c b/revision.c
index 5a91d06..1d89d72 100644
--- a/revision.c
+++ b/revision.c
@@ -496,7 +496,7 @@ static int add_parents_only(struct rev_i
 		it = get_reference(revs, arg, sha1, 0);
 		if (it->type != OBJ_TAG)
 			break;
-		memcpy(sha1, ((struct tag*)it)->tagged->sha1, 20);
+		hashcpy(sha1, ((struct tag*)it)->tagged->sha1);
 	}
 	if (it->type != OBJ_COMMIT)
 		return 0;
diff --git a/send-pack.c b/send-pack.c
index f7c0cfc..fd79a61 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -185,7 +185,7 @@ static int one_local_ref(const char *ref
 	struct ref *ref;
 	int len = strlen(refname) + 1;
 	ref = xcalloc(1, sizeof(*ref) + len);
-	memcpy(ref->new_sha1, sha1, 20);
+	hashcpy(ref->new_sha1, sha1);
 	memcpy(ref->name, refname, len);
 	*local_tail = ref;
 	local_tail = &ref->next;
@@ -310,7 +310,7 @@ static int send_pack(int in, int out, in
 				continue;
 			}
 		}
-		memcpy(ref->new_sha1, ref->peer_ref->new_sha1, 20);
+		hashcpy(ref->new_sha1, ref->peer_ref->new_sha1);
 		if (is_zero_sha1(ref->new_sha1)) {
 			error("cannot happen anymore");
 			ret = -3;
diff --git a/sha1_file.c b/sha1_file.c
index 4b1be72..789deb7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -558,7 +558,7 @@ struct packed_git *add_packed_git(char *
 	p->pack_use_cnt = 0;
 	p->pack_local = local;
 	if ((path_len > 44) && !get_sha1_hex(path + path_len - 44, sha1))
-		memcpy(p->sha1, sha1, 20);
+		hashcpy(p->sha1, sha1);
 	return p;
 }
 
@@ -589,7 +589,7 @@ struct packed_git *parse_pack_index_file
 	p->pack_base = NULL;
 	p->pack_last_used = 0;
 	p->pack_use_cnt = 0;
-	memcpy(p->sha1, sha1, 20);
+	hashcpy(p->sha1, sha1);
 	return p;
 }
 
@@ -971,7 +971,7 @@ int check_reuse_pack_delta(struct packed
 	ptr = unpack_object_header(p, ptr, kindp, sizep);
 	if (*kindp != OBJ_DELTA)
 		goto done;
-	memcpy(base, (unsigned char *) p->pack_base + ptr, 20);
+	hashcpy(base, (unsigned char *) p->pack_base + ptr);
 	status = 0;
  done:
 	unuse_packed_git(p);
@@ -999,7 +999,7 @@ void packed_object_info_detail(struct pa
 		if (p->pack_size <= offset + 20)
 			die("pack file %s records an incomplete delta base",
 			    p->pack_name);
-		memcpy(base_sha1, pack, 20);
+		hashcpy(base_sha1, pack);
 		do {
 			struct pack_entry base_ent;
 			unsigned long junk;
@@ -1219,7 +1219,7 @@ int nth_packed_object_sha1(const struct 
 	void *index = p->index_base + 256;
 	if (n < 0 || num_packed_objects(p) <= n)
 		return -1;
-	memcpy(sha1, (char *) index + (24 * n) + 4, 20);
+	hashcpy(sha1, (unsigned char *) index + (24 * n) + 4);
 	return 0;
 }
 
@@ -1236,7 +1236,7 @@ int find_pack_entry_one(const unsigned c
 		int cmp = hashcmp((unsigned char *)index + (24 * mi) + 4, sha1);
 		if (!cmp) {
 			e->offset = ntohl(*((unsigned int *) ((char *) index + (24 * mi))));
-			memcpy(e->sha1, sha1, 20);
+			hashcpy(e->sha1, sha1);
 			e->p = p;
 			return 1;
 		}
@@ -1349,7 +1349,7 @@ void *read_object_with_reference(const u
 	unsigned long isize;
 	unsigned char actual_sha1[20];
 
-	memcpy(actual_sha1, sha1, 20);
+	hashcpy(actual_sha1, sha1);
 	while (1) {
 		int ref_length = -1;
 		const char *ref_type = NULL;
@@ -1360,7 +1360,7 @@ void *read_object_with_reference(const u
 		if (!strcmp(type, required_type)) {
 			*size = isize;
 			if (actual_sha1_return)
-				memcpy(actual_sha1_return, actual_sha1, 20);
+				hashcpy(actual_sha1_return, actual_sha1);
 			return buffer;
 		}
 		/* Handle references */
@@ -1555,7 +1555,7 @@ int write_sha1_file(void *buf, unsigned 
 	 */
 	filename = write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
-		memcpy(returnsha1, sha1, 20);
+		hashcpy(returnsha1, sha1);
 	if (has_sha1_file(sha1))
 		return 0;
 	fd = open(filename, O_RDONLY);
diff --git a/sha1_name.c b/sha1_name.c
index b6c198f..8a5809e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -109,7 +109,7 @@ static int find_short_packed_object(int 
 				    !match_sha(len, match, next)) {
 					/* unique within this pack */
 					if (!found) {
-						memcpy(found_sha1, now, 20);
+						hashcpy(found_sha1, now);
 						found++;
 					}
 					else if (hashcmp(found_sha1, now)) {
@@ -126,7 +126,7 @@ static int find_short_packed_object(int 
 		}
 	}
 	if (found == 1)
-		memcpy(sha1, found_sha1, 20);
+		hashcpy(sha1, found_sha1);
 	return found;
 }
 
@@ -146,13 +146,13 @@ static int find_unique_short_object(int 
 	if (1 < has_unpacked || 1 < has_packed)
 		return SHORT_NAME_AMBIGUOUS;
 	if (has_unpacked != has_packed) {
-		memcpy(sha1, (has_packed ? packed_sha1 : unpacked_sha1), 20);
+		hashcpy(sha1, (has_packed ? packed_sha1 : unpacked_sha1));
 		return 0;
 	}
 	/* Both have unique ones -- do they match? */
 	if (hashcmp(packed_sha1, unpacked_sha1))
 		return SHORT_NAME_AMBIGUOUS;
-	memcpy(sha1, packed_sha1, 20);
+	hashcpy(sha1, packed_sha1);
 	return 0;
 }
 
@@ -326,13 +326,13 @@ static int get_parent(const char *name, 
 	if (parse_commit(commit))
 		return -1;
 	if (!idx) {
-		memcpy(result, commit->object.sha1, 20);
+		hashcpy(result, commit->object.sha1);
 		return 0;
 	}
 	p = commit->parents;
 	while (p) {
 		if (!--idx) {
-			memcpy(result, p->item->object.sha1, 20);
+			hashcpy(result, p->item->object.sha1);
 			return 0;
 		}
 		p = p->next;
@@ -353,9 +353,9 @@ static int get_nth_ancestor(const char *
 
 		if (!commit || parse_commit(commit) || !commit->parents)
 			return -1;
-		memcpy(sha1, commit->parents->item->object.sha1, 20);
+		hashcpy(sha1, commit->parents->item->object.sha1);
 	}
-	memcpy(result, sha1, 20);
+	hashcpy(result, sha1);
 	return 0;
 }
 
@@ -407,7 +407,7 @@ static int peel_onion(const char *name, 
 		o = deref_tag(o, name, sp - name - 2);
 		if (!o || (!o->parsed && !parse_object(o->sha1)))
 			return -1;
-		memcpy(sha1, o->sha1, 20);
+		hashcpy(sha1, o->sha1);
 	}
 	else {
 		/* At this point, the syntax look correct, so
@@ -419,7 +419,7 @@ static int peel_onion(const char *name, 
 			if (!o || (!o->parsed && !parse_object(o->sha1)))
 				return -1;
 			if (o->type == expected_type) {
-				memcpy(sha1, o->sha1, 20);
+				hashcpy(sha1, o->sha1);
 				return 0;
 			}
 			if (o->type == OBJ_TAG)
@@ -526,7 +526,7 @@ int get_sha1(const char *name, unsigned 
 			    memcmp(ce->name, cp, namelen))
 				break;
 			if (ce_stage(ce) == stage) {
-				memcpy(sha1, ce->sha1, 20);
+				hashcpy(sha1, ce->sha1);
 				return 0;
 			}
 			pos++;
diff --git a/tree-walk.c b/tree-walk.c
index 3f83e98..14cc5ae 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -179,7 +179,7 @@ static int find_tree_entry(struct tree_d
 		if (cmp < 0)
 			break;
 		if (entrylen == namelen) {
-			memcpy(result, sha1, 20);
+			hashcpy(result, sha1);
 			return 0;
 		}
 		if (name[entrylen] != '/')
@@ -187,7 +187,7 @@ static int find_tree_entry(struct tree_d
 		if (!S_ISDIR(*mode))
 			break;
 		if (++entrylen == namelen) {
-			memcpy(result, sha1, 20);
+			hashcpy(result, sha1);
 			return 0;
 		}
 		return get_tree_entry(sha1, name + entrylen, result, mode);
diff --git a/tree.c b/tree.c
index ef456be..ea386e5 100644
--- a/tree.c
+++ b/tree.c
@@ -25,7 +25,7 @@ static int read_one_entry(const unsigned
 	ce->ce_flags = create_ce_flags(baselen + len, stage);
 	memcpy(ce->name, base, baselen);
 	memcpy(ce->name + baselen, pathname, len+1);
-	memcpy(ce->sha1, sha1, 20);
+	hashcpy(ce->sha1, sha1);
 	return add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 467d994..3ac0289 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -200,7 +200,7 @@ #endif
 
 			any_files = 1;
 
-			memcpy(ce->sha1, posns[i]->sha1, 20);
+			hashcpy(ce->sha1, posns[i]->sha1);
 			src[i + o->merge] = ce;
 			subposns[i] = df_conflict_list;
 			posns[i] = posns[i]->next;
-- 
1.4.2.gfec68

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-23  6:49           ` Shawn Pearce
@ 2006-08-24  7:36             ` Junio C Hamano
  2006-08-24  8:08               ` Shawn Pearce
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2006-08-24  7:36 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

>> > What about all of those memcpy(a, b, 20)'s?  :-)
>> 
>> Surely.  If you are inclined to, go wild.
>
> Like this?  :-)

Except some minor nits, yes.

 * I would have preferred two patches, one for "master" and one
   for the C merge-recursive topic (or at least "next").

 * You missed a few in "master".

 * The cast in the second hunk in combine-diff.c was wrong;
   breakage was caught by our testsuite.

I've pushed out a fixed up result in "master" and "next".

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-24  7:36             ` Junio C Hamano
@ 2006-08-24  8:08               ` Shawn Pearce
  2006-08-24 10:34                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn Pearce @ 2006-08-24  8:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> >> > What about all of those memcpy(a, b, 20)'s?  :-)
> >> 
> >> Surely.  If you are inclined to, go wild.
> >
> > Like this?  :-)
> 
> Except some minor nits, yes.
> 
>  * I would have preferred two patches, one for "master" and one
>    for the C merge-recursive topic (or at least "next").

Doh.  I didn't realize this was something you were interested in
pulling into master.  Otherwise I would have done this.  Next time
I'll try to keep that in mind.
 
>  * You missed a few in "master".

Not surprising since I typically work against and use next.
 
>  * The cast in the second hunk in combine-diff.c was wrong;
>    breakage was caught by our testsuite.

OK, that's just flat out stupid of me.  I apologize for making you
fix my mistakes.  :-)

> I've pushed out a fixed up result in "master" and "next".

Thanks.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] adding support for md5
  2006-08-24  8:08               ` Shawn Pearce
@ 2006-08-24 10:34                 ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2006-08-24 10:34 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

>> Except some minor nits, yes.
>> 
>>  * I would have preferred two patches, one for "master" and one
>>    for the C merge-recursive topic (or at least "next").
>
> Doh.  I didn't realize this was something you were interested in
> pulling into master.

Applying them directly to "master" does not have much to do with
this.

Of course, if something is obviously the right thing to do, then
I often apply them straight to "master" (or apply them to
"maint" and pull the result into "master").  In other cases, I
prefer to fork off a new series from the tip of the "master"
into a new topic branch.  Then merge that into either "pu" (if I
have suspicion that it is not ready for even testing yet) or
"next", and cook there for a while until it is ready.

While being cooked in "next", what happens is that changes to
that specific topic are applied to the tip of the topic branch,
and then pulled into "next", over and over.  Many topic branches
are cooked simultaneously that way.  So the development history
of "next" is, eh, messy.

I usually test "next", in other words, multiple topics cooking
together.  But when some changes are applied to "master" that
might interfere with an older but still not in "master", I pull
"master" into the topic and test that topic alone in isolation.

That way, when a topic matures, we can be reasonably sure that
it can be pulled into "master" without breaking things.

A single patch on top of "next" depends on all existing topics
that may or may not turn out to be useful.  That makes such a
patch less useful than otherwise be.

So if a series affects things in "master" and some other things
still not in "master", a preferred way, from my workflow point
of view, is to have at least two patches: one for "master" and
another for the rest.  Then what I would do is to fork one topic
off from "master" and apply the former, pull that into "next"
and cook that.  That part of the topic can graduate to "master"
without waiting for other topics in "next".

What happens to the rest is a bit more involved.  In the case of
the hashcpy() patch, one thing that only exists in "next" was
merge-recursive.c, but the story would be the same if "next" had
more places that used memcpy(a, b, 20) than "master" in a file
that are common in two branches.  Ideally, the remainder will be
broken into pieces and applied as a fixup on top of existing
topics (in this case, C merge-recursive topic) and then merged
into "next".  This can be either done by applying the remainder
directly on top of the affected topic, or forking a subtopic off
of the affected topic (the latter is useful if the new series
might turn out to be dud -- the original topic will not be
contaminated by bad changes and can graduate to "master" more
easily).

In any case, I've done a split myself and the parts that can be
applied to "master" is now sitting in gl/cleanup topic and the
remainder is sitting in gl/cleanup-next topic which was forked
off from C merge-recursive topic (you can tell where their tips
are by looking at "gitk next" output) and both are merged into
"next".

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-08-24 10:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-18  6:01 [RFC] adding support for md5 David Rientjes
2006-08-18  9:59 ` Nguyễn Thái Ngọc Duy
2006-08-18 10:21 ` Johannes Schindelin
2006-08-18 12:31   ` Petr Baudis
2006-08-18 20:35   ` David Rientjes
2006-08-18 10:52 ` Trekie
2006-08-18 10:56   ` Johannes Schindelin
2006-08-18 11:27     ` Trekie
2006-08-18 11:37       ` Johannes Schindelin
2006-08-18 21:52 ` Jon Smirl
2006-08-19  2:35   ` Johannes Schindelin
2006-08-19 20:50 ` Linus Torvalds
2006-08-21 20:44   ` Chris Wedgwood
2006-08-22  6:18     ` Junio C Hamano
2006-08-23  4:14       ` Shawn Pearce
2006-08-23  4:46         ` Junio C Hamano
2006-08-23  6:49           ` Shawn Pearce
2006-08-24  7:36             ` Junio C Hamano
2006-08-24  8:08               ` Shawn Pearce
2006-08-24 10:34                 ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2006-08-19  3:19 linux
2006-08-19 22:30 ` Petr Baudis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).