[IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
@ 2011-10-19 18:03 Ævar Arnfjörð Bjarmason
  2011-10-19 19:01 ` Jeff King
  2011-10-19 22:09 ` Jonathan Nieder
  0 siblings, 2 replies; 24+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-10-19 18:03 UTC (permalink / raw)
  To: Git Mailing List

You do not need to read this. Really, stop now.

This is quick hack I wrote just before leaving work to show that I
could indeed push patches to our main repository starting with
31337. Names hidden to protect the innocent.

With a quick call to getenv() i'll be easy to supply a prefix with an
environment variable, so you can have d00dbabe, deadbeef, b00b
etc. commits as time allows.

    < cow-orker-1> avar: what do I have to do to have all my
                            commit ids begin with 31337?
    < cow-orker-2> cow-orker-1: just figure out SHA1 collision
    < cow-orker-2> well with MD5 it is "easy" to have
                   same-prefix collisions, so you can
                   append junk that will be ignored or
                   have no effect, and have the same MD5
                   as the shorter version
    <@avar> It actually wouldn't be too hard to make all your commits
begin with 31337
    <@avar> I might patch my git client to do that
    <@avar> Basically the sha1 is a function of the commit object /
message / top-level tree
    <@avar> and any compliant git client ignores headers it doesn't know about
    <@avar> so you can just add a header "lulz %d"
    <@avar> where %d is a number you keep incrementing until your
resulting sha1 happens to begin with 31337
    <@avar> 31337 is a 5 character hex string, and 16^5/2 = 524288
    <@avar> so you'd on average have to generate half a million
commits before you'd get the desired results
    <@avar> which would slow down git somewhat at commit time, but not
much more than using svn :)
    < cow-orker-2> enjoy your 3MB commits
    < cow-orker-2> ah you can do that on the same header
    <@avar> yes
    < cow-orker-2> thanks, you just ruined my evening
    [...]
    <@avar> Also you don't have to waste your evening, I've implemented this
    < cow-orker-1> \o/
    < cow-orker-3> hurry up and make a commit, already
    < cow-orker-3> [img-mj-popcorn.gif]
    <@avar> victory is mine: 313375d995e6f8b7773c6ed1ee165e5a9e15690b !

This is how you use it:

    $ time ~/g/git/git-commit -F /tmp/commit-message
    Try 0/4000000 to get a 1337 commit =
33d86a5a13ce07914a38ead3a517e391df0cc8c2
    Try 100000/4000000 to get a 1337 commit =
058e7663f54a3dd85e2c5a88cebf221ff7c25889
    Try 200000/4000000 to get a 1337 commit =
78839302cf449d088db1df5eb81f9116cbce55d0
    Try 300000/4000000 to get a 1337 commit =
f0af0cbe91d0ba8903085e2f867246095e6fb957
    Try 400000/4000000 to get a 1337 commit =
2068605f90321558e97ae5a083f63475ae2075ea
    Try 500000/4000000 to get a 1337 commit =
f8b4e2acd14ab111fd7957956368d181596bc6ef
    Try 600000/4000000 to get a 1337 commit =
3a479ab83d969f9b5638481445506036d1e1db46
    Try 700000/4000000 to get a 1337 commit =
29b98585b44c02e59240c1d8a1956ae434b3543f
    Try 800000/4000000 to get a 1337 commit =
8749e31c40200554b3758313cada1a3f596b0230
    Try 900000/4000000 to get a 1337 commit =
3fb2e617db95db8c027fba686c3f75eaa1b7f880
    commit id = 313375d995e6f8b7773c6ed1ee165e5a9e15690b
    [trunk 313375d] <censored>
     <censored>

    real    1m15.389s
    user    0m45.969s
    sys     0m29.368s

Which in just over a minute will generate, in my case:

    $ git show --pretty=raw 313375d995e6f8b7773c6ed1ee165e5a9e15690b | head -n 7
    commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
    tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
    parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
    author Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
    committer Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
    lulz 697889

And this is the evil one-off code for this:

    diff --git a/commit.c b/commit.c
    index 73b7e00..71d1605 100644
    --- a/commit.c
    +++ b/commit.c
    @@ -853,2 +853,4 @@ int commit_tree(const char *msg, unsigned char *tree,
            int encoding_is_utf8;
    +       int try;
    +       int tries = 4000000;
            struct strbuf buffer;
    @@ -857,2 +859,5 @@ int commit_tree(const char *msg, unsigned char *tree,

    +       struct commit_list *parents_ptr = parents;
    +
    +       for (try = 0; try < tries; try++) {
            /* Not having i18n.commitencoding is the same as having utf-8 */
    @@ -872,3 +877,2 @@ int commit_tree(const char *msg, unsigned char *tree,
                            sha1_to_hex(parents->item->object.sha1));
    -               free(parents);
                    parents = next;
    @@ -876,2 +880,4 @@ int commit_tree(const char *msg, unsigned char *tree,

    +               parents = parents_ptr;
    +
            /* Person/date information */
    @@ -883,2 +889,4 @@ int commit_tree(const char *msg, unsigned char *tree,
                    strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
    +
    +               strbuf_addf(&buffer, "lulz %d\n", try);
            strbuf_addch(&buffer, '\n');
    @@ -893,3 +901,21 @@ int commit_tree(const char *msg, unsigned char *tree,
            result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
    +
    +               if (result) {
    +                       die("failed to write commit object");
    +               } else {
    +                       if (strncmp(sha1_to_hex(ret), "31337", 5) == 0) {
    +                               printf("commit id = %s\n",
sha1_to_hex(ret));
    +                               goto done;
    +                       } else {
    +                               if (try % 100000 == 0) {
    +                                       fprintf(stderr, "Try %d/%d
to get a 1337 commit = %s\n", try, tries, sha1_to_hex(ret));
    +                               }
    +                               unlink_or_warn(sha1_file_name(ret));
    +                       }
            strbuf_release(&buffer);
    +               }
    +       }
    +       done:
    +       strbuf_release(&buffer);
    +
            return result;

There's nothing new or novel about this. I just thought it would be
nice to share it. Someone asked me if having a "lulz" header wouldn't
break things, but since we introduced the "encoding" header a while
back clients have learned to ignore unknown headers, so it doesn't.

Which is why we can discuss e.g. adding GPG headers without worrying
about breaking everything.

I also think it's interesting that we seemingly don't have (in my
brief search, maybe I missed it) an API for writing a complete commit
object into a strbuf, so I had to write a lot of commit objects to
disk, and keep unlinking the unacceptable candidates so the repository
wouldn't balloon in size.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 18:03 [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits Ævar Arnfjörð Bjarmason
@ 2011-10-19 19:01 ` Jeff King
  2011-10-19 19:38   ` Jeff King
  2011-10-20  9:38   ` Mikael Magnusson
  2011-10-19 22:09 ` Jonathan Nieder
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2011-10-19 19:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Wed, Oct 19, 2011 at 08:03:24PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is quick hack I wrote just before leaving work to show that I
> could indeed push patches to our main repository starting with
> 31337. Names hidden to protect the innocent.

Clever and amusing.

> Which in just over a minute will generate, in my case:
> 
>     $ git show --pretty=raw 313375d995e6f8b7773c6ed1ee165e5a9e15690b | head -n 7
>     commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
>     tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
>     parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
>     author Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>     committer Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>     lulz 697889

Nice header name.

> I also think it's interesting that we seemingly don't have (in my
> brief search, maybe I missed it) an API for writing a complete commit
> object into a strbuf, so I had to write a lot of commit objects to
> disk, and keep unlinking the unacceptable candidates so the repository
> wouldn't balloon in size.

Calculating an object sha1 is pretty straightforward. I think you could
just tack the "commit <size>" header in front of the commit strbuf, and
sha1 that, and then just send the "winner" to actually be written.

Too bad it won't work in an append only way.  The internal state of sha1
after a certain set of bytes is deterministic, so you could do something
like:

  void leetify_object(struct strbuf *data)
  {
          SHA_CTX base, guess;
          int len = data->len;
          int i = 0;

          /* come up with a base internal state representing
           * the commit */
          SHA1_Init(&base);
          SHA1_Update(&base, data->buf, data->len);

          while (1) {
                  char append[64];
                  int n = snprintf(append, sizeof(append),
                                   "x-lulz-by: %d\n", i);
                  unsigned char sha1[20];

                  /* and then from that state, see what the sha1 would
                   * be if we add our few new bytes */
                  memcpy(&guess, &base, sizeof(guess));
                  SHA1_Update(&guess, append, n);
                  SHA1_Final(sha1, &guess);

                  /* did we pick a winner? */
                  if (sha1[0] == 0x31 &&
                      sha1[1] == 0x33 &&
                      &sha1[2] & 0xf0) == 0x70) {
                            strbuf_addstr(data, append);
                            return;
                  }

                  i++;
          }
  }

There are two problems with this:

  1. The x-lulz-by header is visible in the commit message. I think you
     can actually get around this by appending a NUL character, and "git
     log" and friends will stop processing the commit message there.

  2. The object header will actually have the length of the object at
     the beginning. So as your lulz number grows, the length of the
     object will change, and invalidate your earlier sha1. You could get
     around this by using a fixed-size guess (e.g., start with 20 bytes
     of zeroes, and then just keep incrementing).

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 19:01 ` Jeff King
@ 2011-10-19 19:38   ` Jeff King
  2011-10-20  2:51     ` Jeff King
  2011-10-20  4:31     ` Junio C Hamano
  2011-10-20  9:38   ` Mikael Magnusson
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2011-10-19 19:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Wed, Oct 19, 2011 at 03:01:14PM -0400, Jeff King wrote:

> Too bad it won't work in an append only way.  The internal state of sha1
> after a certain set of bytes is deterministic, so you could do something
> like:

OK, here's a patch which does that. It's way faster:

  $ git init
  $ time for i in `seq 1 10`; do
      echo $i >file
      git add file
      git commit -q -m $i
    done
  real    0m1.212s
  user    0m1.132s
  sys     0m0.028s

So that's about .12 seconds per commit. Without my patch, it's about .01
seconds. So you waste a tenth of a second generating the collision. Not
too bad.

And the result:

  $ git log --oneline
  31337a1 10
  313376b 9
  3133782 8
  31337cf 7
  313377a 6
  313374b 5
  31337b1 4
  31337a3 3
  3133703 2
  3133706 1

And nothing shows up in the body, because git truncates at the NUL we
added:

  $ git show
  commit 31337a1093af2d97eb2e6c08b261c2946395fdd3
  Author: Jeff King <peff@peff.net>
  Date:   Wed Oct 19 15:34:00 2011 -0400

      10

  diff --git a/file b/file
  index ec63514..f599e28 100644
  --- a/file
  +++ b/file
  @@ -1 +1 @@
  -9
  +10

It also parameterizes the desired sha1, so you could easily find hashes
ending in 31337, or any other pattern. Or add "git commit
--collide=31337".

---
diff --git a/commit.c b/commit.c
index 73b7e00..c478752 100644
--- a/commit.c
+++ b/commit.c
@@ -840,6 +840,57 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 	return result;
 }
 
+static unsigned char elite_want[20] = { 0x31, 0x33, 0x70 };
+static unsigned char elite_mask[20] = { 0xff, 0xff, 0xf0 };
+
+static inline int sha1_match_mask(unsigned char *sha1,
+				  unsigned char *want,
+				  unsigned char *mask)
+{
+	int i;
+	for (i = 0; i < 20; i++)
+		if ((want[i] & mask[i]) != (sha1[i] & mask[i]))
+		    return 0;
+	return 1;
+}
+
+static void collide_commit(struct strbuf *data,
+			   unsigned char want[20],
+			   unsigned char mask[20])
+{
+	static const char terminator[] = { 0 };
+	char header[32];
+	int header_len;
+	unsigned int lulz;
+	SHA_CTX base;
+
+	header_len = snprintf(header, sizeof(header),
+			      "commit %lu",
+			      data->len + 1 + sizeof(lulz)) + 1;
+	SHA1_Init(&base);
+	SHA1_Update(&base, header, header_len);
+	SHA1_Update(&base, data->buf, data->len);
+	SHA1_Update(&base, terminator, sizeof(terminator));
+
+	lulz = 0;
+	do {
+		SHA_CTX guess;
+		unsigned char sha1[20];
+
+		memcpy(&guess, &base, sizeof(guess));
+		SHA1_Update(&guess, &lulz, sizeof(lulz));
+		SHA1_Final(sha1, &guess);
+
+		if (sha1_match_mask(sha1, want, mask)) {
+			strbuf_add(data, terminator, sizeof(terminator));
+			strbuf_add(data, &lulz, sizeof(lulz));
+			return;
+		}
+
+		lulz++;
+	} while (1);
+}
+
 static const char commit_utf8_warn[] =
 "Warning: commit message does not conform to UTF-8.\n"
 "You may want to amend it after fixing the message, or set the config\n"
@@ -890,6 +941,8 @@ int commit_tree(const char *msg, unsigned char *tree,
 	if (encoding_is_utf8 && !is_utf8(buffer.buf))
 		fprintf(stderr, commit_utf8_warn);
 
+	collide_commit(&buffer, elite_want, elite_mask);
+
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
 	return result;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 18:03 [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits Ævar Arnfjörð Bjarmason
  2011-10-19 19:01 ` Jeff King
@ 2011-10-19 22:09 ` Jonathan Nieder
  1 sibling, 0 replies; 24+ messages in thread
From: Jonathan Nieder @ 2011-10-19 22:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

Ævar Arnfjörð Bjarmason wrote:

> Someone asked me if having a "lulz" header wouldn't
> break things, but since we introduced the "encoding" header a while
> back clients have learned to ignore unknown headers, so it doesn't.
>
> Which is why we can discuss e.g. adding GPG headers without worrying
> about breaking everything.

Just for the record (and I think you understood this already): it is
not quite so simple.  Clients are allowed to assume that no "unknown"
header lines will appear before any of the known fields.  A future git
version could even assume that, say, a

	length-of-remainder-of-header

field appears before any lines after encoding and the GPG stuff.

So it would definitely be frowned upon to add new lines to that part
of the commit object without at least discussing it on this list.
Jeff mentioned one way to get your lulz without violating that
constraint. ;-)

http://thread.gmane.org/gmane.comp.version-control.git/138848/focus=138921

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 19:38   ` Jeff King
@ 2011-10-20  2:51     ` Jeff King
  2011-10-20  4:15       ` Kyle Moffett
  2011-10-24 20:47       ` Jeff King
  2011-10-20  4:31     ` Junio C Hamano
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2011-10-20  2:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Wed, Oct 19, 2011 at 03:38:34PM -0400, Jeff King wrote:

> It also parameterizes the desired sha1, so you could easily find hashes
> ending in 31337, or any other pattern. Or add "git commit
> --collide=31337".

I couldn't resist:

  $ git commit -q -m foo --collide=deadbeef &&
    git rev-list -1 HEAD
  deadbeefdbd6e62a2185606a4fad653e22509b56

You can also do:

  $ SHA1=0000000000000000000000000000000000031337
  $ MASK=00000000000000000000000000000000000fffff
  $ git commit -q -m foo --collide=$SHA1/$MASK &&
    git rev-list -1 HEAD
  ea49af84db92ed0d2bc3ed13810f5990e7c31337

Keep in mind that each hex character you add increases the search space
by a factor of 16. deadbeef took about 70 seconds to find on my machine.
I'm tempted to look for "3133700..0031337", but it would probably
take about 4 hours.

Patch is below.

-Peff

---
diff --git a/builtin/commit.c b/builtin/commit.c
index c46f2d1..734a7ab 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -105,6 +105,10 @@
 static const char *only_include_assumed;
 static struct strbuf message;
 
+static int collide;
+static unsigned char collide_sha1[20];
+static unsigned char collide_mask[20];
+
 static int null_termination;
 static enum {
 	STATUS_FORMAT_LONG,
@@ -125,6 +129,52 @@ static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int parse_partial_sha1(const char *s, unsigned char sha1[20])
+{
+	unsigned int i;
+
+	hashclr(sha1);
+
+	for (i = 0; i < 40 && s[i]; i++) {
+		unsigned int v = hexval(s[i]);
+		if (v & ~0xf)
+			break;
+		if (!(i & 1))
+			v <<= 4;
+		sha1[i/2] |= v;
+	}
+	return i;
+}
+
+static void fill_sha1_mask(int n, unsigned char mask[20]) {
+	int i;
+
+	hashclr(mask);
+	for (i = 0; i < n/2; i++)
+		mask[i] = 0xff;
+	if (n & 1)
+		mask[i] = 0xf0;
+}
+
+static int opt_parse_collide(const struct option *opt, const char *arg,
+			     int unset)
+{
+	if (unset)
+		collide = 0;
+	else {
+		int n = parse_partial_sha1(arg, collide_sha1);
+		if (!arg[n])
+			fill_sha1_mask(n, collide_mask);
+		else if (arg[n] == '/')
+			parse_partial_sha1(arg + n + 1, collide_mask);
+		else
+			die("invalid --collide sha1: %s", arg);
+		collide = 1;
+	}
+	return 0;
+}
+
+
 static struct option builtin_commit_options[] = {
 	OPT__QUIET(&quiet, "suppress summary after successful commit"),
 	OPT__VERBOSE(&verbose, "show diff in commit message template"),
@@ -144,6 +194,7 @@ static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 	OPT_BOOLEAN('e', "edit", &edit_flag, "force edit of commit"),
 	OPT_STRING(0, "cleanup", &cleanup_arg, "default", "how to strip spaces and #comments from message"),
 	OPT_BOOLEAN(0, "status", &include_status, "include status in commit message template"),
+	OPT_CALLBACK(0, "collide", NULL, "sha1[/mask]", "choose commit sha1 like <sha1>", opt_parse_collide),
 	/* end commit message options */
 
 	OPT_GROUP("Commit contents options"),
@@ -1483,8 +1534,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1,
-			author_ident.buf)) {
+	if (commit_tree_collide(sb.buf, active_cache_tree->sha1, parents, sha1,
+				author_ident.buf,
+				collide ? collide_sha1 : NULL, collide_mask)) {
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
diff --git a/commit.c b/commit.c
index 73b7e00..24ddccd 100644
--- a/commit.c
+++ b/commit.c
@@ -840,6 +840,54 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 	return result;
 }
 
+static inline int sha1_match_mask(const unsigned char *sha1,
+				  const unsigned char *want,
+				  const unsigned char *mask)
+{
+	int i;
+	for (i = 0; i < 20; i++)
+		if ((want[i] & mask[i]) != (sha1[i] & mask[i]))
+		    return 0;
+	return 1;
+}
+
+static void collide_commit(struct strbuf *data,
+			   const unsigned char *want,
+			   const unsigned char *mask)
+{
+	static const char terminator[] = { 0 };
+	char header[32];
+	int header_len;
+	unsigned int lulz;
+	SHA_CTX base;
+
+	header_len = snprintf(header, sizeof(header),
+			      "commit %lu",
+			      data->len + 1 + sizeof(lulz)) + 1;
+	SHA1_Init(&base);
+	SHA1_Update(&base, header, header_len);
+	SHA1_Update(&base, data->buf, data->len);
+	SHA1_Update(&base, terminator, sizeof(terminator));
+
+	lulz = 0;
+	do {
+		SHA_CTX guess;
+		unsigned char sha1[20];
+
+		memcpy(&guess, &base, sizeof(guess));
+		SHA1_Update(&guess, &lulz, sizeof(lulz));
+		SHA1_Final(sha1, &guess);
+
+		if (sha1_match_mask(sha1, want, mask)) {
+			strbuf_add(data, terminator, sizeof(terminator));
+			strbuf_add(data, &lulz, sizeof(lulz));
+			return;
+		}
+
+		lulz++;
+	} while (1);
+}
+
 static const char commit_utf8_warn[] =
 "Warning: commit message does not conform to UTF-8.\n"
 "You may want to amend it after fixing the message, or set the config\n"
@@ -849,6 +897,15 @@ int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author)
 {
+	return commit_tree_collide(msg, tree, parents, ret, author,
+				   NULL, NULL);
+}
+
+int commit_tree_collide(const char *msg, unsigned char *tree,
+			struct commit_list *parents, unsigned char *ret,
+			const char *author, const unsigned char *want,
+			const unsigned char *mask)
+{
 	int result;
 	int encoding_is_utf8;
 	struct strbuf buffer;
@@ -890,6 +947,9 @@ int commit_tree(const char *msg, unsigned char *tree,
 	if (encoding_is_utf8 && !is_utf8(buffer.buf))
 		fprintf(stderr, commit_utf8_warn);
 
+	if (want && mask)
+		collide_commit(&buffer, want, mask);
+
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
 	return result;
diff --git a/commit.h b/commit.h
index 009b113..337dcbd 100644
--- a/commit.h
+++ b/commit.h
@@ -184,5 +184,9 @@ static inline int single_parent(struct commit *commit)
 extern int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author);
+extern int commit_tree_collide(const char *msg, unsigned char *tree,
+			       struct commit_list *parents, unsigned char *ret,
+			       const char *author, const unsigned char *sha1,
+			       const unsigned char *mask);
 
 #endif /* COMMIT_H */

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  2:51     ` Jeff King
@ 2011-10-20  4:15       ` Kyle Moffett
  2011-10-20  4:25         ` Jeff King
  2011-10-20  4:27         ` Junio C Hamano
  2011-10-24 20:47       ` Jeff King
  1 sibling, 2 replies; 24+ messages in thread
From: Kyle Moffett @ 2011-10-20  4:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð, Git Mailing List

On Wed, Oct 19, 2011 at 22:51, Jeff King <peff@peff.net> wrote:
> Keep in mind that each hex character you add increases the search space
> by a factor of 16. deadbeef took about 70 seconds to find on my machine.
> I'm tempted to look for "3133700..0031337", but it would probably
> take about 4 hours.

Heh, there's one other practical downside I can think of...

If you create a bunch of commits with the same 8-hex-character prefix
then suddenly the "git describe" logic for using the first 7 commit ID
characters gets a whole lot less useful.

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:15       ` Kyle Moffett
@ 2011-10-20  4:25         ` Jeff King
  2011-10-20  4:27         ` Junio C Hamano
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff King @ 2011-10-20  4:25 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 12:15:03AM -0400, Kyle Moffett wrote:

> On Wed, Oct 19, 2011 at 22:51, Jeff King <peff@peff.net> wrote:
> > Keep in mind that each hex character you add increases the search space
> > by a factor of 16. deadbeef took about 70 seconds to find on my machine.
> > I'm tempted to look for "3133700..0031337", but it would probably
> > take about 4 hours.
> 
> Heh, there's one other practical downside I can think of...
> 
> If you create a bunch of commits with the same 8-hex-character prefix
> then suddenly the "git describe" logic for using the first 7 commit ID
> characters gets a whole lot less useful.

Actually, git will generally find a unique abbreviation among all of
your objects when using abbreviated sha1s, so you'll just get longer
abbreviations. Of course, it is only unique at the time of generation,
so new objects may make it ambiguous. Which is why the default minimum
is 7, not 1.  But of course with this trick you've effectively removed
all of the entropy from those initial 7 characters, and you effectively
only have 1 or 2 non-uniform characters.

So yeah, it is worse.  But really...spending millions of CPU cycles to
get a preimage collision with the partial sha1, and hack-ishly embedding
random crap after a NUL in every commit message, and _this_ is your
complaint? ;)

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:15       ` Kyle Moffett
  2011-10-20  4:25         ` Jeff King
@ 2011-10-20  4:27         ` Junio C Hamano
  2011-10-20  4:32           ` Kyle Moffett
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2011-10-20  4:27 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Jeff King, Ævar Arnfjörð, Git Mailing List

Kyle Moffett <kyle@moffetthome.net> writes:

> Heh, there's one other practical downside I can think of...
>
> If you create a bunch of commits with the same 8-hex-character prefix
> then suddenly the "git describe" logic for using the first 7 commit ID
> characters gets a whole lot less useful.

In the sense that you need to cut and paste a lot more characters, you are
correct that it would make it less useful, but if you are talking about
uniqueness, you are mistaken.

The rule is not "using the first 7 hexdigits", but is "using as many
hexdigits to make assure uniqueness, but use at least 7".

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 19:38   ` Jeff King
  2011-10-20  2:51     ` Jeff King
@ 2011-10-20  4:31     ` Junio C Hamano
  2011-10-20  4:34       ` Jeff King
  2011-10-20  9:14       ` Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2011-10-20  4:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Jeff King <peff@peff.net> writes:

> And nothing shows up in the body, because git truncates at the NUL we
> added:
>
>   $ git show
>   commit 31337a1093af2d97eb2e6c08b261c2946395fdd3
>   Author: Jeff King <peff@peff.net>
>   Date:   Wed Oct 19 15:34:00 2011 -0400
>
>       10
>
>   diff --git a/file b/file

But you cannot hide from "cat-file commit" ;-)

With the recent push to more (perceived) security, it may probably make
sense to teach "log" family commands to quote-show ^@ and what is behind
in their output by default, perhaps with an option to turn it off.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:27         ` Junio C Hamano
@ 2011-10-20  4:32           ` Kyle Moffett
  0 siblings, 0 replies; 24+ messages in thread
From: Kyle Moffett @ 2011-10-20  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 00:27, Junio C Hamano <gitster@pobox.com> wrote:
> Kyle Moffett <kyle@moffetthome.net> writes:
>
>> Heh, there's one other practical downside I can think of...
>>
>> If you create a bunch of commits with the same 8-hex-character prefix
>> then suddenly the "git describe" logic for using the first 7 commit ID
>> characters gets a whole lot less useful.
>
> In the sense that you need to cut and paste a lot more characters, you are
> correct that it would make it less useful, but if you are talking about
> uniqueness, you are mistaken.
>
> The rule is not "using the first 7 hexdigits", but is "using as many
> hexdigits to make assure uniqueness, but use at least 7".

Well, yes, but if you generate some 10 commits with the same
7-character prefix, run "git describe", and then generate another
several with the same prefix... Chances are the previously-unique
output of "git describe" is no longer unique.

Also, even if you do use enough characters for it to be reliably
unique, it's not really an abbreviation if your commit description is
"v2.0.3-42-gdeadbeef04a69f", with that many characters you might as
well just paste the whole SHA1 sum.

With all that said, it is a very clever (and ugly) hack :-D.  Kudos!

Cheers,
Kyle Moffett

-- 
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:31     ` Junio C Hamano
@ 2011-10-20  4:34       ` Jeff King
  2011-10-20  6:57         ` Junio C Hamano
  2011-10-20  9:14       ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2011-10-20  4:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Wed, Oct 19, 2011 at 09:31:16PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > And nothing shows up in the body, because git truncates at the NUL we
> > added:
> >
> >   $ git show
> >   commit 31337a1093af2d97eb2e6c08b261c2946395fdd3
> >   Author: Jeff King <peff@peff.net>
> >   Date:   Wed Oct 19 15:34:00 2011 -0400
> >
> >       10
> >
> >   diff --git a/file b/file
> 
> But you cannot hide from "cat-file commit" ;-)

Yes. The implementation is a horrible hack, second only in grossness to
the original idea. :)

> With the recent push to more (perceived) security, it may probably make
> sense to teach "log" family commands to quote-show ^@ and what is behind
> in their output by default, perhaps with an option to turn it off.

Agreed. Having hidden cruft makes birthday collision attacks easier (or
it will, if sha1 ever gets broken to that point).  Unfortunately, there
is a _ton_ of code which assumes that commit messages are
NUL-terminated, as they always have been since e871b64 (2005-05-25).

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:34       ` Jeff King
@ 2011-10-20  6:57         ` Junio C Hamano
  2011-10-20  7:13           ` Jeff King
  2011-10-20  7:27           ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2011-10-20  6:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Jeff King <peff@peff.net> writes:

> Agreed. Having hidden cruft makes birthday collision attacks easier (or
> it will, if sha1 ever gets broken to that point).  Unfortunately, there
> is a _ton_ of code which assumes that commit messages are
> NUL-terminated, as they always have been since e871b64 (2005-05-25).

I think that commit is irrelevant, as long as read_sha1_file() returns the
contents as <ptr,len> pair, which has been the case forever. It's just the
matter of propagating the length back up the callchain.

A naïve implementation to add "len" member to struct commit would increase
the size of the in-core commit object by sizeof(unsigned long), which we
may want to avoid. Traversals that care nothing but the topology of the
history would have to waste that memory and these things tend to add up
(8-byte ulong * 250k commits = 2MB).

Perhaps change the type of "buf" member in struct commit to a pointer to a
<ptr,len> pair, or something? Or perhaps a few megabytes wasted between
friends we do not care much about?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  6:57         ` Junio C Hamano
@ 2011-10-20  7:13           ` Jeff King
  2011-10-20 13:14             ` Ted Ts'o
  2011-10-20 18:36             ` Re* " Junio C Hamano
  2011-10-20  7:27           ` Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 24+ messages in thread
From: Jeff King @ 2011-10-20  7:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Wed, Oct 19, 2011 at 11:57:17PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Agreed. Having hidden cruft makes birthday collision attacks easier (or
> > it will, if sha1 ever gets broken to that point).  Unfortunately, there
> > is a _ton_ of code which assumes that commit messages are
> > NUL-terminated, as they always have been since e871b64 (2005-05-25).
> 
> I think that commit is irrelevant, as long as read_sha1_file() returns the
> contents as <ptr,len> pair, which has been the case forever. It's just the
> matter of propagating the length back up the callchain.

It's not that the commit is bad or the source of problems. My point is
that the assumption that commit messages are NUL-terminated has been
there for a really long time, so there are lots of spots in the code
that sloppily run string functions on them. Every one of those needs to
be found and fixed (e.g., I remember seeing this in
for-each-ref.c:find_subpos recently).

It's not impossible, of course, or even really that hard. It's just a
giant pain, and I wonder if the effort is worth it.

> A naïve implementation to add "len" member to struct commit would increase
> the size of the in-core commit object by sizeof(unsigned long), which we
> may want to avoid. Traversals that care nothing but the topology of the
> history would have to waste that memory and these things tend to add up
> (8-byte ulong * 250k commits = 2MB).
> 
> Perhaps change the type of "buf" member in struct commit to a pointer to a
> <ptr,len> pair, or something? Or perhaps a few megabytes wasted between
> friends we do not care much about?

I think you'd have to convert the struct (even if not every piece of
code is converted to use it) and profile.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  6:57         ` Junio C Hamano
  2011-10-20  7:13           ` Jeff King
@ 2011-10-20  7:27           ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 24+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-10-20  7:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 5:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> Agreed. Having hidden cruft makes birthday collision attacks easier (or
>> it will, if sha1 ever gets broken to that point).  Unfortunately, there
>> is a _ton_ of code which assumes that commit messages are
>> NUL-terminated, as they always have been since e871b64 (2005-05-25).
>
> I think that commit is irrelevant, as long as read_sha1_file() returns the
> contents as <ptr,len> pair, which has been the case forever. It's just the
> matter of propagating the length back up the callchain.
>
> A naïve implementation to add "len" member to struct commit would increase
> the size of the in-core commit object by sizeof(unsigned long), which we
> may want to avoid. Traversals that care nothing but the topology of the
> history would have to waste that memory and these things tend to add up
> (8-byte ulong * 250k commits = 2MB).

Or write commit_length(struct commit *c) that calculates SHA-1 from
c->buf and checks against c->object.sha1. If it does not match,
consider NUL part of the message and move to the next NUL. This
function would be expensive so we may not call frequently though.

Or store length in ((int*)c->buf)[-1].
-- 
Duy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  4:31     ` Junio C Hamano
  2011-10-20  4:34       ` Jeff King
@ 2011-10-20  9:14       ` Nguyen Thai Ngoc Duy
  2011-10-20 15:44         ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-10-20  9:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 3:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> And nothing shows up in the body, because git truncates at the NUL we
>> added:
>>
>>   $ git show
>>   commit 31337a1093af2d97eb2e6c08b261c2946395fdd3
>>   Author: Jeff King <peff@peff.net>
>>   Date:   Wed Oct 19 15:34:00 2011 -0400
>>
>>       10
>>
>>   diff --git a/file b/file
>
> But you cannot hide from "cat-file commit" ;-)
>
> With the recent push to more (perceived) security, it may probably make
> sense to teach "log" family commands to quote-show ^@ and what is behind
> in their output by default, perhaps with an option to turn it off.

What about NUL in file name in tree objects? Suppose the original tree
has an entry named "goodthing". With luck, they might be able to
create a new tree object with the entry renamed to "evil\x001234" that
has the same SHA-1. Could that possibly cause any problems?
-- 
Duy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-19 19:01 ` Jeff King
  2011-10-19 19:38   ` Jeff King
@ 2011-10-20  9:38   ` Mikael Magnusson
  2011-10-20 13:44     ` Elijah Newren
  1 sibling, 1 reply; 24+ messages in thread
From: Mikael Magnusson @ 2011-10-20  9:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð, Git Mailing List

On 19 October 2011 21:01, Jeff King <peff@peff.net> wrote:
> On Wed, Oct 19, 2011 at 08:03:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> This is quick hack I wrote just before leaving work to show that I
>> could indeed push patches to our main repository starting with
>> 31337. Names hidden to protect the innocent.
>
> Clever and amusing.
>
>> Which in just over a minute will generate, in my case:
>>
>>     $ git show --pretty=raw 313375d995e6f8b7773c6ed1ee165e5a9e15690b | head -n 7
>>     commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
>>     tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
>>     parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
>>     author Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>>     committer Ævar Arnfjörð Bjarmason <avar@example.com> <censored> <censored>
>>     lulz 697889
>
> Nice header name.

If you don't mind waiting, you could just increase the timestamps
until you get the desired collision. (If you still want them to be
correct, trying 4000000 times would about 6 weeks though :).

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  7:13           ` Jeff King
@ 2011-10-20 13:14             ` Ted Ts'o
  2011-10-20 15:56               ` Jeff King
  2011-10-20 18:36             ` Re* " Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Ted Ts'o @ 2011-10-20 13:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Oct 20, 2011 at 03:13:56AM -0400, Jeff King wrote:
> It's not that the commit is bad or the source of problems. My point is
> that the assumption that commit messages are NUL-terminated has been
> there for a really long time, so there are lots of spots in the code
> that sloppily run string functions on them. Every one of those needs to
> be found and fixed (e.g., I remember seeing this in
> for-each-ref.c:find_subpos recently).

Another possibility is to warn if the commit messages are not NULL
terminated.  Note though that if we're really worried about a bad guy
trying to attack us with a hash collision, he/she could always use
"invisible" non-printing characters in the commit message, and/or just
mess with one or both of the timestamps.  The more bits and more
degrees of flexibility the attacker has, the easier it would be, of
course.  In the grand scheme of things it's not clear to me how big of
a deal this would be.

If people were really concerned it would probably be easier to use
backup crypto checksum using something stronger (say, SHA-2 or the
eventual SHA-3).  Just store the backup checksums of the parent
commitments in some backwardly compatible place that old git
implementations wouldn't look (say, after the NULL in the commit
message if there isn't a better place :-), and new implementations
would know to generate the checksums, and old implementations would
ignore it.

That way, if anyone *does* figure out a way to generate real hash
collisions with SHA1 of objects that look almost completely identical
to the original objects, new implementations that would gradually make
their way out to the field could verify the SHA-2 (or SHA-3, when it
is announced, assuming that we the tag the checksums with a type
identifier) checksums and notice that they are not correct.

Maybe someone's already thought of this, but the cool thing about this
idea is it's a way that we can upgrade to a stronger hash algorithm
without needing a flag day due to some kind of incompatible format
change; we keep using SHA1 for the user-visible hash, since it's fine
modulo intentional attacks, and then use a hidden, backup checksum
which can be checked either all the time, or if that turns out to be a
computational burden, at some configurable random percent of the time.

Anyway, just a thought....

	      	      	     		   	  - Ted

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  9:38   ` Mikael Magnusson
@ 2011-10-20 13:44     ` Elijah Newren
  0 siblings, 0 replies; 24+ messages in thread
From: Elijah Newren @ 2011-10-20 13:44 UTC (permalink / raw)
  To: Mikael Magnusson
  Cc: Jeff King, Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 3:38 AM, Mikael Magnusson <mikachu@gmail.com> wrote:
>> On Wed, Oct 19, 2011 at 08:03:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> This is quick hack I wrote just before leaving work to show that I
>>> could indeed push patches to our main repository starting with
>>> 31337. Names hidden to protect the innocent.
>>
> If you don't mind waiting, you could just increase the timestamps
> until you get the desired collision. (If you still want them to be
> correct, trying 4000000 times would about 6 weeks though :).

But the nice thing is that we have both author and committer dates to
twiddle with, meaning that if we need 4000000 different values to try
then it's only 2000 for each of those two dates, i.e. we only need to
be willing to let those dates float by about half an hour.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  9:14       ` Nguyen Thai Ngoc Duy
@ 2011-10-20 15:44         ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2011-10-20 15:44 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Junio C Hamano, Ævar Arnfjörð, Git Mailing List

On Thu, Oct 20, 2011 at 08:14:56PM +1100, Nguyen Thai Ngoc Duy wrote:

> > But you cannot hide from "cat-file commit" ;-)
> >
> > With the recent push to more (perceived) security, it may probably make
> > sense to teach "log" family commands to quote-show ^@ and what is behind
> > in their output by default, perhaps with an option to turn it off.
> 
> What about NUL in file name in tree objects? Suppose the original tree
> has an entry named "goodthing". With luck, they might be able to
> create a new tree object with the entry renamed to "evil\x001234" that
> has the same SHA-1. Could that possibly cause any problems?

NUL is already meaningful in a tree object; it is the end of the
filename. So after the NUL, we will consider the next 20 bytes to be
sha1, and then after that, the mode of the next file entry.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20 13:14             ` Ted Ts'o
@ 2011-10-20 15:56               ` Jeff King
  2011-10-25 22:35                 ` Drew Northup
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2011-10-20 15:56 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Thu, Oct 20, 2011 at 09:14:55AM -0400, Ted Ts'o wrote:

> Another possibility is to warn if the commit messages are not NULL
> terminated.

A minor nit, but it's not whether they are terminated with NUL, but
rather whether they have embedded NUL. But yeah, this could maybe just
be something fsck looks for.

> Note though that if we're really worried about a bad guy trying to
> attack us with a hash collision, he/she could always use "invisible"
> non-printing characters in the commit message, and/or just mess with
> one or both of the timestamps.  The more bits and more degrees of
> flexibility the attacker has, the easier it would be, of course.  In
> the grand scheme of things it's not clear to me how big of a deal this
> would be.

Good point. Append-only attacks are cheaper, because you can avoid doing
most of the hash computation on each iteration (like my patch does). But
that's not a big-O speedup, it just makes the constant smaller.  So you
could assume that any feasible appending attack would probably become
feasible for recomputing the full hash eventually.

> If people were really concerned it would probably be easier to use
> backup crypto checksum using something stronger (say, SHA-2 or the
> eventual SHA-3).  Just store the backup checksums of the parent
> commitments in some backwardly compatible place that old git
> implementations wouldn't look (say, after the NULL in the commit
> message if there isn't a better place :-), and new implementations
> would know to generate the checksums, and old implementations would
> ignore it.

Yeah, if birthday attacks against sha1 become possible, the sensible
thing is probably not to worry too much about the file format, but to
use a better hash.

Commits can hide extra hashes in a header pretty easily. But what about
trees and blobs? I don't think there's any "ignored" space in either
one.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re* [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  7:13           ` Jeff King
  2011-10-20 13:14             ` Ted Ts'o
@ 2011-10-20 18:36             ` Junio C Hamano
  2011-10-20 19:00               ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2011-10-20 18:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Jeff King <peff@peff.net> writes:

> It's not that the commit is bad or the source of problems. My point is
> that the assumption that commit messages are NUL-terminated has been
> there for a really long time, so there are lots of spots in the code
> that sloppily run string functions on them. Every one of those needs to
> be found and fixed (e.g., I remember seeing this in
> for-each-ref.c:find_subpos recently).
>
> It's not impossible, of course, or even really that hard. It's just a
> giant pain, and I wonder if the effort is worth it.

True.

It probably is not worth it for most applications, but this fix-up to a
fairly recent one is worth doing, I would suspect.

-- >8 --
Subject: parse_signed_commit: really use the entire commit log message

... even beyond the first NUL in the buffer, when checking the commit
against the detached signature in the header.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c                 |   11 +++++------
 t/t7510-signed-commit.sh |   21 ++++++++++++++++-----
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/commit.c b/commit.c
index 93045a2..6ec49fa 100644
--- a/commit.c
+++ b/commit.c
@@ -854,28 +854,27 @@ int parse_signed_commit(const unsigned char *sha1,
 	unsigned long size;
 	enum object_type type;
 	char *buffer = read_sha1_file(sha1, &type, &size);
-	int in_header, saw_signature = -1;
+	int saw_signature = -1;
 	char *line;
 
 	if (!buffer || type != OBJ_COMMIT)
 		goto cleanup;
 
 	line = buffer;
-	in_header = 1;
 	saw_signature = 0;
-	while (*line) {
+	while (line < buffer + size) {
 		char *next = strchrnul(line, '\n');
 		if (*next)
 			next++;
-		if (in_header && !prefixcmp(line, gpg_sig_header)) {
+		if (!prefixcmp(line, gpg_sig_header)) {
 			const char *sig = line + gpg_sig_header_len;
 			strbuf_add(signature, sig, next - sig);
 			saw_signature = 1;
 		} else {
+			if (*line == '\n')
+				next = buffer + size; /* dump the whole remainder */
 			strbuf_add(payload, line, next - line);
 		}
-		if (*line == '\n')
-			in_header = 0;
 		line = next;
 	}
  cleanup:
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 5c7475d..30401ce 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -50,11 +50,22 @@ test_expect_success GPG 'show signatures' '
 
 test_expect_success GPG 'detect fudged signature' '
 	git cat-file commit master >raw &&
-	sed -e "s/fourth signed/4th forged/" raw >forged &&
-	git hash-object -w -t commit forged >forged.commit &&
-	git show --pretty=short --show-signature $(cat forged.commit) >actual &&
-	grep "BAD signature from" actual &&
-	! grep "Good signature from" actual
+
+	sed -e "s/fourth signed/4th forged/" raw >forged1 &&
+	git hash-object -w -t commit forged1 >forged1.commit &&
+	git show --pretty=short --show-signature $(cat forged1.commit) >actual1 &&
+	grep "BAD signature from" actual1 &&
+	! grep "Good signature from" actual1
+'
+
+test_expect_success GPG 'detect fudged signature with NUL' '
+	git cat-file commit master >raw &&
+	cat raw >forged2 &&
+	echo Qwik | tr "Q" "\000" >>forged2 &&
+	git hash-object -w -t commit forged2 >forged2.commit &&
+	git show --pretty=short --show-signature $(cat forged2.commit) >actual2 &&
+	grep "BAD signature from" actual2 &&
+	! grep "Good signature from" actual2
 '
 
 test_done

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Re* [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20 18:36             ` Re* " Junio C Hamano
@ 2011-10-20 19:00               ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2011-10-20 19:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Thu, Oct 20, 2011 at 11:36:48AM -0700, Junio C Hamano wrote:

> It probably is not worth it for most applications, but this fix-up to a
> fairly recent one is worth doing, I would suspect.
> 
> -- >8 --
> Subject: parse_signed_commit: really use the entire commit log message
>
> ... even beyond the first NUL in the buffer, when checking the commit
> against the detached signature in the header.

Yeah, that is worth fixing, I think. It's one thing to be a little lazy
in pretty-printing for "git log", but I think signature verification
should be more careful.

Patch itself looks sane to me. There's still some use of str-like
functions, but they would prevent us from even seeing the signature
headers in the first place, so anything with a NUL that high is just
broken and crappy.

I didn't check, but I wonder if fsck does/should check that there is a
proper end-of-header blank line before we hit any NUL.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20  2:51     ` Jeff King
  2011-10-20  4:15       ` Kyle Moffett
@ 2011-10-24 20:47       ` Jeff King
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff King @ 2011-10-24 20:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Wed, Oct 19, 2011 at 10:51:49PM -0400, Jeff King wrote:

> I couldn't resist:
> 
>   $ git commit -q -m foo --collide=deadbeef &&
>     git rev-list -1 HEAD
>   deadbeefdbd6e62a2185606a4fad653e22509b56

Here's a followup with a few cleanups:

  1. It will actually finish and report an error if we fail to find a
     commit after ULONG_MAX tries. On 64-bit machines this is unlikely,
     but it might be possible on a 32-bit machine if you have a lot of
     patience.

  2. It uses git_SHA* now, so it will build properly against BLK_SHA1 if
     NO_OPENSSL is defined.

  3. It will split the the collision-finding work across online_cpus()
     threads.

Somebody else suggested farming the work out to a GPU. I'll leave that
as an exercise to the reader.

-Peff

---
 builtin/commit.c |   56 ++++++++++++++++++-
 commit.c         |  165 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commit.h         |    4 +
 3 files changed, 223 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index c46f2d1..734a7ab 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -105,6 +105,10 @@
 static const char *only_include_assumed;
 static struct strbuf message;
 
+static int collide;
+static unsigned char collide_sha1[20];
+static unsigned char collide_mask[20];
+
 static int null_termination;
 static enum {
 	STATUS_FORMAT_LONG,
@@ -125,6 +129,52 @@ static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int parse_partial_sha1(const char *s, unsigned char sha1[20])
+{
+	unsigned int i;
+
+	hashclr(sha1);
+
+	for (i = 0; i < 40 && s[i]; i++) {
+		unsigned int v = hexval(s[i]);
+		if (v & ~0xf)
+			break;
+		if (!(i & 1))
+			v <<= 4;
+		sha1[i/2] |= v;
+	}
+	return i;
+}
+
+static void fill_sha1_mask(int n, unsigned char mask[20]) {
+	int i;
+
+	hashclr(mask);
+	for (i = 0; i < n/2; i++)
+		mask[i] = 0xff;
+	if (n & 1)
+		mask[i] = 0xf0;
+}
+
+static int opt_parse_collide(const struct option *opt, const char *arg,
+			     int unset)
+{
+	if (unset)
+		collide = 0;
+	else {
+		int n = parse_partial_sha1(arg, collide_sha1);
+		if (!arg[n])
+			fill_sha1_mask(n, collide_mask);
+		else if (arg[n] == '/')
+			parse_partial_sha1(arg + n + 1, collide_mask);
+		else
+			die("invalid --collide sha1: %s", arg);
+		collide = 1;
+	}
+	return 0;
+}
+
+
 static struct option builtin_commit_options[] = {
 	OPT__QUIET(&quiet, "suppress summary after successful commit"),
 	OPT__VERBOSE(&verbose, "show diff in commit message template"),
@@ -144,6 +194,7 @@ static int opt_parse_m(const struct option *opt, const char *arg, int unset)
 	OPT_BOOLEAN('e', "edit", &edit_flag, "force edit of commit"),
 	OPT_STRING(0, "cleanup", &cleanup_arg, "default", "how to strip spaces and #comments from message"),
 	OPT_BOOLEAN(0, "status", &include_status, "include status in commit message template"),
+	OPT_CALLBACK(0, "collide", NULL, "sha1[/mask]", "choose commit sha1 like <sha1>", opt_parse_collide),
 	/* end commit message options */
 
 	OPT_GROUP("Commit contents options"),
@@ -1483,8 +1534,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		exit(1);
 	}
 
-	if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1,
-			author_ident.buf)) {
+	if (commit_tree_collide(sb.buf, active_cache_tree->sha1, parents, sha1,
+				author_ident.buf,
+				collide ? collide_sha1 : NULL, collide_mask)) {
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
diff --git a/commit.c b/commit.c
index 73b7e00..7beea9b 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "thread-utils.h"
 
 int save_commit_buffer = 1;
 
@@ -840,6 +841,158 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 	return result;
 }
 
+static inline int sha1_match_mask(const unsigned char *sha1,
+				  const unsigned char *want,
+				  const unsigned char *mask)
+{
+	int i;
+	for (i = 0; i < 20; i++)
+		if ((want[i] & mask[i]) != (sha1[i] & mask[i]))
+		    return 0;
+	return 1;
+}
+
+static unsigned long find_collision(const unsigned char *want,
+				    const unsigned char *mask,
+				    const git_SHA_CTX *base,
+				    unsigned long start,
+				    unsigned long end)
+{
+	unsigned long lulz;
+
+	for (lulz = start; lulz < end; lulz++) {
+		git_SHA_CTX guess;
+		unsigned char sha1[20];
+
+		memcpy(&guess, base, sizeof(guess));
+		git_SHA1_Update(&guess, &lulz, sizeof(lulz));
+		git_SHA1_Final(sha1, &guess);
+
+		if (sha1_match_mask(sha1, want, mask))
+			return lulz;
+	}
+	return end;
+}
+
+#ifndef NO_PTHREADS
+struct collision_thread_data {
+	const unsigned char *want;
+	const unsigned char *mask;
+	const git_SHA_CTX *base;
+	unsigned long start;
+	unsigned long end;
+
+	pthread_mutex_t *mutex;
+	pthread_cond_t *cond;
+	int *threads_alive;
+	unsigned long *answer;
+};
+
+static void *collision_thread(void *vdata)
+{
+	struct collision_thread_data *d = vdata;
+	unsigned long r;
+
+	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
+
+	r = find_collision(d->want, d->mask, d->base,
+			   d->start, d->end);
+
+	pthread_mutex_lock(d->mutex);
+
+	(*d->threads_alive)--;
+	if (r != d->end) {
+		*d->answer = r;
+		pthread_cond_signal(d->cond);
+	}
+	else {
+		/* If we failed to find it, and we're the last thread,
+		 * wake up the parent so it can report failure. */
+		if (!*d->threads_alive)
+			pthread_cond_signal(d->cond);
+	}
+
+	pthread_mutex_unlock(d->mutex);
+	return NULL;
+}
+
+static unsigned long find_collision_threaded(int nthreads,
+					     const unsigned char *want,
+					     const unsigned char *mask,
+					     const git_SHA_CTX *base)
+{
+	int i;
+	pthread_t *threads;
+	struct collision_thread_data *data;
+	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+	pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
+	int threads_alive = nthreads;
+	unsigned long ret = ULONG_MAX;
+
+	threads = xmalloc(nthreads * sizeof(*threads));
+	data = xmalloc(nthreads * sizeof(*data));
+
+	pthread_mutex_lock(&mutex);
+
+	for (i = 0; i < nthreads; i++) {
+		data[i].want = want;
+		data[i].mask = mask;
+		data[i].base = base;
+		data[i].start = i * (ULONG_MAX / nthreads);
+		data[i].end = (i+1) * (ULONG_MAX / nthreads);
+		data[i].mutex = &mutex;
+		data[i].cond = &cond;
+		data[i].answer = &ret;
+		data[i].threads_alive = &threads_alive;
+		pthread_create(&threads[i], NULL, collision_thread, &data[i]);
+	}
+
+	pthread_cond_wait(&cond, &mutex);
+
+	for (i = 0; i < nthreads; i++) {
+		pthread_cancel(threads[i]);
+		pthread_join(threads[i], NULL);
+	}
+
+	free(threads);
+	free(data);
+	return ret;
+}
+#endif /* NO_PTHREADS */
+
+
+static void collide_commit(struct strbuf *data,
+			   const unsigned char *want,
+			   const unsigned char *mask)
+{
+	static const char terminator[] = { 0 };
+	char header[32];
+	int header_len;
+	unsigned long lulz;
+	git_SHA_CTX base;
+
+	header_len = snprintf(header, sizeof(header),
+			      "commit %lu",
+			      data->len + 1 + sizeof(lulz)) + 1;
+	git_SHA1_Init(&base);
+	git_SHA1_Update(&base, header, header_len);
+	git_SHA1_Update(&base, data->buf, data->len);
+	git_SHA1_Update(&base, terminator, sizeof(terminator));
+
+#ifdef NO_PTHREADS
+	lulz = find_collision(want, mask, &base, 0, ULONG_MAX);
+#else
+	lulz = find_collision_threaded(online_cpus(), want, mask, &base);
+#endif /* NO_PTHREADS */
+
+	if (lulz != ULONG_MAX) {
+		strbuf_add(data, terminator, sizeof(terminator));
+		strbuf_add(data, &lulz, sizeof(lulz));
+	}
+	else
+		warning("sorry, I couldn't find a collision!");
+}
+
 static const char commit_utf8_warn[] =
 "Warning: commit message does not conform to UTF-8.\n"
 "You may want to amend it after fixing the message, or set the config\n"
@@ -849,6 +1002,15 @@ int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author)
 {
+	return commit_tree_collide(msg, tree, parents, ret, author,
+				   NULL, NULL);
+}
+
+int commit_tree_collide(const char *msg, unsigned char *tree,
+			struct commit_list *parents, unsigned char *ret,
+			const char *author, const unsigned char *want,
+			const unsigned char *mask)
+{
 	int result;
 	int encoding_is_utf8;
 	struct strbuf buffer;
@@ -890,6 +1052,9 @@ int commit_tree(const char *msg, unsigned char *tree,
 	if (encoding_is_utf8 && !is_utf8(buffer.buf))
 		fprintf(stderr, commit_utf8_warn);
 
+	if (want && mask)
+		collide_commit(&buffer, want, mask);
+
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
 	return result;
diff --git a/commit.h b/commit.h
index 009b113..337dcbd 100644
--- a/commit.h
+++ b/commit.h
@@ -184,5 +184,9 @@ static inline int single_parent(struct commit *commit)
 extern int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author);
+extern int commit_tree_collide(const char *msg, unsigned char *tree,
+			       struct commit_list *parents, unsigned char *ret,
+			       const char *author, const unsigned char *sha1,
+			       const unsigned char *mask);
 
 #endif /* COMMIT_H */
-- 
1.7.7.40.g1a72f

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits
  2011-10-20 15:56               ` Jeff King
@ 2011-10-25 22:35                 ` Drew Northup
  0 siblings, 0 replies; 24+ messages in thread
From: Drew Northup @ 2011-10-25 22:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Ted Ts'o, Junio C Hamano,
	Ævar Arnfjörð Bjarmason, Git Mailing List


On Thu, 2011-10-20 at 11:56 -0400, Jeff King wrote:
> On Thu, Oct 20, 2011 at 09:14:55AM -0400, Ted Ts'o wrote:
> 
> > Another possibility is to warn if the commit messages are not NULL
> > terminated.
> 
> A minor nit, but it's not whether they are terminated with NUL, but
> rather whether they have embedded NUL. But yeah, this could maybe just
> be something fsck looks for.

God (or your deity of choice) forbid that anybody ever writes code that
saves raw UTF-16 as the commit message...

-- 
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2011-10-25 22:36 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-19 18:03 [IGNORETHIS/PATCH] Choosing the sha1 prefix of your commits Ævar Arnfjörð Bjarmason
2011-10-19 19:01 ` Jeff King
2011-10-19 19:38   ` Jeff King
2011-10-20  2:51     ` Jeff King
2011-10-20  4:15       ` Kyle Moffett
2011-10-20  4:25         ` Jeff King
2011-10-20  4:27         ` Junio C Hamano
2011-10-20  4:32           ` Kyle Moffett
2011-10-24 20:47       ` Jeff King
2011-10-20  4:31     ` Junio C Hamano
2011-10-20  4:34       ` Jeff King
2011-10-20  6:57         ` Junio C Hamano
2011-10-20  7:13           ` Jeff King
2011-10-20 13:14             ` Ted Ts'o
2011-10-20 15:56               ` Jeff King
2011-10-25 22:35                 ` Drew Northup
2011-10-20 18:36             ` Re* " Junio C Hamano
2011-10-20 19:00               ` Jeff King
2011-10-20  7:27           ` Nguyen Thai Ngoc Duy
2011-10-20  9:14       ` Nguyen Thai Ngoc Duy
2011-10-20 15:44         ` Jeff King
2011-10-20  9:38   ` Mikael Magnusson
2011-10-20 13:44     ` Elijah Newren
2011-10-19 22:09 ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).