From: Thomas Rast <trast@student.ethz.ch>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: <git@vger.kernel.org>, <trast@student.ethz.ch>,
	<gitster@pobox.com>, <mhagger@alum.mit.edu>, <peff@peff.net>,
	<spearce@spearce.org>, <davidbarr@google.com>
Subject: Re: Index format v5
Date: Thu, 3 May 2012 20:16:59 +0200
Message-ID: <87obq5p1t0.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CALgYhfMKdbv8TiT4ALDSvD3pSXHEPLWHM09DxYnRmRdBWRjh8Q@mail.gmail.com> (Thomas Gummerer's message of "Thu, 3 May 2012 19:25:12 +0200")

Thomas Gummerer <t.gummerer@gmail.com> writes:

>   32-bit crc32 checksum over ctime seconds, ctime nanoseconds,
>     ino, file size, dev, uid, gid (All stat(2) data except mtime) [7]
[...]
> [7] Since all stat data (except mtime and ctime) is just used for
>     checking if a file has changed a checksum of the data is enough.
>     In addition to that Thomas Rast suggested ctime could be ditched
>     completely (core.trustctime=false) and thus included in the
>     checksum. This would save 24 bytes per index entry, which would
>     be about 4 MB on the Webkit index.
>     (Thanks for the suggestion to Michael Haggerty)

This is the part I'm most curious about.  Are we missing anything?
Michael brought it up on IRC: the stat() results are only used to test
whether they are still the same, with the exception of the mtime (which
also undergoes raciness checks).

As far as I can see, none of st_{ino,dev,uid,gid} are useful for
anything.  st_size might conceivably be used as a hint for a buffer
size, but nobody actually does that.  The ctime undergoes stricter
checks, but AFAICS those are also all about whether it has changed, and
besides, they can be turned off.  We think all of those fields can be
replaced by an arbitrary hash/CRC and only tested for equality.  32 bits
should be plenty, probably even if we just xor the values together.
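
A minimal sketch of the idea, in Python for brevity rather than git's
C.  It assumes each of the seven fields is truncated to 32 bits, as in
the v2-4 on-disk entry, and packed big-endian in the order listed in
the proposal; the function names and the packing order are
illustrative, not part of any format specification.  The xor variant is
included only to show the cheaper alternative; note that xor does not
depend on which field a value sits in, whereas the CRC does.

```python
import os
import struct
import zlib

MASK32 = 0xFFFFFFFF

# Assumption: v2-4 stores each of these seven fields as a 32-bit value.
OLD_BYTES = 7 * 4          # ctime sec, ctime nsec, ino, size, dev, uid, gid
NEW_BYTES = 4              # one 32-bit checksum
SAVED_PER_ENTRY = OLD_BYTES - NEW_BYTES   # 24 bytes, as in the proposal


def stat_fields(st):
    """Return the seven stat fields from the proposal (everything but mtime)."""
    return (
        st.st_ctime_ns // 1_000_000_000,   # ctime seconds
        st.st_ctime_ns % 1_000_000_000,    # ctime nanoseconds
        st.st_ino,
        st.st_size,
        st.st_dev,
        st.st_uid,
        st.st_gid,
    )


def stat_crc(fields):
    """CRC-32 over the fields, each truncated to 32 bits and packed big-endian."""
    packed = struct.pack(">7I", *(f & MASK32 for f in fields))
    return zlib.crc32(packed) & MASK32


def stat_xor(fields):
    """Cheaper alternative: xor the truncated fields together."""
    acc = 0
    for f in fields:
        acc ^= f & MASK32
    return acc


def maybe_changed(stored_checksum, path):
    """Equality test only: True if the file's stat data no longer matches."""
    return stat_crc(stat_fields(os.stat(path))) != stored_checksum
```

A caller would store `stat_crc(stat_fields(os.stat(path)))` in the
index entry and later call `maybe_changed()` with that value; mtime
would still be stored separately for the raciness checks.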

So what's wrong in this thinking?

[The one flaw I found so far is that this makes it impossible to convert
back to v2-4 without at the very least refreshing the index.  Do we
care?]

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
