From: Thomas Rast <trast@inf.ethz.ch>
To: Shawn Pearce <spearce@spearce.org>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc" <pclouds@gmail.com>,
git@vger.kernel.org, "Joshua Redstone" <joshua.redstone@fb.com>
Subject: Re: [PATCH 3/6] Stop producing index version 2
Date: Tue, 7 Feb 2012 18:25:43 +0100
Message-ID: <874nv2o8rs.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CAJo=hJvtRnmvALcn3vKpYTr3j6ada8iboPjWN3cQnwwKzRvrDA@mail.gmail.com> (Shawn Pearce's message of "Mon, 6 Feb 2012 19:09:15 -0800")
Shawn Pearce <spearce@spearce.org> writes:
> I have long wanted to scrap the current index format. I unfortunately
> don't have the time to do it myself. But I suspect there may be a lot
> of gains by making the index format match the canonical tree format
> better by keeping the tree structure within a single file stream,
> nesting entries below their parent directory, and keeping tree SHA-1
> data along with the directory entry.
If I may add to this: the one thing that I would like to see fixed about
the index is that it's flat-out impossible to change a single thing in
it without "rewriting" it from scratch.
I put "rewriting" in quotes because it is possible to change a few things
in place, but recomputing the trailing SHA-1 swamps that by a large margin
unless you are writing to a floppy disk, so the in-place update buys
nothing. I'm sure using a CRC32 helps here, but if we're going to make an
incompatible change, why not go all the way?
A tree layout can fix that if it is arranged so that
'git add path/to/file' only updates the SHA-1s for path/to/file,
path/to and path. For this to work, the checksums would have to correspond
to the trees, perhaps even directly use the actual tree SHA-1. This
would at least be natural in some sense; getting to actual log(n)
complexity for hilariously large directories would require dynamically
splitting directories where appropriate.
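
To make the idea concrete, here is a minimal sketch of a tree-shaped
index in which every directory caches a hash over its children. It is
not git's on-disk format: the `Dir` class, the `add` helper and the way
child entries are fed into SHA-1 are all assumptions made up for
illustration.

```python
import hashlib


class Dir:
    """Hypothetical directory node that caches a hash over its children."""

    def __init__(self):
        self.children = {}  # name -> Dir (subdirectory) or bytes (blob hash)
        self.digest = None

    def rehash(self):
        # Hash the sorted (name, child hash) pairs of this directory only.
        h = hashlib.sha1()
        for name in sorted(self.children):
            child = self.children[name]
            child_digest = child.digest if isinstance(child, Dir) else child
            h.update(name.encode() + b"\0" + child_digest)
        self.digest = h.digest()


def add(root, path, content):
    """Record content at path and rehash only the directories on that path.

    Returns the number of directories that were rehashed.
    """
    parts = path.split("/")
    chain = [root]
    for name in parts[:-1]:
        chain.append(chain[-1].children.setdefault(name, Dir()))
    chain[-1].children[parts[-1]] = hashlib.sha1(content).digest()
    # Walk back up from the deepest directory to the root.
    for directory in reversed(chain):
        directory.rehash()
    return len(chain)
```

With this layout, adding path/to/file rehashes the root, path and
path/to, and leaves every sibling directory's cached hash alone. Each
rehash is still linear in the number of entries of that one directory,
which is why very large directories would need to be split.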
Along the same lines, the format should allow for changing the data of
a single extension while rehashing only the new data.
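
One possible shape for this, again only as a sketch: give each
extension its own checksum and let the top-level checksum cover just
the per-extension digests. The `ExtensionTable` class and its
`bytes_hashed` counter are hypothetical and exist only to show how much
data gets rehashed.

```python
import hashlib


class ExtensionTable:
    """Hypothetical per-extension checksums with a small summary hash."""

    def __init__(self):
        self.digests = {}      # extension signature -> SHA-1 of its data
        self.bytes_hashed = 0  # total extension bytes fed to SHA-1 so far

    def set(self, signature, data):
        # Only the new data for this one extension is hashed.
        self.digests[signature] = hashlib.sha1(data).digest()
        self.bytes_hashed += len(data)

    def summary(self):
        # The summary covers the digests, not the extension data itself.
        h = hashlib.sha1()
        for signature in sorted(self.digests):
            h.update(signature.encode() + self.digests[signature])
        return h.digest()
```

Replacing one extension then costs a hash over its new data plus a
summary over a handful of 20-byte digests, independent of the size of
the other extensions.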
When I worked on cache-tree, I considered making a change to the latter
effect, but thought the impact too great for so little gain. Now from
this thread, I'm getting the impression that such a change would be ok,
even if users would have to scrap the index if they downgrade. Is that
right?
--
Thomas Rast
trast@{inf,student}.ethz.ch