From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Nguyễn Thái Ngọc" <pclouds@gmail.com>,
git@vger.kernel.org, "Thomas Rast" <trast@inf.ethz.ch>,
"Joshua Redstone" <joshua.redstone@fb.com>
Subject: Re: [PATCH 3/6] Stop producing index version 2
Date: Mon, 6 Feb 2012 19:09:15 -0800 [thread overview]
Message-ID: <CAJo=hJvtRnmvALcn3vKpYTr3j6ada8iboPjWN3cQnwwKzRvrDA@mail.gmail.com> (raw)
In-Reply-To: <7v4nv4a131.fsf@alter.siamese.dyndns.org>
2012/2/5 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> read-cache.c learned to produce version 2 or 3 depending on whether
>> extended cache entries exist in 06aaaa0 (Extend index to save more flags
>> - 2008-10-01), first released in 1.6.1. The purpose is to keep
>> compatibility with older git. It's been more than three years since
>> then and git has reached 1.7.9. Drop support for older git.
>
> Cc'ing this, as I suspect this would surely raise eyebrows of some people
> who wanted to get rid of the version 3 format.
Version 3 was a mistake because of the variable length record sizes.
Saving 2 bytes on some records that don't use the extended flags makes
the index file *MUCH* harder to parse. So much so that we should take
version 3 and kill it, not encourage it as the default!
IMHO, when these extended flags were added to make version 3 the
following should have happened:
- All records use the larger structure format with 4 bytes for the
flags, not 2 bytes.
- Change the trailing padding after the name to be a *SINGLE* \0 byte,
and do not pad out to an 8 byte boundary.
Both make it really hard to process the file, and the latter happens
only for direct mmap usage, which we don't do anymore.
We also have to consider the EGit and JGit user base as part of the
ecosystem. We can't just kill a file format because git-core has been
capable of reading its alternative since some arbitrary YYYY-MM-DD
release date. We need to also consider when did some other major tools
catch up and also support this format?
FWIW JGit released index version 3 support in version 0.9.1, which
shipped Sep 15, 2010. JGit/EGit were more than 2 years behind here.
<thinking type="wishful" probability="never-happen"
probably-inflating-flame-from="linus">
I have long wanted to scrap the current index format. I unfortunately
don't have the time to do it myself. But I suspect there may be a lot
of gains by making the index format match the canonical tree format
better by keeping the tree structure within a single file stream,
nesting entries below their parent directory, and keeping tree SHA-1
data along with the directory entry. For one thing the index would be
able to register an empty subdirectory, rather than ignoring them. It
would also better line up with the filesystem's readdir() handling,
giving us more sane logic to compare what readdir() tells us exists
against what the index thinks should be in the same file. And the
overall index should be smaller, because we don't have to repeat the
same path/to/a/file/for/every/file/in/that/same/directory/tree.
Reconstructing the path strings at read time into a flat list should
be pretty trivial, and still keep the parallel lstat calls running off
a flat list working well for fast status operations.
</thinking>
next prev parent reply other threads:[~2012-02-07 3:09 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-06 5:48 [PATCH 1/6] read-cache: use sha1file for sha1 calculation Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 2/6] csum-file: make sha1 calculation optional Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 3/6] Stop producing index version 2 Nguyễn Thái Ngọc Duy
2012-02-06 7:10 ` Junio C Hamano
2012-02-07 3:09 ` Shawn Pearce [this message]
2012-02-07 4:50 ` Nguyen Thai Ngoc Duy
2012-02-07 8:51 ` Nguyen Thai Ngoc Duy
2012-02-07 5:21 ` Junio C Hamano
2012-02-07 17:25 ` Thomas Rast
2012-02-06 5:48 ` [PATCH 4/6] Introduce index version 4 with global flags Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 5/6] Allow to use crc32 as a lighter checksum on index Nguyễn Thái Ngọc Duy
2012-02-07 3:17 ` Shawn Pearce
2012-02-07 4:04 ` Dave Zarzycki
2012-02-07 4:29 ` Dave Zarzycki
2012-02-06 5:48 ` [PATCH 6/6] Automatically switch to crc32 checksum for index when it's too large Nguyễn Thái Ngọc Duy
2012-02-06 8:50 ` Dave Zarzycki
2012-02-06 8:54 ` Nguyen Thai Ngoc Duy
2012-02-06 9:07 ` Dave Zarzycki
2012-02-06 7:34 ` [PATCH 1/6] read-cache: use sha1file for sha1 calculation Junio C Hamano
2012-02-06 8:36 ` Nguyen Thai Ngoc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJo=hJvtRnmvALcn3vKpYTr3j6ada8iboPjWN3cQnwwKzRvrDA@mail.gmail.com' \
--to=spearce@spearce.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=joshua.redstone@fb.com \
--cc=pclouds@gmail.com \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).