From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH 0/9] Prefix-compress on-disk index entries
Date: Tue, 3 Apr 2012 15:53:07 -0700
Message-ID: <1333493596-14202-1-git-send-email-gitster@pobox.com>
This is still rough, but with this patch I am getting:
$ ls -l .git/index*
-rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
-rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4
in a clone of the WebKit repository, which has 183175 paths.
With a hot cache and no local modifications:
$ time sh -c 'GIT_INDEX_FILE=.git/index-4 git diff'
real 0m0.469s
user 0m0.130s
sys 0m0.330s
$ time sh -c 'git diff'
real 0m0.677s
user 0m0.290s
sys 0m0.370s
This measures the time needed to read the index into the in-core
structure and to compare the cached stat information with what lstat(2)
reports.
The updated format is not documented yet, as I did not intend (and I am
still not committed) to declare a change along these lines the official
"v4" format; I was merely curious to see how much improvement we can get
from a trivial approach like this.
The reduction in on-disk index size comes from two factors:
- Not padding the on-disk index entries to an 8-byte boundary;
- Not storing the full pathname for each entry in the on-disk format.
Because the entries are sorted by path, adjacent entries in the index tend
to share leading path components, so it makes sense to store only the
difference in each later entry. In the v4 on-disk format of the index,
each on-disk cache entry stores the number of bytes to strip from the
end of the previous entry's name, followed by the bytes to append to the
result, to reconstruct its own name.
The "to-remove" count is encoded in the varint format used in packfiles,
and the "bytes-to-append" part is a simple NUL-terminated string.
Junio C Hamano (9):
varint: make it available outside the context of pack
cache.h: hide on-disk index details
read-cache.c: allow unaligned mapping of the index file
read-cache.c: make create_from_disk() report number of bytes it consumed
read-cache.c: report the header version we do not understand
read-cache.c: move code to copy ondisk to incore cache to a helper function
read-cache.c: move code to copy incore to ondisk cache to a helper function
read-cache.c: read prefix-compressed names in index on-disk version v4
read-cache.c: write index v4 format
Makefile | 2 +
builtin/update-index.c | 2 +
cache.h | 52 +---------
config.c | 11 ++
environment.c | 1 +
read-cache.c | 259 ++++++++++++++++++++++++++++++++++++++++--------
varint.c | 29 ++++++
varint.h | 9 ++
8 files changed, 275 insertions(+), 90 deletions(-)
create mode 100644 varint.c
create mode 100644 varint.h
--
1.7.10.rc4.54.g1d5dd3