git.vger.kernel.org archive mirror
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Joshua Redstone" <joshua.redstone@fb.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 0/3] On compressing large index
Date: Sun,  5 Feb 2012 15:30:02 +0700
Message-ID: <1328430605-4566-1-git-send-email-pclouds@gmail.com>

I was wondering whether compressing the index might help when it
contains ~2M files. It turns out that it only makes the situation
worse. Anyway, I am posting the code and some numbers here.

The index is created artificially with the program in [1]:

$ git init
$ touch foo
$ git hash-object -w foo
$ ./a.out 256 256 32 | git update-index --index-info

That gives ~2M files in the index, 209 MB in size.

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.635s
user    0m4.258s
sys     0m0.329s

$ time ~/w/git/git update-index level-0-0000/foo
real    0m4.593s
user    0m4.264s
sys     0m0.323s

The index is compressed by setting GIT_ZCACHE=1:

$ GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo

which gives a 6.8 MB index (real-world numbers may be less impressive
because the compression ratio for my artificial tree is very high).
The only problem is that git now takes more time, not less:

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.970s
user    0m4.675s
sys     0m0.289s

$ time GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo
real    0m4.959s
user    0m4.682s
sys     0m0.273s

My guess is that Linux already caches the whole index in memory, so
I/O time does not really matter, while we still have to pay for zlib's
time. We need to figure out what git spends those 4s of user time on.

This series may be useful on OSes that do not cache heavily, though
I'm not sure there are any out there nowadays.

Nguyễn Thái Ngọc Duy (3):
  read-cache: factor out cache entries reading code
  read-cache: reduce malloc/free during writing index
  Support compressing index when GIT_ZCACHE=1

 cache.h      |    1 +
 read-cache.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 148 insertions(+), 25 deletions(-)

[1]
-- 8< --
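#include <stdio.h>
#include <stdlib.h>	/* atoi() */
#include <string.h>

/*
 * Print "git update-index --index-info" lines (mode SP sha1 SP stage
 * TAB path) for m1 top-level directories, each holding one file plus
 * m2 subdirectories of m3 files. Every entry points to the empty blob
 * written by "git hash-object -w foo".
 */
int main(int argc, char **argv)
{
	const char *prefix = "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\t";
	int l1, l2, l3;
	int m1, m2, m3;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <m1> <m2> <m3>\n", argv[0]);
		return 1;
	}

	m1 = atoi(argv[1]);
	m2 = atoi(argv[2]);
	m3 = atoi(argv[3]);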

	for (l1 = 0; l1 < m1; l1++) {
		printf("%slevel-0-%04d/foo\n", prefix, l1);
		for (l2 = 0; l2 < m2; l2++)
			for (l3 = 0; l3 < m3; l3++)
				printf("%slevel-0-%04d/level-1-%04d/foo-%04d\n",
				       prefix, l1, l2, l3);
	}
	return 0;
}
-- 8< --
-- 
1.7.8.36.g69ee2
