git.vger.kernel.org archive mirror
* [PATCH 0/3] On compressing large index
From: Nguyễn Thái Ngọc Duy @ 2012-02-05  8:30 UTC
  To: git; +Cc: Joshua Redstone, Nguyễn Thái Ngọc Duy

I was wondering whether compressing the index might help when it
contains ~2M files. It turns out that it only makes the situation
worse. Anyway, I am posting the code and some numbers here.

The index is created artificially with the program in [1]:

$ git init
$ touch foo
$ git hash-object -w foo
$ ./a.out 256 256 32 | git update-index --index-info

That gives ~2M files in the index, 209 MB in size.

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.635s
user    0m4.258s
sys     0m0.329s

$ time ~/w/git/git update-index level-0-0000/foo
real    0m4.593s
user    0m4.264s
sys     0m0.323s

Setting GIT_ZCACHE=1 makes git write the index compressed:

$ GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo

which gives a 6.8 MB index (real-world numbers may be less impressive,
because the compression ratio of my artificial tree is really high).
The only problem is that git now takes more time, not less:

$ time ~/w/git/git ls-files | head >/dev/null
real    0m4.970s
user    0m4.675s
sys     0m0.289s

$ time GIT_ZCACHE=1 ~/w/git/git update-index level-0-0000/foo
real    0m4.959s
user    0m4.682s
sys     0m0.273s

My guess is that Linux already caches the whole index in memory, so
I/O time does not really matter, while we still have to pay for zlib's
time. We need to figure out what git spends 4s of user time on.

This series may be useful on OSes that do not cache heavily, though
I'm not sure there are any such OSes nowadays.

Nguyễn Thái Ngọc Duy (3):
  read-cache: factor out cache entries reading code
  read-cache: reduce malloc/free during writing index
  Support compressing index when GIT_ZCACHE=1

 cache.h      |    1 +
 read-cache.c |  172 +++++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 148 insertions(+), 25 deletions(-)

[1]
-- 8< --
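#include <stdio.h>
#include <stdlib.h> /* atoi() */

/*
 * Print "mode SP sha1 SP stage TAB path" lines for
 * "git update-index --index-info"; every entry points at the
 * empty blob written by "git hash-object -w foo".
 */
int main(int argc, char **argv)
{
	const char *prefix = "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\t";
	int l1, l2, l3;
	int m1, m2, m3;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <dirs> <subdirs> <files>\n", argv[0]);
		return 1;
	}
	m1 = atoi(argv[1]);
	m2 = atoi(argv[2]);
	m3 = atoi(argv[3]);

	for (l1 = 0; l1 < m1; l1++) {
		/* one file directly under each top-level directory */
		printf("%slevel-0-%04d/foo\n", prefix, l1);
		/* plus m2 * m3 files one level deeper */
		for (l2 = 0; l2 < m2; l2++)
			for (l3 = 0; l3 < m3; l3++)
				printf("%slevel-0-%04d/level-1-%04d/foo-%04d\n",
				       prefix, l1, l2, l3);
	}
	return 0;
}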
-- 8< --
-- 
1.7.8.36.g69ee2


