* Re: [PATCH] fetch-pack: avoid fixing thin packs when unnecessary
[not found] ` <Pine.LNX.4.64.0612181638220.18171@xanadu.home>
@ 2006-12-18 21:55 ` Johannes Schindelin
2006-12-18 22:17 ` Nicolas Pitre
0 siblings, 1 reply; 52+ messages in thread
From: Johannes Schindelin @ 2006-12-18 21:55 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Randal L. Schwartz, git
Hi,
On Mon, 18 Dec 2006, Nicolas Pitre wrote:
> On Mon, 18 Dec 2006, Johannes Schindelin wrote:
>
> >
> > When we know that there are no common commits, the pack must be
> > closed (i.e. non-thin) already. Avoid "fixing" it in that case.
> >
> > Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> >
> > ---
> >
> > On Mon, 18 Dec 2006, Johannes Schindelin wrote:
> >
> > > On Mon, 18 Dec 2006, Randal L. Schwartz wrote:
> > >
> > > > But then it took nearly an *hour* at the next phase:
> > > >
> > > > Resolving 313037 deltas.
> > > > 100% (313037/313037) done
> > >
> > > Ouch.
> > >
> > > We try to avoid unpacking the thin packs received by git-fetch.
> > > This means completing that pack (since it can contain deltas
> > > against objects which are part of another pack).
> > >
> > > However, for the clone this is utter overkill. We really should
> > > try to avoid resolving unnecessarily. This is really for the
> > > clone case, since we do not have _any_ objects in the local
> > > repository.
> > >
> > > It happens that the other case -- fetching an independent branch
> > > -- is easy enough: we already have the check for it in
> > > fetch-pack.c:586.
> >
> > ... and here is a lightly tested fix.
>
> NAK.
>
> This fixes nothing. See previous message.
You're completely right. My patch does what I say, but it does not fix the
problem.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
[not found] ` <Pine.LNX.4.64.0612181625140.18171@xanadu.home>
@ 2006-12-18 22:01 ` Randal L. Schwartz
2006-12-18 22:09 ` Nicolas Pitre
2006-12-18 22:22 ` Linus Torvalds
0 siblings, 2 replies; 52+ messages in thread
From: Randal L. Schwartz @ 2006-12-18 22:01 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, git
>>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
Nicolas> Could you try the following please:
Nicolas> time git-index-pack -v -o /dev/null .git/objects/pack/*.pack
Nicolas> and provide us with the time it took (or an estimate if it is going to
Nicolas> be on hour long)?
Nicolas> Performing the above on my kernel repository (after it was repacked into
Nicolas> a single ~150MB pack) takes only 37 seconds.
I have a single pack.
"Indexing" took about 30 seconds.
"Resolving 313037 deltas" looks like it's going to take an hour.
So that *was* a local delay.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:01 ` cloning the kernel - why long time in "Resolving 313037 deltas" Randal L. Schwartz
@ 2006-12-18 22:09 ` Nicolas Pitre
2006-12-18 22:21 ` Randal L. Schwartz
2006-12-18 22:22 ` Linus Torvalds
1 sibling, 1 reply; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-18 22:09 UTC (permalink / raw)
To: Randal L. Schwartz; +Cc: Linus Torvalds, git
On Mon, 18 Dec 2006, Randal L. Schwartz wrote:
> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
>
> Nicolas> Could you try the following please:
>
> Nicolas> time git-index-pack -v -o /dev/null .git/objects/pack/*.pack
>
> Nicolas> and provide us with the time it took (or an estimate if it is going to
> Nicolas> be on hour long)?
>
> Nicolas> Performing the above on my kernel repository (after it was repacked into
> Nicolas> a single ~150MB pack) takes only 37 seconds.
>
> I have a single pack.
>
> "Indexing" took about 30 seconds.
> "Resolving 313037 deltas" looks like it's going to take an hour.
>
> So that *was* a local delay.
What CPU and amount of ram do you have?
Are you on Windows?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] fetch-pack: avoid fixing thin packs when unnecessary
2006-12-18 21:55 ` [PATCH] fetch-pack: avoid fixing thin packs when unnecessary Johannes Schindelin
@ 2006-12-18 22:17 ` Nicolas Pitre
0 siblings, 0 replies; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-18 22:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Randal L. Schwartz, git
On Mon, 18 Dec 2006, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 18 Dec 2006, Nicolas Pitre wrote:
>
> > On Mon, 18 Dec 2006, Johannes Schindelin wrote:
> >
> > > ... and here is a lightly tested fix.
> >
> > NAK.
> >
> > This fixes nothing. See previous message.
>
> You're completely right. My patch does what I say, but it does not fix the
> problem.
Your patch is also unnecessary. For use_thin_pack to be true, you must
provide --thin to git-fetch-pack which is not the case when cloning.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:09 ` Nicolas Pitre
@ 2006-12-18 22:21 ` Randal L. Schwartz
2006-12-18 22:50 ` Nicolas Pitre
0 siblings, 1 reply; 52+ messages in thread
From: Randal L. Schwartz @ 2006-12-18 22:21 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, git
>>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
>> "Indexing" took about 30 seconds.
>> "Resolving 313037 deltas" looks like it's going to take an hour.
>>
>> So that *was* a local delay.
Nicolas> What CPU and amount of ram do you have?
2.2 Ghz Intel Core 2 Duo (Macbook Pro high end)
I can compile and install GNU Emacs from source in 11 minutes. :)
Nicolas> Are you on Windows?
Gawd no!
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:01 ` cloning the kernel - why long time in "Resolving 313037 deltas" Randal L. Schwartz
2006-12-18 22:09 ` Nicolas Pitre
@ 2006-12-18 22:22 ` Linus Torvalds
2006-12-18 22:26 ` Randal L. Schwartz
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2006-12-18 22:22 UTC (permalink / raw)
To: Randal L. Schwartz; +Cc: Nicolas Pitre, git
On Mon, 18 Dec 2006, Randal L. Schwartz wrote:
>
> I have a single pack.
>
> "Indexing" took about 30 seconds.
> "Resolving 313037 deltas" looks like it's going to take an hour.
>
> So that *was* a local delay.
Ok, interesting. Two questions:
- what does "top" say (is it CPU-bound? Is it perhaps blowing out your
disk cache? Is it swapping?)
- do you have "oprofile" (or even just pgprof) to see where the *hell*
that time is spent, if it's actually CPU?
For me, it takes 25 seconds. Not an hour:
[torvalds@woody linux]$ time git-index-pack -v -o /dev/null .git/objects/pack/*.pack
Indexing 393507 objects.
100% (393507/393507) done
Resolving 316071 deltas.
100% (316071/316071) done
fatal: unable to create /dev/null: File exists
real 0m24.619s
user 0m22.569s
sys 0m1.316s
(I admit that 25 seconds is already "too much", but it does actually end
up exploding a lot of objects and doing quite a bit of work, so I guess
it's fair).
And the process grew to 33MB in RSS at it's biggest, so it wasn't even
using all that much memory (33MB isn't _tiny_, but considering that the
pack in question is 155MB and has hundreds of thousands of objects, 33MB
isn't really all that bad.
You're running this under OS X, aren't you? It's a pig of an OS, but
"almost one hour" vs "25 seconds" is still unreasonable.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:22 ` Linus Torvalds
@ 2006-12-18 22:26 ` Randal L. Schwartz
2006-12-18 23:02 ` Martin Langhoff
2006-12-18 23:28 ` Linus Torvalds
0 siblings, 2 replies; 52+ messages in thread
From: Randal L. Schwartz @ 2006-12-18 22:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, git
>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:
>> So that *was* a local delay.
Linus> Ok, interesting. Two questions:
Linus> - what does "top" say (is it CPU-bound? Is it perhaps blowing out your
Linus> disk cache? Is it swapping?)
Not swapping, but CPU bound.
Linus> - do you have "oprofile" (or even just pgprof) to see where the *hell*
Linus> that time is spent, if it's actually CPU?
I'm a "bear of very little brane" regarding code development on OSX. I
can ask around to see if there's someway to profile this.
Linus> You're running this under OS X, aren't you? It's a pig of an OS, but
Linus> "almost one hour" vs "25 seconds" is still unreasonable.
I agree!
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:21 ` Randal L. Schwartz
@ 2006-12-18 22:50 ` Nicolas Pitre
0 siblings, 0 replies; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-18 22:50 UTC (permalink / raw)
To: Randal L. Schwartz; +Cc: Linus Torvalds, git
On Mon, 18 Dec 2006, Randal L. Schwartz wrote:
> Nicolas> What CPU and amount of ram do you have?
>
> 2.2 Ghz Intel Core 2 Duo (Macbook Pro high end)
> I can compile and install GNU Emacs from source in 11 minutes. :)
So it shouldn't be a lack of resource.
> Nicolas> Are you on Windows?
>
> Gawd no!
;-)
I asked because it could have had something with the mmap() usage
recently reported to be dreadfully slow on Windows
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:26 ` Randal L. Schwartz
@ 2006-12-18 23:02 ` Martin Langhoff
2006-12-22 1:44 ` Kyle Moffett
2007-01-03 13:55 ` Andreas Ericsson
2006-12-18 23:28 ` Linus Torvalds
1 sibling, 2 replies; 52+ messages in thread
From: Martin Langhoff @ 2006-12-18 23:02 UTC (permalink / raw)
To: Randal L. Schwartz; +Cc: Linus Torvalds, Nicolas Pitre, git
On 18 Dec 2006 14:26:36 -0800, Randal L. Schwartz <merlyn@stonehenge.com> wrote:
> Linus> You're running this under OS X, aren't you? It's a pig of an OS, but
> Linus> "almost one hour" vs "25 seconds" is still unreasonable.
>
> I agree!
Me too -- but entirely possible. Disk IO is specially painful on OSX.
Stat calls are horrid. Using Arch (which abused stat calls to no end)
many ops would take 50x-100x longer on OSX than on Linux. A large
unpacked repo with git is a real pain -- and packing it can take
hours.
Once you are packed it's sweet, but large repos are a pain to deal
with. You won't impress anyone with performance over a linux kernel
repo -- starting up gitk can take a long time. Stat-heavy stuff like
git-diff is noticeably slower under OSX.
Have you got a linux partition you can boot into to get comparative timings?
[This is part of the reason I am migrating my OSX machine to Linux
fulltime, now that it seems that mergedfb+randr will let me switch to
dual monitors "hot".]
cheers,
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 22:26 ` Randal L. Schwartz
2006-12-18 23:02 ` Martin Langhoff
@ 2006-12-18 23:28 ` Linus Torvalds
2006-12-19 0:13 ` Nicolas Pitre
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2006-12-18 23:28 UTC (permalink / raw)
To: Randal L. Schwartz; +Cc: Nicolas Pitre, git
On Mon, 18 Dec 2006, Randal L. Schwartz wrote:
>
> Linus> You're running this under OS X, aren't you? It's a pig of an OS, but
> Linus> "almost one hour" vs "25 seconds" is still unreasonable.
>
> I agree!
The only thing that comes to mind is
- some actual difference in how the pack looks. Although I don't really
see that. I re-pack both my own private kernel repos and the kernel.org
one fairly often, and usually with a pretty recent git. I don't think
any of the packing heuristics have changed in a long time. And I just
checked: the kernel.org pack-file (the big one) unpacks quickly too.
- SHA1 library differences. Maybe OpenSSL on OS X sucks? I still don't
see the difference being _that_ big, but whatever..
- OS X "mmap()" performance really really sucks.
We do end up doing a _log_ of small mmaps in the "Resolving .." stage.
On Linux, the strace looks like
write(2, "Resolving 316071 deltas.\n", 25) = 25
mmap(NULL, 2619, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 2619) = 0
mmap(NULL, 2848, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 2848) = 0
mmap(NULL, 2988, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 2988) = 0
write(2, " 0% (2/316071) done\r", 22) = 22
mmap(NULL, 3336, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 3336) = 0
mmap(NULL, 3471, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 3471) = 0
mmap(NULL, 3611, PROT_READ, MAP_PRIVATE, 3, 0x4c000) = 0x2aaf13be2000
munmap(0x2aaf13be2000, 3611) = 0
...
and it's entirely possible that mmap() really does suck *ss on OS X.
Those small mmap's go on for a _loong_ time.
I've heard of other operations being slow on OS X - and two orders of
magnitude really isn't unthinkable. I don't think people always seem to
really understand how _good_ Linux is, and how much faster it can be. It's
not just "Windows XP" sucks. Quite often it's literally "Linux is just
damn fast".
Sadly, that causes problems when the main developers don't even see any
issues, just because the Linux kernel environment makes things look really
really cheap. Even when it isn't always cheap on other platforms.
Nico - have you looked at perhaps making the index-pack.c "mmap()" usage
do chunking? Or just mmap the whole damn thing once? Linux is fast, but
even Linux will be faster if you just mmap it once ;)
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 23:28 ` Linus Torvalds
@ 2006-12-19 0:13 ` Nicolas Pitre
2006-12-19 5:11 ` Theodore Tso
0 siblings, 1 reply; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-19 0:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Randal L. Schwartz, git
On Mon, 18 Dec 2006, Linus Torvalds wrote:
> I've heard of other operations being slow on OS X - and two orders of
> magnitude really isn't unthinkable. I don't think people always seem to
> really understand how _good_ Linux is, and how much faster it can be. It's
> not just "Windows XP" sucks. Quite often it's literally "Linux is just
> damn fast".
>
> Sadly, that causes problems when the main developers don't even see any
> issues, just because the Linux kernel environment makes things look really
> really cheap. Even when it isn't always cheap on other platforms.
>
> Nico - have you looked at perhaps making the index-pack.c "mmap()" usage
> do chunking? Or just mmap the whole damn thing once? Linux is fast, but
> even Linux will be faster if you just mmap it once ;)
Maybe. However the mmap() may occur on section of the pack file which
has just been written to in order to write even more, always to the same
file. On Linux this is fast because the mmap'd data is likely to still
be in the cache.
I guess this could be turned into a malloc()/read()/free() with no
trouble.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 0:13 ` Nicolas Pitre
@ 2006-12-19 5:11 ` Theodore Tso
2006-12-19 6:39 ` Shawn Pearce
2006-12-19 6:47 ` Linus Torvalds
0 siblings, 2 replies; 52+ messages in thread
From: Theodore Tso @ 2006-12-19 5:11 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Randal L. Schwartz, git
On Mon, Dec 18, 2006 at 07:13:40PM -0500, Nicolas Pitre wrote:
> Maybe. However the mmap() may occur on section of the pack file which
> has just been written to in order to write even more, always to the same
> file. On Linux this is fast because the mmap'd data is likely to still
> be in the cache.
>
> I guess this could be turned into a malloc()/read()/free() with no
> trouble.
Actually, depending on the size of the chunk, even on Linux
malloc/read/free can be faster than the mmap/munmap, because
mmap/munmap calls involve page table manipulations, and even on Linux
that is often slower or dead even with the memory copy involved with
using malloc/read. Even when reading huge chunks of Canon Raw File
data at a time, I found (experimentally) that it was no faster to use
mmap() compared to read(). And for small chunks of data, malloc/read
will definitely win out over mmap(), since the page table operations
and resulting page faults completely trump the cost of copying the
bytes from the page cache to the read() buffer.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 5:11 ` Theodore Tso
@ 2006-12-19 6:39 ` Shawn Pearce
2006-12-19 6:51 ` Linus Torvalds
2006-12-19 16:19 ` Theodore Tso
2006-12-19 6:47 ` Linus Torvalds
1 sibling, 2 replies; 52+ messages in thread
From: Shawn Pearce @ 2006-12-19 6:39 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Linus Torvalds, Randal L. Schwartz, git
Theodore Tso <tytso@mit.edu> wrote:
> On Mon, Dec 18, 2006 at 07:13:40PM -0500, Nicolas Pitre wrote:
> > Maybe. However the mmap() may occur on section of the pack file which
> > has just been written to in order to write even more, always to the same
> > file. On Linux this is fast because the mmap'd data is likely to still
> > be in the cache.
> >
> > I guess this could be turned into a malloc()/read()/free() with no
> > trouble.
>
> Actually, depending on the size of the chunk, even on Linux
> malloc/read/free can be faster than the mmap/munmap, because
> mmap/munmap calls involve page table manipulations, and even on Linux
> that is often slower or dead even with the memory copy involved with
> using malloc/read. Even when reading huge chunks of Canon Raw File
> data at a time, I found (experimentally) that it was no faster to use
> mmap() compared to read(). And for small chunks of data, malloc/read
> will definitely win out over mmap(), since the page table operations
> and resulting page faults completely trump the cost of copying the
> bytes from the page cache to the read() buffer.
This is why git-fast-import mmaps 128 MiB blocks from the file at
a time. The mmap region is usually much larger than the file itself;
the application appends to the file via write() then goes back
and rereads data when necessary via the already established mmap.
Its rare for the application to need to unmap/remap a different block
so there really isn't very much page table manipulation overhead.
Why isn't git-index-pack doing the same? Is there some hidden glitch
in some OS somewhere that has a problem with overmapping a file and
appending into it via write()? I've done that on Mac OS X, Linux,
BSDi, Solaris... never had a problem.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 5:11 ` Theodore Tso
2006-12-19 6:39 ` Shawn Pearce
@ 2006-12-19 6:47 ` Linus Torvalds
2006-12-19 8:32 ` Johannes Schindelin
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2006-12-19 6:47 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Theodore Tso wrote:
>
> Actually, depending on the size of the chunk, even on Linux
> malloc/read/free can be faster than the mmap/munmap
Yes.
In general, mmap/munmap is faster only if:
- you access the same data multiple times within one page (ie a single
page-fault will actually result in more than one access)
OR
- you can use it to avoid management overhead (ie you know your data is
going to accessed very sparsely, but you don't know the patterns, and
trying to keep track of it is painful as hell)
That said, under Linux, mmap is almost never really _slower_ either, which
is why this issue never made any real difference. The overhead of doing
page table manipulation is pretty much balanced out by the overhead of
doing a memcpy.
But that "mmap is fast" is _not_ true on many other operating systems,
which is why it might be worthwhile to try something like the appended on
OS X, which uses pread() instead of mmap().
This is _not_ very much tested. It seems to work. Caveat emptor. It would
be interesting to hear if many small "pread()" calls are faster than many
mmap/munmap calls on OS X. I bet they are. Under Linux, there should be
almost no difference.
Linus
---
diff --git a/index-pack.c b/index-pack.c
index 6d6c92b..094f8b2 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -8,6 +8,7 @@
#include "tree.h"
#include <sys/time.h>
#include <signal.h>
+#include <unistd.h>
static const char index_pack_usage[] =
"git-index-pack [-v] [-o <index-file>] [{ ---keep | --keep=<msg> }] { <pack-file> | --stdin [--fix-thin] [<pack-file>] }";
@@ -279,27 +280,25 @@ static void *get_data_from_pack(struct object_entry *obj)
{
unsigned long from = obj[0].offset + obj[0].hdr_size;
unsigned long len = obj[1].offset - from;
- unsigned pg_offset = from % getpagesize();
- unsigned char *map, *data;
+ unsigned char *src, *data;
z_stream stream;
int st;
- map = mmap(NULL, len + pg_offset, PROT_READ, MAP_PRIVATE,
- mmap_fd, from - pg_offset);
- if (map == MAP_FAILED)
- die("cannot mmap pack file: %s", strerror(errno));
+ src = xmalloc(len);
+ if (pread(mmap_fd, src, len, from) != len)
+ die("cannot pread pack file: %s", strerror(errno));
data = xmalloc(obj->size);
memset(&stream, 0, sizeof(stream));
stream.next_out = data;
stream.avail_out = obj->size;
- stream.next_in = map + pg_offset;
+ stream.next_in = src;
stream.avail_in = len;
inflateInit(&stream);
while ((st = inflate(&stream, Z_FINISH)) == Z_OK);
inflateEnd(&stream);
if (st != Z_STREAM_END || stream.total_out != obj->size)
die("serious inflate inconsistency");
- munmap(map, len + pg_offset);
+ free(src);
return data;
}
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 6:39 ` Shawn Pearce
@ 2006-12-19 6:51 ` Linus Torvalds
2006-12-19 7:26 ` Shawn Pearce
2006-12-19 8:32 ` Shawn Pearce
2006-12-19 16:19 ` Theodore Tso
1 sibling, 2 replies; 52+ messages in thread
From: Linus Torvalds @ 2006-12-19 6:51 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Shawn Pearce wrote:
>
> Why isn't git-index-pack doing the same? Is there some hidden glitch
> in some OS somewhere that has a problem with overmapping a file and
> appending into it via write()? I've done that on Mac OS X, Linux,
> BSDi, Solaris... never had a problem.
It works on modern systems, but at least old HPUX versions had
non-coherent mmap() and write(), and POSIX does not guarantee it. And if
you ever want to port to Windows, I don't think you should do it.
Anyway, try the pread() version first, see if that fixes the OS X problem.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 6:51 ` Linus Torvalds
@ 2006-12-19 7:26 ` Shawn Pearce
2006-12-19 7:52 ` Marco Roeland
2006-12-19 8:32 ` Shawn Pearce
1 sibling, 1 reply; 52+ messages in thread
From: Shawn Pearce @ 2006-12-19 7:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Tue, 19 Dec 2006, Shawn Pearce wrote:
> >
> > Why isn't git-index-pack doing the same? Is there some hidden glitch
> > in some OS somewhere that has a problem with overmapping a file and
> > appending into it via write()? I've done that on Mac OS X, Linux,
> > BSDi, Solaris... never had a problem.
>
> It works on modern systems, but at least old HPUX versions had
> non-coherent mmap() and write(), and POSIX does not guarantee it. And if
> you ever want to port to Windows, I don't think you should do it.
>
> Anyway, try the pread() version first, see if that fixes the OS X problem.
I'll give your pread() version a shot.
But right now I'm in the middle of cloning your linux-2.6 git
repository.
It is done downloading the pack and my system is pegged at 100%
CPU while resolving deltas.
ActivityMonitor is showing that I'm spending 94% CPU in the kernel,
which is just insane. Clearly Mac OS X's kernel cannot gracefully
handle what git-index-pack is currently doing.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 7:26 ` Shawn Pearce
@ 2006-12-19 7:52 ` Marco Roeland
2006-12-19 7:58 ` Shawn Pearce
0 siblings, 1 reply; 52+ messages in thread
From: Marco Roeland @ 2006-12-19 7:52 UTC (permalink / raw)
To: Shawn Pearce
Cc: Linus Torvalds, Theodore Tso, Nicolas Pitre, Randal L. Schwartz,
git
On Tuesday December 19th 2006 at 02:26 Shawn Pearce wrote:
> [git-index-pack limping along on Mac OS X]
>
> ActivityMonitor is showing that I'm spending 94% CPU in the kernel,
> which is just insane. Clearly Mac OS X's kernel cannot gracefully
> handle what git-index-pack is currently doing.
A bit off topic but it might help people diagnose and test different
strategies.
The equivalent of "oprofile" on Mac OS X is a tool called "shark" from
Apple itself. It's very nice actually.
It's in the CHUD (Computer Hardware Understanding
Developer Tools) package. More information at
http://developer.apple.com/tools/performance and it is free as in beer.
;-)
Also the OpenSSL version on Mac OS X is rather old and compiled as
32-bit application. OpenSSL is one of the few userspace packages that
_really_ benefits tremendously from being compiled as 64-bit. It might
explain a bit of the enormous performance difference in this case. But
only profiling (perhaps with the help of "shark") will tell of course...
The kernel git repository isn't very handy on the Mac HFS+ filesystem,
due to it being case-insensitive, but I suppose it won't influence
git-index-pack.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 7:52 ` Marco Roeland
@ 2006-12-19 7:58 ` Shawn Pearce
0 siblings, 0 replies; 52+ messages in thread
From: Shawn Pearce @ 2006-12-19 7:58 UTC (permalink / raw)
To: Marco Roeland
Cc: Linus Torvalds, Theodore Tso, Nicolas Pitre, Randal L. Schwartz,
git
Marco Roeland <marco.roeland@xs4all.nl> wrote:
> The kernel git repository isn't very handy on the Mac HFS+ filesystem,
> due to it being case-insensitive, but I suppose it won't influence
> git-index-pack.
Yea, I just play with it in packed format on this system. :-)
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 6:47 ` Linus Torvalds
@ 2006-12-19 8:32 ` Johannes Schindelin
2006-12-19 9:10 ` Junio C Hamano
0 siblings, 1 reply; 52+ messages in thread
From: Johannes Schindelin @ 2006-12-19 8:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
Hi,
in a very unscientific test, without your patch local cloning of the
LilyPond repo takes 1m33s (user), and with your patch (pread() instead of
mmap()) it takes 1m13s (user). The real times are somewhat bogus, but
still in favour of pread(), but only by 8 seconds instead of 20.
This is on Linux 2.4.32.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 6:51 ` Linus Torvalds
2006-12-19 7:26 ` Shawn Pearce
@ 2006-12-19 8:32 ` Shawn Pearce
2006-12-19 8:40 ` Marco Roeland
1 sibling, 1 reply; 52+ messages in thread
From: Shawn Pearce @ 2006-12-19 8:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds <torvalds@osdl.org> wrote:
> On Tue, 19 Dec 2006, Shawn Pearce wrote:
> >
> > Why isn't git-index-pack doing the same? Is there some hidden glitch
> > in some OS somewhere that has a problem with overmapping a file and
> > appending into it via write()? I've done that on Mac OS X, Linux,
> > BSDi, Solaris... never had a problem.
>
> It works on modern systems, but at least old HPUX versions had
> non-coherent mmap() and write(), and POSIX does not guarantee it. And if
> you ever want to port to Windows, I don't think you should do it.
>
> Anyway, try the pread() version first, see if that fixes the OS X problem.
It does. Without pread() (aka stock 'next') it takes me over an
hour to index a pack of linux-2.6. With pread() its 1m6s to run
index-pack on the same pack file. The indexes are (of course)
identically produced.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 8:32 ` Shawn Pearce
@ 2006-12-19 8:40 ` Marco Roeland
2006-12-19 8:49 ` Shawn Pearce
0 siblings, 1 reply; 52+ messages in thread
From: Marco Roeland @ 2006-12-19 8:40 UTC (permalink / raw)
To: Shawn Pearce
Cc: Linus Torvalds, Theodore Tso, Nicolas Pitre, Randal L. Schwartz,
git
On Tuesday December 19th 2006 at 03:32 Shawn Pearce wrote:
> > Anyway, try the pread() version first, see if that fixes the OS X problem.
>
> It does. Without pread() (aka stock 'next') it takes me over an
> hour to index a pack of linux-2.6. With pread() its 1m6s to run
> index-pack on the same pack file. The indexes are (of course)
> identically produced.
I see the same here. From an (estimated) time of 37 minutes down to 52
seconds with the pread() patch. Running the profiler (shark) on the old
version showed that 85% of the time was spent in the Mac OS X mmap()
system call.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 8:40 ` Marco Roeland
@ 2006-12-19 8:49 ` Shawn Pearce
2006-12-19 9:13 ` Marco Roeland
0 siblings, 1 reply; 52+ messages in thread
From: Shawn Pearce @ 2006-12-19 8:49 UTC (permalink / raw)
To: Marco Roeland
Cc: Linus Torvalds, Theodore Tso, Nicolas Pitre, Johannes Schindelin,
Randal L. Schwartz, git
Marco Roeland <marco.roeland@xs4all.nl> wrote:
> On Tuesday December 19th 2006 at 03:32 Shawn Pearce wrote:
>
> > > Anyway, try the pread() version first, see if that fixes the OS X problem.
> >
> > It does. Without pread() (aka stock 'next') it takes me over an
> > hour to index a pack of linux-2.6. With pread() its 1m6s to run
> > index-pack on the same pack file. The indexes are (of course)
> > identically produced.
>
> I see the same here. From an (estimated) time of 37 minutes down to 52
> seconds with the pread() patch. Running the profiler (shark) on the old
> version showed that 85% of the time was spent in the Mac OS X mmap()
> system call.
More testing on Linux is probably needed, but if using pread()
on Linux is breakeven or slightly faster (as suggested by Johannes'
LilyPond test) this 60x performance improvement on initial clone
of largish projects on Mac OS X would be nice to have.
Not that I personally frequently clone large projects on Mac OS X.
But new users to Git might. :-)
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 8:32 ` Johannes Schindelin
@ 2006-12-19 9:10 ` Junio C Hamano
2006-12-19 9:47 ` Jeff King
` (2 more replies)
0 siblings, 3 replies; 52+ messages in thread
From: Junio C Hamano @ 2006-12-19 9:10 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Hi,
>
> in a very unscientific test, without your patch local cloning of the
> LilyPond repo takes 1m33s (user), and with your patch (pread() instead of
> mmap()) it takes 1m13s (user). The real times are somewhat bogus, but
> still in favour of pread(), but only by 8 seconds instead of 20.
>
> This is on Linux 2.4.32.
Interesting. Anybody have numbers from 2.6?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 8:49 ` Shawn Pearce
@ 2006-12-19 9:13 ` Marco Roeland
2006-12-19 20:28 ` Alex Riesen
0 siblings, 1 reply; 52+ messages in thread
From: Marco Roeland @ 2006-12-19 9:13 UTC (permalink / raw)
To: Shawn Pearce
Cc: Marco Roeland, Linus Torvalds, Theodore Tso, Nicolas Pitre,
Johannes Schindelin, Randal L. Schwartz, git
On Tuesday December 19th 2006 at 03:49 Shawn Pearce wrote:
> [pread patch to speed up git-index-pack on Mac OS X]
>
> More testing on Linux is probably needed, but if using pread()
> on Linux is breakeven or slightly faster (as suggested by Johannes'
> LilyPond test) this 60x performance improvement on initial clone
> of largish projects on Mac OS X would be nice to have.
I see a decrease in total time (so an improvement in performance)
going from 37.3 to 35.2 seconds with the pread patch on Linux x86-64.
Note that both my testing on Linux and Mac OS X was done on dual core
processors (Athlon 3800+ XP on Linux and Intel Core 2 Duo 2.16GHz on an
iMac). Git is only single threaded and thus uses only core, but the
system can use the other core.
The minor page faults also decreased from 734866 to 370690. Nice.
> Not that I personally frequently clone large projects on Mac OS X.
> But new users to Git might. :-)
And perhaps the Cygwin version might benefit too.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 9:10 ` Junio C Hamano
@ 2006-12-19 9:47 ` Jeff King
2006-12-19 10:24 ` Andy Whitcroft
2006-12-19 15:53 ` [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux Nicolas Pitre
2 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2006-12-19 9:47 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Schindelin, Theodore Tso, Nicolas Pitre,
Randal L. Schwartz, git
On Tue, Dec 19, 2006 at 01:10:59AM -0800, Junio C Hamano wrote:
> > in a very unscientific test, without your patch local cloning of the
> > LilyPond repo takes 1m33s (user), and with your patch (pread() instead of
> > mmap()) it takes 1m13s (user). The real times are somewhat bogus, but
> > still in favour of pread(), but only by 8 seconds instead of 20.
> >
> > This is on Linux 2.4.32.
>
> Interesting. Anybody have numbers from 2.6?
Similar results to those posted by Marco Roeland (numbers below are
index-pack of linux-2.6 on 2.6.18 i386 smp):
mmap
46.78user 4.18system 0:51.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+725627minor)pagefaults 0swaps
pread
45.70user 3.12system 0:48.96elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+363794minor)pagefaults 0swaps
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 9:10 ` Junio C Hamano
2006-12-19 9:47 ` Jeff King
@ 2006-12-19 10:24 ` Andy Whitcroft
2006-12-19 15:53 ` [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux Nicolas Pitre
2 siblings, 0 replies; 52+ messages in thread
From: Andy Whitcroft @ 2006-12-19 10:24 UTC (permalink / raw)
To: Junio C Hamano
Cc: Johannes Schindelin, Theodore Tso, Nicolas Pitre,
Randal L. Schwartz, git
Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Hi,
>>
>> in a very unscientific test, without your patch local cloning of the
>> LilyPond repo takes 1m33s (user), and with your patch (pread() instead of
>> mmap()) it takes 1m13s (user). The real times are somewhat bogus, but
>> still in favour of pread(), but only by 8 seconds instead of 20.
>>
>> This is on Linux 2.4.32.
>
> Interesting. Anybody have numbers from 2.6?
On my debian etch system:
Linux version 2.6.17-2-686 (Debian 2.6.17-9) (waldi@debian.org) (gcc
version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)) #1 SMP Wed Sep 13
16:34:10 UTC 2006
I did one run first (not included) to get things nice and warm, then
three runs of each. Overall the same as reported elsewhere marginally
better with pread(). I guess you could say a 20-30% improvement in
system time which isn't to be sniffed at.
-apw
mmap():
real 1m5.187s
user 1m0.844s
sys 0m2.900s
real 1m6.748s
user 1m0.868s
sys 0m3.064s
real 1m5.604s
user 1m0.760s
sys 0m3.124s
pread():
real 1m4.676s
user 1m0.168s
sys 0m2.340s
real 1m3.563s
user 0m59.796s
sys 0m2.248s
real 1m4.066s
user 1m0.156s
sys 0m2.304s
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 9:10 ` Junio C Hamano
2006-12-19 9:47 ` Jeff King
2006-12-19 10:24 ` Andy Whitcroft
@ 2006-12-19 15:53 ` Nicolas Pitre
2006-12-19 19:00 ` Junio C Hamano
2 siblings, 1 reply; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-19 15:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Randal L. Schwartz, git
It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that
indexing the Linux repository ~150MB pack takes about an hour on OS x
while it's a minute on Linux. It seems that the OS X mmap()
implementation is more than 2 orders of magnitude slower than the Linux
one.
Linus proposed a patch replacing mmap() with pread() bringing index-pack
performance on OS X in line with the Linux one. The performances on
Linux also improved by a small margin.
Signed-off-by: Nicolas Pitre <nico@cam.org>
---
OK looks like this has been sorted out while I was away. Good!
On Tue, 19 Dec 2006, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi,
> >
> > in a very unscientific test, without your patch local cloning of the
> > LilyPond repo takes 1m33s (user), and with your patch (pread() instead of
> > mmap()) it takes 1m13s (user). The real times are somewhat bogus, but
> > still in favour of pread(), but only by 8 seconds instead of 20.
> >
> > This is on Linux 2.4.32.
>
> Interesting. Anybody have numbers from 2.6?
My 37 seconds of yesterday dropped to 32.
This is Linus's patch plus a few cosmetic changes.
diff --git a/index-pack.c b/index-pack.c
index 6d6c92b..e08a687 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -1,3 +1,8 @@
+#define _XOPEN_SOURCE 500
+#include <unistd.h>
+#include <sys/time.h>
+#include <signal.h>
+
#include "cache.h"
#include "delta.h"
#include "pack.h"
@@ -6,8 +11,6 @@
#include "commit.h"
#include "tag.h"
#include "tree.h"
-#include <sys/time.h>
-#include <signal.h>
static const char index_pack_usage[] =
"git-index-pack [-v] [-o <index-file>] [{ ---keep | --keep=<msg> }] { <pack-file> | --stdin [--fix-thin] [<pack-file>] }";
@@ -87,7 +90,7 @@ static unsigned display_progress(unsigned n, unsigned total, unsigned last_pc)
static unsigned char input_buffer[4096];
static unsigned long input_offset, input_len, consumed_bytes;
static SHA_CTX input_ctx;
-static int input_fd, output_fd, mmap_fd;
+static int input_fd, output_fd, pack_fd;
/* Discard current buffer used content. */
static void flush(void)
@@ -148,14 +151,14 @@ static const char *open_pack_file(const char *pack_name)
output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600);
if (output_fd < 0)
die("unable to create %s: %s\n", pack_name, strerror(errno));
- mmap_fd = output_fd;
+ pack_fd = output_fd;
} else {
input_fd = open(pack_name, O_RDONLY);
if (input_fd < 0)
die("cannot open packfile '%s': %s",
pack_name, strerror(errno));
output_fd = -1;
- mmap_fd = input_fd;
+ pack_fd = input_fd;
}
SHA1_Init(&input_ctx);
return pack_name;
@@ -279,27 +282,25 @@ static void *get_data_from_pack(struct object_entry *obj)
{
unsigned long from = obj[0].offset + obj[0].hdr_size;
unsigned long len = obj[1].offset - from;
- unsigned pg_offset = from % getpagesize();
- unsigned char *map, *data;
+ unsigned char *src, *data;
z_stream stream;
int st;
- map = mmap(NULL, len + pg_offset, PROT_READ, MAP_PRIVATE,
- mmap_fd, from - pg_offset);
- if (map == MAP_FAILED)
- die("cannot mmap pack file: %s", strerror(errno));
+ src = xmalloc(len);
+ if (pread(pack_fd, src, len, from) != len)
+ die("cannot pread pack file: %s", strerror(errno));
data = xmalloc(obj->size);
memset(&stream, 0, sizeof(stream));
stream.next_out = data;
stream.avail_out = obj->size;
- stream.next_in = map + pg_offset;
+ stream.next_in = src;
stream.avail_in = len;
inflateInit(&stream);
while ((st = inflate(&stream, Z_FINISH)) == Z_OK);
inflateEnd(&stream);
if (st != Z_STREAM_END || stream.total_out != obj->size)
die("serious inflate inconsistency");
- munmap(map, len + pg_offset);
+ free(src);
return data;
}
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 6:39 ` Shawn Pearce
2006-12-19 6:51 ` Linus Torvalds
@ 2006-12-19 16:19 ` Theodore Tso
2006-12-19 16:57 ` Linus Torvalds
2006-12-20 1:58 ` Shawn Pearce
1 sibling, 2 replies; 52+ messages in thread
From: Theodore Tso @ 2006-12-19 16:19 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Nicolas Pitre, Linus Torvalds, Randal L. Schwartz, git
On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> This is why git-fast-import mmaps 128 MiB blocks from the file at
> a time. The mmap region is usually much larger than the file itself;
> the application appends to the file via write() then goes back
> and rereads data when necessary via the already established mmap.
> Its rare for the application to need to unmap/remap a different block
> so there really isn't very much page table manipulation overhead.
Yes, but unless you are using the (non-portable, Linux specific)
MAP_POPULATE flag to mmap, each time you touch a new page, you end up
taking a page fault; and so malloc/read/free might *still* be faster.
I'd encourage you to make the change and benchmark it; the results may
be surprising. I played with this with dcraw, the Canon Raw File
converter a while back (before MAP_POPULATE was added), where I found
that with a linear access pattern, if you are reading the entire file,
it's stil marginally faster to use read() over mmap(), because with
dcraw taking a page fault every 4k of raw file, the system time was
significantly higher.
So the main reason to use mamp, as Linus puts it, is if the management
overhead of needing to read lots of small bits of the file makes the
use of malloc/read to be a pain in the *ss, then go for it. But don't
assume that you'll get better performance; in my experience, even on
the hyper-performant Linus kernel, mmap() in general only barely
breaks even with read(). On other systems, things are probably going
to be even worse.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 16:19 ` Theodore Tso
@ 2006-12-19 16:57 ` Linus Torvalds
2006-12-20 1:54 ` Shawn Pearce
2006-12-20 1:58 ` Shawn Pearce
1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2006-12-19 16:57 UTC (permalink / raw)
To: Theodore Tso; +Cc: Shawn Pearce, Nicolas Pitre, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Theodore Tso wrote:
>
> So the main reason to use mamp, as Linus puts it, is if the management
> overhead of needing to read lots of small bits of the file makes the
> use of malloc/read to be a pain in the *ss, then go for it.
An example of this in git is the regular pack-file accesses. We're MUCH
better off just mmap'ing the whole pack-file (or at least big chunks of
it) and not having to maintain difficult structures of "this is where I
read that part of the file into memory", or read _big_ chunks when
quite often we just use a few kB of it.
So mmap for pack-files does make sense, but probably only when you can
mmap big chunks, and are going to access much smaller (random) parts of
it.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 15:53 ` [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux Nicolas Pitre
@ 2006-12-19 19:00 ` Junio C Hamano
2006-12-19 19:14 ` Nicolas Pitre
0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2006-12-19 19:00 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Randal L. Schwartz, git
Nicolas Pitre <nico@cam.org> writes:
> OK looks like this has been sorted out while I was away. Good!
>
> This is Linus's patch plus a few cosmetic changes.
Not a complaint but rather a request for free education ;-).
> diff --git a/index-pack.c b/index-pack.c
> index 6d6c92b..e08a687 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -1,3 +1,8 @@
> +#define _XOPEN_SOURCE 500
> +#include <unistd.h>
> +#include <sys/time.h>
> +#include <signal.h>
> +
> #include "cache.h"
> #include "delta.h"
> #include "pack.h"
> @@ -6,8 +11,6 @@
> #include "commit.h"
> #include "tag.h"
> #include "tree.h"
> -#include <sys/time.h>
> -#include <signal.h>
Most of the rest of the sources seem to do our includes first
and source-file specific system includes at the end. What's the
rationale for this change?
Do we need _XOPEN_SOURCE=500 because pread() is XSI?
Also nobody other than convert-objects.c has _XOPEN_SOURCE level
specified. If _XOPEN_SOURCE matters I wonder if we should do so
in some central place to make it consistent across source files?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:00 ` Junio C Hamano
@ 2006-12-19 19:14 ` Nicolas Pitre
2006-12-19 19:55 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Nicolas Pitre @ 2006-12-19 19:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Junio C Hamano wrote:
> > diff --git a/index-pack.c b/index-pack.c
> > index 6d6c92b..e08a687 100644
> > --- a/index-pack.c
> > +++ b/index-pack.c
> > @@ -1,3 +1,8 @@
> > +#define _XOPEN_SOURCE 500
> > +#include <unistd.h>
> > +#include <sys/time.h>
> > +#include <signal.h>
> > +
> > #include "cache.h"
> > #include "delta.h"
> > #include "pack.h"
> > @@ -6,8 +11,6 @@
> > #include "commit.h"
> > #include "tag.h"
> > #include "tree.h"
> > -#include <sys/time.h>
> > -#include <signal.h>
>
> Most of the rest of the sources seem to do our includes first
> and source-file specific system includes at the end. What's the
> rationale for this change?
Because _XOPEN_SOURCE must be defined before including unistd.h
otherwise pread is not declared and a warning is issued.
> Do we need _XOPEN_SOURCE=500 because pread() is XSI?
The pread man page says Unix98.
> Also nobody other than convert-objects.c has _XOPEN_SOURCE level
> specified. If _XOPEN_SOURCE matters I wonder if we should do so
> in some central place to make it consistent across source files?
Your call I guess.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:14 ` Nicolas Pitre
@ 2006-12-19 19:55 ` Linus Torvalds
2006-12-19 19:57 ` Randal L. Schwartz
` (2 more replies)
0 siblings, 3 replies; 52+ messages in thread
From: Linus Torvalds @ 2006-12-19 19:55 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Nicolas Pitre wrote:
>
> Because _XOPEN_SOURCE must be defined before including unistd.h
> otherwise pread is not declared and a warning is issued.
May I actually suggest we handle _all_ of these issues in one central
place, namely "git-compat-util.h"
It's nice to have just one single file that tries to hide the details of
all the differences between systems.
Sure, that file ends up having to include a lot of standard header files
that some of the .c files don't actually _need_, but git compiles
reasonably quickly, so I don't think we need to try to optimize compile
speed much.
It's the C++ people who tend to have sucky compile times.
So how about something like the appended? And then just have the rule that
we try to include "cache.h" early, because that brings in ALL the really
basic system header files?
Linus
---
diff --git a/convert-objects.c b/convert-objects.c
index 8812583..a630132 100644
--- a/convert-objects.c
+++ b/convert-objects.c
@@ -1,7 +1,3 @@
-#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
-#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
-#define _GNU_SOURCE
-#include <time.h>
#include "cache.h"
#include "blob.h"
#include "commit.h"
diff --git a/git-compat-util.h b/git-compat-util.h
index 0272d04..e619e29 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -11,6 +11,10 @@
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
+#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
+#define _GNU_SOURCE
+
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
@@ -25,6 +29,10 @@
#include <netinet/in.h>
#include <sys/types.h>
#include <dirent.h>
+#include <sys/time.h>
+#include <time.h>
+#include <signal.h>
+#include <sys/wait.h>
/* On most systems <limits.h> would have given us this, but
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:55 ` Linus Torvalds
@ 2006-12-19 19:57 ` Randal L. Schwartz
2006-12-19 20:03 ` Randal L. Schwartz
2006-12-19 20:02 ` Jeff Garzik
2006-12-20 22:13 ` Nikolai Weibull
2 siblings, 1 reply; 52+ messages in thread
From: Randal L. Schwartz @ 2006-12-19 19:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, git
>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:
Linus> May I actually suggest we handle _all_ of these issues in one central
Linus> place, namely "git-compat-util.h"
I can't remember now, but a couple of patches I had to submit were because
sys/types.h was included either too early or too late on OSX, so let's be sure
to get that right. Surely, my patch can be observed somewhere, perhaps in a
git repository. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:55 ` Linus Torvalds
2006-12-19 19:57 ` Randal L. Schwartz
@ 2006-12-19 20:02 ` Jeff Garzik
2006-12-20 0:30 ` Junio C Hamano
2006-12-20 22:13 ` Nikolai Weibull
2 siblings, 1 reply; 52+ messages in thread
From: Jeff Garzik @ 2006-12-19 20:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, Randal L. Schwartz, git
Linus Torvalds wrote:
> diff --git a/convert-objects.c b/convert-objects.c
> index 8812583..a630132 100644
> --- a/convert-objects.c
> +++ b/convert-objects.c
> @@ -1,7 +1,3 @@
> -#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
> -#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
> -#define _GNU_SOURCE
> -#include <time.h>
> #include "cache.h"
> #include "blob.h"
> #include "commit.h"
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 0272d04..e619e29 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -11,6 +11,10 @@
>
> #define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
>
> +#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
> +#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
> +#define _GNU_SOURCE
> +
> #include <unistd.h>
> #include <stdio.h>
> #include <sys/stat.h>
> @@ -25,6 +29,10 @@
> #include <netinet/in.h>
> #include <sys/types.h>
> #include <dirent.h>
> +#include <sys/time.h>
> +#include <time.h>
> +#include <signal.h>
> +#include <sys/wait.h>
If you are going to do this, you have to audit -every- file, to make
sure git-compat-util.h is -always- the first header.
For example, builtin-mailinfo.c includes git-compat-util.h after ctype.h
and iconv.h, which renders your #define _XOPEN_SOURCE 600 useless.
/usr/include/features.h has already been included at that point.
Jeff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:57 ` Randal L. Schwartz
@ 2006-12-19 20:03 ` Randal L. Schwartz
0 siblings, 0 replies; 52+ messages in thread
From: Randal L. Schwartz @ 2006-12-19 20:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, git
>>>>> "Randal" == Randal L Schwartz <merlyn@stonehenge.com> writes:
Randal> I can't remember now, but a couple of patches I had to submit were
Randal> because sys/types.h was included either too early or too late on OSX,
Randal> so let's be sure to get that right. Surely, my patch can be observed
Randal> somewhere, perhaps in a git repository. :)
Here's one that might be relevant for OSX:
979e32fa1483a32faa4ec331e29b357e5eb5ef25
And this is an ordering issue for OpenBSD:
ed1795fcc5f2aa3f105630429bcbed49c50053fa
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 9:13 ` Marco Roeland
@ 2006-12-19 20:28 ` Alex Riesen
2006-12-21 20:35 ` Juergen Ruehle
0 siblings, 1 reply; 52+ messages in thread
From: Alex Riesen @ 2006-12-19 20:28 UTC (permalink / raw)
To: Marco Roeland
Cc: Shawn Pearce, Linus Torvalds, Theodore Tso, Nicolas Pitre,
Johannes Schindelin, Randal L. Schwartz, git
Marco Roeland, Tue, Dec 19, 2006 10:13:19 +0100:
> > Not that I personally frequently clone large projects on Mac OS X.
> > But new users to Git might. :-)
>
> And perhaps the Cygwin version might benefit too.
Does not work there at all. Even errno is not set (0).
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 20:02 ` Jeff Garzik
@ 2006-12-20 0:30 ` Junio C Hamano
2006-12-20 0:40 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2006-12-20 0:30 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Nicolas Pitre, Linus Torvalds, Randal L. Schwartz, git
Jeff Garzik <jeff@garzik.org> writes:
> If you are going to do this, you have to audit -every- file, to make
> sure git-compat-util.h is -always- the first header.
Will do.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 0:30 ` Junio C Hamano
@ 2006-12-20 0:40 ` Linus Torvalds
2006-12-20 0:50 ` Jeff Garzik
2006-12-20 1:12 ` Junio C Hamano
0 siblings, 2 replies; 52+ messages in thread
From: Linus Torvalds @ 2006-12-20 0:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff Garzik, Nicolas Pitre, Randal L. Schwartz, git
On Tue, 19 Dec 2006, Junio C Hamano wrote:
> Jeff Garzik <jeff@garzik.org> writes:
>
> > If you are going to do this, you have to audit -every- file, to make
> > sure git-compat-util.h is -always- the first header.
>
> Will do.
Well, since any cases where it isn't (and where we'd care) will show up
as just a compiler warning, I doubt we really even need to. We can fix
things up as they get reported too..
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 0:40 ` Linus Torvalds
@ 2006-12-20 0:50 ` Jeff Garzik
2006-12-20 1:12 ` Junio C Hamano
1 sibling, 0 replies; 52+ messages in thread
From: Jeff Garzik @ 2006-12-20 0:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds wrote:
>
> On Tue, 19 Dec 2006, Junio C Hamano wrote:
>
>> Jeff Garzik <jeff@garzik.org> writes:
>>
>>> If you are going to do this, you have to audit -every- file, to make
>>> sure git-compat-util.h is -always- the first header.
>> Will do.
>
> Well, since any cases where it isn't (and where we'd care) will show up
> as just a compiler warning, I doubt we really even need to. We can fix
> things up as they get reported too..
I wouldn't assume that is always the case. Mostly true, I would
suppose. But some of those defines turn on and off 64-bit file
size/offset gadgets for example.
Jeff
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 0:40 ` Linus Torvalds
2006-12-20 0:50 ` Jeff Garzik
@ 2006-12-20 1:12 ` Junio C Hamano
2006-12-20 20:17 ` Junio C Hamano
1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2006-12-20 1:12 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds <torvalds@osdl.org> writes:
> On Tue, 19 Dec 2006, Junio C Hamano wrote:
>
>> Jeff Garzik <jeff@garzik.org> writes:
>>
>> > If you are going to do this, you have to audit -every- file, to make
>> > sure git-compat-util.h is -always- the first header.
>>
>> Will do.
>
> Well, since any cases where it isn't (and where we'd care) will show up
> as just a compiler warning, I doubt we really even need to. We can fix
> things up as they get reported too..
True, but I've done it already, so...
Test compile especially on non Linux boxes are appreciated (I'll
do one on an OpenBSD bochs tomorrow myself anyway, though).
-- >8 --
[PATCH] simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
archive-tar.c | 1 -
archive-zip.c | 1 -
blob.c | 1 -
builtin-add.c | 2 --
builtin-apply.c | 1 -
builtin-archive.c | 1 -
builtin-blame.c | 4 ----
builtin-branch.c | 2 +-
builtin-for-each-ref.c | 1 -
builtin-grep.c | 3 ---
builtin-log.c | 2 --
builtin-ls-files.c | 2 --
builtin-mailinfo.c | 9 ---------
builtin-mailsplit.c | 7 -------
builtin-mv.c | 2 --
builtin-name-rev.c | 1 -
builtin-pack-objects.c | 2 --
builtin-repo-config.c | 1 -
builtin-runstatus.c | 2 +-
builtin-shortlog.c | 1 -
builtin-show-branch.c | 2 --
builtin-stripspace.c | 3 ---
builtin-tar-tree.c | 1 -
builtin-unpack-objects.c | 2 --
builtin-upload-archive.c | 3 ---
color.c | 5 +----
compat/mmap.c | 4 ----
compat/setenv.c | 3 +--
compat/strlcpy.c | 2 +-
compat/unsetenv.c | 3 +--
config.c | 1 -
connect.c | 6 ------
convert-objects.c | 4 ----
daemon.c | 16 +++-------------
date.c | 3 ---
diff-delta.c | 5 +----
diff.c | 3 ---
diffcore-order.c | 1 -
diffcore-pickaxe.c | 2 --
dir.c | 3 ---
entry.c | 2 --
fetch-pack.c | 1 -
fetch.c | 3 +--
fsck-objects.c | 3 ---
git-compat-util.h | 23 +++++++++++++++++++++++
git.c | 14 +-------------
grep.c | 1 -
help.c | 3 +--
ident.c | 3 ---
imap-send.c | 7 -------
interpolate.c | 2 --
lockfile.c | 1 -
merge-base.c | 1 -
merge-index.c | 4 ----
merge-recursive.c | 7 -------
path-list.c | 1 -
path.c | 1 -
receive-pack.c | 1 -
refs.c | 4 +---
revision.c | 1 -
rsh.c | 6 +-----
run-command.c | 1 -
ssh-upload.c | 2 --
strbuf.c | 4 +---
test-date.c | 3 ---
test-delta.c | 8 +-------
tree.c | 1 -
unpack-trees.c | 2 --
upload-pack.c | 3 ---
var.c | 3 ---
wt-status.c | 2 +-
71 files changed, 41 insertions(+), 195 deletions(-)
diff --git a/archive-tar.c b/archive-tar.c
index ff0f6e2..af47fdc 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -1,7 +1,6 @@
/*
* Copyright (c) 2005, 2006 Rene Scharfe
*/
-#include <time.h>
#include "cache.h"
#include "commit.h"
#include "strbuf.h"
diff --git a/archive-zip.c b/archive-zip.c
index 36e922a..f31b8ed 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -1,7 +1,6 @@
/*
* Copyright (c) 2006 Rene Scharfe
*/
-#include <time.h>
#include "cache.h"
#include "commit.h"
#include "blob.h"
diff --git a/blob.c b/blob.c
index d1af2e6..9776bea 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,5 @@
#include "cache.h"
#include "blob.h"
-#include <stdlib.h>
const char *blob_type = "blob";
diff --git a/builtin-add.c b/builtin-add.c
index b3f9206..c8a114f 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -3,8 +3,6 @@
*
* Copyright (C) 2006 Linus Torvalds
*/
-#include <fnmatch.h>
-
#include "cache.h"
#include "builtin.h"
#include "dir.h"
diff --git a/builtin-apply.c b/builtin-apply.c
index 436d9e1..1c35837 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -6,7 +6,6 @@
* This applies patches on top of some (arbitrary) version of the SCM.
*
*/
-#include <fnmatch.h>
#include "cache.h"
#include "cache-tree.h"
#include "quote.h"
diff --git a/builtin-archive.c b/builtin-archive.c
index a8a1f07..391cf43 100644
--- a/builtin-archive.c
+++ b/builtin-archive.c
@@ -2,7 +2,6 @@
* Copyright (c) 2006 Franck Bui-Huu
* Copyright (c) 2006 Rene Scharfe
*/
-#include <time.h>
#include "cache.h"
#include "builtin.h"
#include "archive.h"
diff --git a/builtin-blame.c b/builtin-blame.c
index a250724..9bf6ec9 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -15,10 +15,6 @@
#include "revision.h"
#include "xdiff-interface.h"
-#include <time.h>
-#include <sys/time.h>
-#include <regex.h>
-
static char blame_usage[] =
"git-blame [-c] [-l] [-t] [-f] [-n] [-p] [-L n,m] [-S <revs-file>] [-M] [-C] [-C] [commit] [--] file\n"
" -c, --compatibility Use the same output mode as git-annotate (Default: off)\n"
diff --git a/builtin-branch.c b/builtin-branch.c
index 560309c..515330c 100644
--- a/builtin-branch.c
+++ b/builtin-branch.c
@@ -5,8 +5,8 @@
* Based on git-branch.sh by Junio C Hamano.
*/
-#include "color.h"
#include "cache.h"
+#include "color.h"
#include "refs.h"
#include "commit.h"
#include "builtin.h"
diff --git a/builtin-for-each-ref.c b/builtin-for-each-ref.c
index 227aa3c..af72a12 100644
--- a/builtin-for-each-ref.c
+++ b/builtin-for-each-ref.c
@@ -6,7 +6,6 @@
#include "tree.h"
#include "blob.h"
#include "quote.h"
-#include <fnmatch.h>
/* Quoting styles */
#define QUOTE_NONE 0
diff --git a/builtin-grep.c b/builtin-grep.c
index 9873e3d..3b1b1cb 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -10,10 +10,7 @@
#include "tag.h"
#include "tree-walk.h"
#include "builtin.h"
-#include <regex.h>
#include "grep.h"
-#include <fnmatch.h>
-#include <sys/wait.h>
/*
* git grep pathspecs are somewhat different from diff-tree pathspecs;
diff --git a/builtin-log.c b/builtin-log.c
index 17014f7..8df3c13 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -11,8 +11,6 @@
#include "log-tree.h"
#include "builtin.h"
#include "tag.h"
-#include <time.h>
-#include <sys/time.h>
static int default_show_root = 1;
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index bc79ce4..21c2a6e 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -5,8 +5,6 @@
*
* Copyright (C) Linus Torvalds, 2005
*/
-#include <fnmatch.h>
-
#include "cache.h"
#include "quote.h"
#include "dir.h"
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index b8d7dbc..e647229 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -2,15 +2,6 @@
* Another stupid program, this one parsing the headers of an
* email to figure out authorship and subject
*/
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <ctype.h>
-#ifndef NO_ICONV
-#include <iconv.h>
-#endif
-#include "git-compat-util.h"
#include "cache.h"
#include "builtin.h"
diff --git a/builtin-mailsplit.c b/builtin-mailsplit.c
index 91a699d..3bca855 100644
--- a/builtin-mailsplit.c
+++ b/builtin-mailsplit.c
@@ -4,13 +4,6 @@
* It just splits a mbox into a list of files: "0001" "0002" ..
* so you can process them further from there.
*/
-#include <unistd.h>
-#include <stdlib.h>
-#include <fcntl.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <string.h>
-#include <stdio.h>
#include "cache.h"
#include "builtin.h"
diff --git a/builtin-mv.c b/builtin-mv.c
index d14a4a7..737af35 100644
--- a/builtin-mv.c
+++ b/builtin-mv.c
@@ -3,8 +3,6 @@
*
* Copyright (C) 2006 Johannes Schindelin
*/
-#include <fnmatch.h>
-
#include "cache.h"
#include "builtin.h"
#include "dir.h"
diff --git a/builtin-name-rev.c b/builtin-name-rev.c
index 618aa31..b4f15cc 100644
--- a/builtin-name-rev.c
+++ b/builtin-name-rev.c
@@ -1,4 +1,3 @@
-#include <stdlib.h>
#include "builtin.h"
#include "cache.h"
#include "commit.h"
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a2dc7d1..807be8c 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -12,8 +12,6 @@
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
-#include <sys/time.h>
-#include <signal.h>
static const char pack_usage[] = "\
git-pack-objects [{ -q | --progress | --all-progress }] \n\
diff --git a/builtin-repo-config.c b/builtin-repo-config.c
index a38099a..a7ab4ce 100644
--- a/builtin-repo-config.c
+++ b/builtin-repo-config.c
@@ -1,6 +1,5 @@
#include "builtin.h"
#include "cache.h"
-#include <regex.h>
static const char git_config_set_usage[] =
"git-repo-config [ --global ] [ --bool | --int ] [--get | --get-all | --get-regexp | --replace-all | --add | --unset | --unset-all] name [value [value_regex]] | --rename-section old_name new_name | --list";
diff --git a/builtin-runstatus.c b/builtin-runstatus.c
index 0b63037..4b489b1 100644
--- a/builtin-runstatus.c
+++ b/builtin-runstatus.c
@@ -1,5 +1,5 @@
-#include "wt-status.h"
#include "cache.h"
+#include "wt-status.h"
extern int wt_status_use_color;
diff --git a/builtin-shortlog.c b/builtin-shortlog.c
index 3fc43dd..edb4042 100644
--- a/builtin-shortlog.c
+++ b/builtin-shortlog.c
@@ -4,7 +4,6 @@
#include "diff.h"
#include "path-list.h"
#include "revision.h"
-#include <string.h>
static const char shortlog_usage[] =
"git-shortlog [-n] [-s] [<commit-id>... ]";
diff --git a/builtin-show-branch.c b/builtin-show-branch.c
index a38ac34..b9d9781 100644
--- a/builtin-show-branch.c
+++ b/builtin-show-branch.c
@@ -1,5 +1,3 @@
-#include <stdlib.h>
-#include <fnmatch.h>
#include "cache.h"
#include "commit.h"
#include "refs.h"
diff --git a/builtin-stripspace.c b/builtin-stripspace.c
index 09cc910..f0d4d9e 100644
--- a/builtin-stripspace.c
+++ b/builtin-stripspace.c
@@ -1,6 +1,3 @@
-#include <stdio.h>
-#include <string.h>
-#include <ctype.h>
#include "builtin.h"
/*
diff --git a/builtin-tar-tree.c b/builtin-tar-tree.c
index 4d4cfec..11e62fc 100644
--- a/builtin-tar-tree.c
+++ b/builtin-tar-tree.c
@@ -1,7 +1,6 @@
/*
* Copyright (c) 2005, 2006 Rene Scharfe
*/
-#include <time.h>
#include "cache.h"
#include "commit.h"
#include "tar.h"
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index e6d7574..d351e02 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -8,8 +8,6 @@
#include "tag.h"
#include "tree.h"
-#include <sys/time.h>
-
static int dry_run, quiet, recover, has_errors;
static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
diff --git a/builtin-upload-archive.c b/builtin-upload-archive.c
index 45c92e1..e4156f8 100644
--- a/builtin-upload-archive.c
+++ b/builtin-upload-archive.c
@@ -1,9 +1,6 @@
/*
* Copyright (c) 2006 Franck Bui-Huu
*/
-#include <time.h>
-#include <sys/wait.h>
-#include <sys/poll.h>
#include "cache.h"
#include "builtin.h"
#include "archive.h"
diff --git a/color.c b/color.c
index d8c8399..09d82ee 100644
--- a/color.c
+++ b/color.c
@@ -1,8 +1,5 @@
-#include "color.h"
#include "cache.h"
-#include "git-compat-util.h"
-
-#include <stdarg.h>
+#include "color.h"
#define COLOR_RESET "\033[m"
diff --git a/compat/mmap.c b/compat/mmap.c
index a4d2e50..0fd46e7 100644
--- a/compat/mmap.c
+++ b/compat/mmap.c
@@ -1,7 +1,3 @@
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <errno.h>
#include "../git-compat-util.h"
void *gitfakemmap(void *start, size_t length, int prot , int flags, int fd, off_t offset)
diff --git a/compat/setenv.c b/compat/setenv.c
index b7d7678..3a22ea7 100644
--- a/compat/setenv.c
+++ b/compat/setenv.c
@@ -1,5 +1,4 @@
-#include <stdlib.h>
-#include <string.h>
+#include "../git-compat-util.h"
int gitsetenv(const char *name, const char *value, int replace)
{
diff --git a/compat/strlcpy.c b/compat/strlcpy.c
index b66856a..4024c36 100644
--- a/compat/strlcpy.c
+++ b/compat/strlcpy.c
@@ -1,4 +1,4 @@
-#include <string.h>
+#include "../git-compat-util.h"
size_t gitstrlcpy(char *dest, const char *src, size_t size)
{
diff --git a/compat/unsetenv.c b/compat/unsetenv.c
index 3a5e4ec..eb29f5e 100644
--- a/compat/unsetenv.c
+++ b/compat/unsetenv.c
@@ -1,5 +1,4 @@
-#include <stdlib.h>
-#include <string.h>
+#include "../git-compat-util.h"
void gitunsetenv (const char *name)
{
diff --git a/config.c b/config.c
index 663993f..913509f 100644
--- a/config.c
+++ b/config.c
@@ -6,7 +6,6 @@
*
*/
#include "cache.h"
-#include <regex.h>
#define MAXNAME (256)
diff --git a/connect.c b/connect.c
index f7edba8..66daa11 100644
--- a/connect.c
+++ b/connect.c
@@ -3,12 +3,6 @@
#include "pkt-line.h"
#include "quote.h"
#include "refs.h"
-#include <sys/wait.h>
-#include <sys/socket.h>
-#include <netinet/in.h>
-#include <arpa/inet.h>
-#include <netdb.h>
-#include <signal.h>
static char *server_capabilities;
diff --git a/convert-objects.c b/convert-objects.c
index 8812583..a630132 100644
--- a/convert-objects.c
+++ b/convert-objects.c
@@ -1,7 +1,3 @@
-#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
-#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
-#define _GNU_SOURCE
-#include <time.h>
#include "cache.h"
#include "blob.h"
#include "commit.h"
diff --git a/daemon.c b/daemon.c
index e66bb80..b129b83 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1,20 +1,10 @@
-#include <signal.h>
-#include <sys/wait.h>
-#include <sys/socket.h>
-#include <sys/time.h>
-#include <sys/poll.h>
-#include <netdb.h>
-#include <netinet/in.h>
-#include <arpa/inet.h>
-#include <syslog.h>
-#include <pwd.h>
-#include <grp.h>
-#include <limits.h>
-#include "pkt-line.h"
#include "cache.h"
+#include "pkt-line.h"
#include "exec_cmd.h"
#include "interpolate.h"
+#include <syslog.h>
+
#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 256
#endif
diff --git a/date.c b/date.c
index 1825922..7acb8cb 100644
--- a/date.c
+++ b/date.c
@@ -4,9 +4,6 @@
* Copyright (C) Linus Torvalds, 2005
*/
-#include <time.h>
-#include <sys/time.h>
-
#include "cache.h"
static time_t my_mktime(struct tm *tm)
diff --git a/diff-delta.c b/diff-delta.c
index fa16d06..9f998d0 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -18,11 +18,8 @@
* licensing gets turned into GPLv2 within this project.
*/
-#include <stdlib.h>
-#include <string.h>
-#include "delta.h"
-
#include "git-compat-util.h"
+#include "delta.h"
/* maximum hash entry list for the same hash bucket */
#define HASH_LIMIT 64
diff --git a/diff.c b/diff.c
index 9974435..61889d7 100644
--- a/diff.c
+++ b/diff.c
@@ -1,9 +1,6 @@
/*
* Copyright (C) 2005 Junio C Hamano
*/
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <signal.h>
#include "cache.h"
#include "quote.h"
#include "diff.h"
diff --git a/diffcore-order.c b/diffcore-order.c
index aef6da6..7ad0946 100644
--- a/diffcore-order.c
+++ b/diffcore-order.c
@@ -4,7 +4,6 @@
#include "cache.h"
#include "diff.h"
#include "diffcore.h"
-#include <fnmatch.h>
static char **order;
static int order_cnt;
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index cfcce31..de44ada 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -5,8 +5,6 @@
#include "diff.h"
#include "diffcore.h"
-#include <regex.h>
-
static unsigned int contains(struct diff_filespec *one,
const char *needle, unsigned long len,
regex_t *regexp)
diff --git a/dir.c b/dir.c
index e6a61ee..16401d8 100644
--- a/dir.c
+++ b/dir.c
@@ -5,9 +5,6 @@
* Copyright (C) Linus Torvalds, 2005-2006
* Junio Hamano, 2005-2006
*/
-#include <dirent.h>
-#include <fnmatch.h>
-
#include "cache.h"
#include "dir.h"
diff --git a/entry.c b/entry.c
index b2ea0ef..88df713 100644
--- a/entry.c
+++ b/entry.c
@@ -1,5 +1,3 @@
-#include <sys/types.h>
-#include <dirent.h>
#include "cache.h"
#include "blob.h"
diff --git a/fetch-pack.c b/fetch-pack.c
index 743eab7..92322cf 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -5,7 +5,6 @@
#include "tag.h"
#include "exec_cmd.h"
#include "sideband.h"
-#include <sys/wait.h>
static int keep_pack;
static int quiet;
diff --git a/fetch.c b/fetch.c
index 663b4b2..f69be82 100644
--- a/fetch.c
+++ b/fetch.c
@@ -1,6 +1,5 @@
-#include "fetch.h"
-
#include "cache.h"
+#include "fetch.h"
#include "commit.h"
#include "tree.h"
#include "tree-walk.h"
diff --git a/fsck-objects.c b/fsck-objects.c
index 46b628c..409aea0 100644
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -1,6 +1,3 @@
-#include <sys/types.h>
-#include <dirent.h>
-
#include "cache.h"
#include "commit.h"
#include "tree.h"
diff --git a/git-compat-util.h b/git-compat-util.h
index 0272d04..87fde99 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -11,6 +11,10 @@
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
+#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
+#define _GNU_SOURCE
+
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
@@ -25,6 +29,25 @@
#include <netinet/in.h>
#include <sys/types.h>
#include <dirent.h>
+#include <sys/time.h>
+#include <time.h>
+#include <signal.h>
+#include <sys/wait.h>
+#include <fnmatch.h>
+#include <sys/poll.h>
+#include <sys/socket.h>
+#include <assert.h>
+#include <regex.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <arpa/inet.h>
+#include <netdb.h>
+#include <pwd.h>
+#include <grp.h>
+
+#ifndef NO_ICONV
+#include <iconv.h>
+#endif
/* On most systems <limits.h> would have given us this, but
* not on some systems (e.g. GNU/Hurd).
diff --git a/git.c b/git.c
index 016ee8a..73cf4d4 100644
--- a/git.c
+++ b/git.c
@@ -1,20 +1,8 @@
-#include <stdio.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <dirent.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <string.h>
-#include <errno.h>
-#include <limits.h>
-#include <stdarg.h>
-#include "git-compat-util.h"
+#include "builtin.h"
#include "exec_cmd.h"
#include "cache.h"
#include "quote.h"
-#include "builtin.h"
-
const char git_usage_string[] =
"git [--version] [--exec-path[=GIT_EXEC_PATH]] [-p|--paginate] [--bare] [--git-dir=GIT_DIR] [--help] COMMAND [ARGS]";
diff --git a/grep.c b/grep.c
index 0fc078e..fcc6762 100644
--- a/grep.c
+++ b/grep.c
@@ -1,5 +1,4 @@
#include "cache.h"
-#include <regex.h>
#include "grep.h"
void append_grep_pattern(struct grep_opt *opt, const char *pat,
diff --git a/help.c b/help.c
index 0824c25..341b9e3 100644
--- a/help.c
+++ b/help.c
@@ -3,12 +3,11 @@
*
* Builtin help-related commands (help, usage, version)
*/
-#include <sys/ioctl.h>
#include "cache.h"
#include "builtin.h"
#include "exec_cmd.h"
#include "common-cmds.h"
-
+#include <sys/ioctl.h>
/* most GUI terminals set COLUMNS (although some don't export it) */
static int term_columns(void)
diff --git a/ident.c b/ident.c
index d7faba6..6ad8fed 100644
--- a/ident.c
+++ b/ident.c
@@ -7,9 +7,6 @@
*/
#include "cache.h"
-#include <pwd.h>
-#include <netdb.h>
-
static char git_default_date[50];
static void copy_gecos(struct passwd *w, char *name, int sz)
diff --git a/imap-send.c b/imap-send.c
index a6a6568..894cbbd 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -24,13 +24,6 @@
#include "cache.h"
-#include <assert.h>
-#include <netinet/in.h>
-#include <netinet/tcp.h>
-#include <arpa/inet.h>
-#include <sys/socket.h>
-#include <netdb.h>
-
typedef struct store_conf {
char *name;
const char *path; /* should this be here? its interpretation is driver-specific */
diff --git a/interpolate.c b/interpolate.c
index 5d9d188..f992ef7 100644
--- a/interpolate.c
+++ b/interpolate.c
@@ -2,8 +2,6 @@
* Copyright 2006 Jon Loeliger
*/
-#include <string.h>
-
#include "git-compat-util.h"
#include "interpolate.h"
diff --git a/lockfile.c b/lockfile.c
index 2a2fea3..261baff 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -1,7 +1,6 @@
/*
* Copyright (c) 2005, Junio C Hamano
*/
-#include <signal.h>
#include "cache.h"
static struct lock_file *lock_file_list;
diff --git a/merge-base.c b/merge-base.c
index 009caf8..385f4ba 100644
--- a/merge-base.c
+++ b/merge-base.c
@@ -1,4 +1,3 @@
-#include <stdlib.h>
#include "cache.h"
#include "commit.h"
diff --git a/merge-index.c b/merge-index.c
index 646d090..a9983dd 100644
--- a/merge-index.c
+++ b/merge-index.c
@@ -1,7 +1,3 @@
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <signal.h>
-
#include "cache.h"
static const char *pgm;
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dd6e2e..1de273e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3,13 +3,6 @@
* Fredrik Kuivinen.
* The thieves were Alex Riesen and Johannes Schindelin, in June/July 2006
*/
-#include <stdarg.h>
-#include <string.h>
-#include <assert.h>
-#include <sys/wait.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <time.h>
#include "cache.h"
#include "cache-tree.h"
#include "commit.h"
diff --git a/path-list.c b/path-list.c
index f8800f8..caaa5cc 100644
--- a/path-list.c
+++ b/path-list.c
@@ -1,4 +1,3 @@
-#include <stdio.h>
#include "cache.h"
#include "path-list.h"
diff --git a/path.c b/path.c
index d2c076d..066f621 100644
--- a/path.c
+++ b/path.c
@@ -11,7 +11,6 @@
* which is what it's designed for.
*/
#include "cache.h"
-#include <pwd.h>
static char bad_path[] = "/bad-path/";
diff --git a/receive-pack.c b/receive-pack.c
index 5e5510b..59b682c 100644
--- a/receive-pack.c
+++ b/receive-pack.c
@@ -6,7 +6,6 @@
#include "exec_cmd.h"
#include "commit.h"
#include "object.h"
-#include <sys/wait.h>
static const char receive_pack_usage[] = "git-receive-pack <git-dir>";
diff --git a/refs.c b/refs.c
index d911b9e..a101ff3 100644
--- a/refs.c
+++ b/refs.c
@@ -1,10 +1,8 @@
-#include "refs.h"
#include "cache.h"
+#include "refs.h"
#include "object.h"
#include "tag.h"
-#include <errno.h>
-
/* ISSYMREF=01 and ISPACKED=02 are public interfaces */
#define REF_KNOWS_PEELED 04
diff --git a/revision.c b/revision.c
index 993bb66..7b6f91f 100644
--- a/revision.c
+++ b/revision.c
@@ -6,7 +6,6 @@
#include "diff.h"
#include "refs.h"
#include "revision.h"
-#include <regex.h>
#include "grep.h"
static char *path_name(struct name_path *path, const char *name)
diff --git a/rsh.c b/rsh.c
index f34409e..5754a23 100644
--- a/rsh.c
+++ b/rsh.c
@@ -1,10 +1,6 @@
-#include <string.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-
+#include "cache.h"
#include "rsh.h"
#include "quote.h"
-#include "cache.h"
#define COMMAND_SIZE 4096
diff --git a/run-command.c b/run-command.c
index 6190868..492ad3e 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1,6 +1,5 @@
#include "cache.h"
#include "run-command.h"
-#include <sys/wait.h>
#include "exec_cmd.h"
int run_command_v_opt(int argc, const char **argv, int flags)
diff --git a/ssh-upload.c b/ssh-upload.c
index 20b15ea..0b52ae1 100644
--- a/ssh-upload.c
+++ b/ssh-upload.c
@@ -12,8 +12,6 @@
#include "rsh.h"
#include "refs.h"
-#include <string.h>
-
static unsigned char local_version = 1;
static unsigned char remote_version;
diff --git a/strbuf.c b/strbuf.c
index 9d9d8be..7f14b0f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,7 +1,5 @@
-#include <stdio.h>
-#include <stdlib.h>
-#include "strbuf.h"
#include "cache.h"
+#include "strbuf.h"
void strbuf_init(struct strbuf *sb) {
sb->buf = NULL;
diff --git a/test-date.c b/test-date.c
index 93e8027..62e8f23 100644
--- a/test-date.c
+++ b/test-date.c
@@ -1,6 +1,3 @@
-#include <stdio.h>
-#include <time.h>
-
#include "cache.h"
int main(int argc, char **argv)
diff --git a/test-delta.c b/test-delta.c
index 1be8ee0..795aa08 100644
--- a/test-delta.c
+++ b/test-delta.c
@@ -8,13 +8,7 @@
* published by the Free Software Foundation.
*/
-#include <stdio.h>
-#include <unistd.h>
-#include <string.h>
-#include <fcntl.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/mman.h>
+#include "git-compat-util.h"
#include "delta.h"
static const char usage[] =
diff --git a/tree.c b/tree.c
index ea386e5..b6f02fe 100644
--- a/tree.c
+++ b/tree.c
@@ -4,7 +4,6 @@
#include "commit.h"
#include "tag.h"
#include "tree-walk.h"
-#include <stdlib.h>
const char *tree_type = "tree";
diff --git a/unpack-trees.c b/unpack-trees.c
index b8689eb..2e2232c 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1,5 +1,3 @@
-#include <signal.h>
-#include <sys/time.h>
#include "cache.h"
#include "dir.h"
#include "tree.h"
diff --git a/upload-pack.c b/upload-pack.c
index 4572fff..32b06b2 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1,6 +1,3 @@
-#include <signal.h>
-#include <sys/wait.h>
-#include <sys/poll.h>
#include "cache.h"
#include "refs.h"
#include "pkt-line.h"
diff --git a/var.c b/var.c
index a57a33b..39977b9 100644
--- a/var.c
+++ b/var.c
@@ -4,9 +4,6 @@
* Copyright (C) Eric Biederman, 2005
*/
#include "cache.h"
-#include <stdio.h>
-#include <errno.h>
-#include <string.h>
static const char var_usage[] = "git-var [-l | <variable>]";
diff --git a/wt-status.c b/wt-status.c
index cface6c..db42738 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1,6 +1,6 @@
+#include "cache.h"
#include "wt-status.h"
#include "color.h"
-#include "cache.h"
#include "object.h"
#include "dir.h"
#include "commit.h"
--
1.4.4.2.g205bf
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 16:57 ` Linus Torvalds
@ 2006-12-20 1:54 ` Shawn Pearce
0 siblings, 0 replies; 52+ messages in thread
From: Shawn Pearce @ 2006-12-20 1:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Theodore Tso, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds <torvalds@osdl.org> wrote:
> On Tue, 19 Dec 2006, Theodore Tso wrote:
> >
> > So the main reason to use mamp, as Linus puts it, is if the management
> > overhead of needing to read lots of small bits of the file makes the
> > use of malloc/read to be a pain in the *ss, then go for it.
>
> An example of this in git is the regular pack-file accesses. We're MUCH
> better off just mmap'ing the whole pack-file (or at least big chunks of
> it) and not having to maintain difficult structures of "this is where I
> read that part of the file into memory", or read _big_ chunks when
> quite often we just use a few kB of it.
>
> So mmap for pack-files does make sense, but probably only when you can
> mmap big chunks, and are going to access much smaller (random) parts of
> it.
Yes, exactly.
git-fast-import mmaps the pack file for this very reason. It every
so often needs to go back and reread a tree object which has expired
from its own in-memory LRU cache. This usually doesn't happen
very often, but when it does we don't know where we are going to
jump to get data from. mmaping a huge segment of the pack file
(or the whole thing if its reasonably small) works for this case as
the OS buffer cache can just take care of it for us. But as Linus
pointed out mmap and write() aren't safe on some systems. Arrrgh.
However git-fast-import would probably work just as well (or maybe
slightly better) with pread(). I really should port that code
forward to current Git, use pread() instead, and submit the patch
to Junio. But nobody really showed a lot of interest.
My sliding window pack-file access implementation (that I'm currently
rewriting on top of current Git) tries to work in very large chunks,
by default its 32 MiB per chunk, but its user/repository configurable
so kernel hackers may just set it to 256 MiB and continue to get
one large mmap for quite some time to come. Of course I would
also like to get that to autoselect the window size rather than
just hardcode it. :-)
The implementation would prefer a very small number (<8) of very
large chunks (>32 MiB), but is designed to more gracefully degrade
on huge packs on limited address space systems (e.g. Windows 32 bit)
then the current code does.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 16:19 ` Theodore Tso
2006-12-19 16:57 ` Linus Torvalds
@ 2006-12-20 1:58 ` Shawn Pearce
1 sibling, 0 replies; 52+ messages in thread
From: Shawn Pearce @ 2006-12-20 1:58 UTC (permalink / raw)
To: Theodore Tso; +Cc: Nicolas Pitre, Linus Torvalds, Randal L. Schwartz, git
Theodore Tso <tytso@mit.edu> wrote:
> On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> > This is why git-fast-import mmaps 128 MiB blocks from the file at
> > a time. The mmap region is usually much larger than the file itself;
> > the application appends to the file via write() then goes back
> > and rereads data when necessary via the already established mmap.
> > Its rare for the application to need to unmap/remap a different block
> > so there really isn't very much page table manipulation overhead.
>
> Yes, but unless you are using the (non-portable, Linux specific)
> MAP_POPULATE flag to mmap, each time you touch a new page, you end up
> taking a page fault; and so malloc/read/free might *still* be faster.
> I'd encourage you to make the change and benchmark it; the results may
> be surprising. I played with this with dcraw, the Canon Raw File
> converter a while back (before MAP_POPULATE was added), where I found
> that with a linear access pattern, if you are reading the entire file,
> it's stil marginally faster to use read() over mmap(), because with
> dcraw taking a page fault every 4k of raw file, the system time was
> significantly higher.
Interesting. Lots of good reasons to probably just use pread()
in there instead of mmap. For one thing git-fast-import doesn't
go back and hit the already written pack data very often. Its own
in memory caches usually perform very well.
--
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 1:12 ` Junio C Hamano
@ 2006-12-20 20:17 ` Junio C Hamano
2006-12-20 20:53 ` Linus Torvalds
0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2006-12-20 20:17 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linus Torvalds, Nicolas Pitre, Randal L. Schwartz, git
Junio C Hamano <junkio@cox.net> writes:
> Linus Torvalds <torvalds@osdl.org> writes:
>
>> On Tue, 19 Dec 2006, Junio C Hamano wrote:
>>
>>> Jeff Garzik <jeff@garzik.org> writes:
>>>
>>> > If you are going to do this, you have to audit -every- file, to make
>>> > sure git-compat-util.h is -always- the first header.
>>>
>>> Will do.
>>
>> Well, since any cases where it isn't (and where we'd care) will show up
>> as just a compiler warning, I doubt we really even need to. We can fix
>> things up as they get reported too..
>
> True, but I've done it already, so...
>
> Test compile especially on non Linux boxes are appreciated (I'll
> do one on an OpenBSD bochs tomorrow myself anyway, though).
I've pushed the results out, along with the index-pack updates
from Linus/Nico.
I needed to fix the changes to git-compat-util.h a bit from the
version I sent earlier to make OpenBSD happy (sys/types.h there
did not expose u_int unless _BSD_SOURCE was set, and
netinet/in.h was duplicated by mistake).
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 20:17 ` Junio C Hamano
@ 2006-12-20 20:53 ` Linus Torvalds
2006-12-20 21:52 ` Junio C Hamano
0 siblings, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2006-12-20 20:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff Garzik, Nicolas Pitre, Randal L. Schwartz, git
On Wed, 20 Dec 2006, Junio C Hamano wrote:
>
> I needed to fix the changes to git-compat-util.h a bit from the
> version I sent earlier to make OpenBSD happy (sys/types.h there
> did not expose u_int unless _BSD_SOURCE was set, and
> netinet/in.h was duplicated by mistake).
Please don't use "u_int" in the first place. It's an abomination of a
type. It should never be used.
There's simply no point. It's "unsigned int", and that doesn't need any
header files at all.
I don't understand why people are lazy, and can't write "unsigned", but
then introduce a type that requires you to have all kinds of magic. The
lazyness just results in more work, and is totally nonportable.
So "u_int" (along with its idiotic brethren "u_long", "u_short" and
"u_char") is just silly.
The only user in git seems to have been copied from a source that is
insane. It does
u_int words[NS_IN6ADDRSZ / NS_INT16SZ];
which is just insane. It actually seems to want to use "uint16_t", which
at least would make sense, and be a type that has some _point_ to it.
So please change the "u_int" to either "unsigned int" or "uint16_t".
Either is better.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-20 20:53 ` Linus Torvalds
@ 2006-12-20 21:52 ` Junio C Hamano
0 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2006-12-20 21:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff Garzik, Nicolas Pitre, Randal L. Schwartz, git
Linus Torvalds <torvalds@osdl.org> writes:
> On Wed, 20 Dec 2006, Junio C Hamano wrote:
>>
>> I needed to fix the changes to git-compat-util.h a bit from the
>> version I sent earlier to make OpenBSD happy (sys/types.h there
>> did not expose u_int unless _BSD_SOURCE was set, and
>> netinet/in.h was duplicated by mistake).
>
> Please don't use "u_int" in the first place. It's an abomination of a
> type. It should never be used.
I did not want to touch imported sources in compat/, but
> The only user in git seems to have been copied from a source that is
> insane. It does
>
> u_int words[NS_IN6ADDRSZ / NS_INT16SZ];
>
> which is just insane. It actually seems to want to use "uint16_t", which
> at least would make sense, and be a type that has some _point_ to it.
the above argument makes 100% sense. Will fix.
However, on sane platforms we do not even compile that file.
The problem I observed was that <include/netinet/tcp.h> on
OpenBSD uses u_int wants the source code that uses that header
file to first include <sys/types.h> to get u_int, which in turn
requires __BSD_VISIBLE to be in effect.
Unfortunately I think _BSD_SOURCE needs to stay for this reason.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux
2006-12-19 19:55 ` Linus Torvalds
2006-12-19 19:57 ` Randal L. Schwartz
2006-12-19 20:02 ` Jeff Garzik
@ 2006-12-20 22:13 ` Nikolai Weibull
2 siblings, 0 replies; 52+ messages in thread
From: Nikolai Weibull @ 2006-12-20 22:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Junio C Hamano, Randal L. Schwartz, git
On 12/19/06, Linus Torvalds <torvalds@osdl.org> wrote:
> It's the C++ people who tend to have sucky compile times.
Always looking for a way to bash on C++, eh?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
@ 2006-12-21 8:41 linux
0 siblings, 0 replies; 52+ messages in thread
From: linux @ 2006-12-21 8:41 UTC (permalink / raw)
To: git
Remember, OS X is FreeBSD somehow layered on top of Mach 3.
In particular, it calls though to Mach for virtual memory.
And Mach is at its heart an academic design aiming for flexibility in
its VM system rather tha performance. In particular, its provisions
for user-space pagers.
I can easily believe that its mmap is 10x slower that Linux. 60x is.. a
bit extreme. A Darwin kernel hacker should probably get a nudge about it.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-19 20:28 ` Alex Riesen
@ 2006-12-21 20:35 ` Juergen Ruehle
0 siblings, 0 replies; 52+ messages in thread
From: Juergen Ruehle @ 2006-12-21 20:35 UTC (permalink / raw)
To: Alex Riesen
Cc: Marco Roeland, Shawn Pearce, Linus Torvalds, Theodore Tso,
Nicolas Pitre, Johannes Schindelin, Randal L. Schwartz, git
Alex Riesen writes:
> Marco Roeland, Tue, Dec 19, 2006 10:13:19 +0100:
> > > Not that I personally frequently clone large projects on Mac OS X.
> > > But new users to Git might. :-)
> >
> > And perhaps the Cygwin version might benefit too.
>
> Does not work there at all. Even errno is not set (0).
Haven't seen a reply to this yet.
Upgrade cygwin.dll. The fix is pretty recent (1.5.22 or something).
Perhaps we now need a note in INSTALL.
Light testing seems to indicate that this change removes most of the
system times while leaving user time mostly unchanged (though I don't
know what cygwin time really reports in these categories). In the git
repository this is about 30% of the total time. For another (slightly
larger but with fewer objects) pack this still saves 20%. I haven't
compared it with the NO_MMAP version (too lazy), but overall it's a
win for cygwin as well (though we should note that cygwin's mmap seems
to have reasonable performance).
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 23:02 ` Martin Langhoff
@ 2006-12-22 1:44 ` Kyle Moffett
2006-12-22 1:56 ` Shawn Pearce
2006-12-22 8:04 ` Marco Roeland
2007-01-03 13:55 ` Andreas Ericsson
1 sibling, 2 replies; 52+ messages in thread
From: Kyle Moffett @ 2006-12-22 1:44 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Randal L. Schwartz, Linus Torvalds, Nicolas Pitre, git
On Dec 18, 2006, at 18:02:07, Martin Langhoff wrote:
> On 18 Dec 2006 14:26:36 -0800, Randal L. Schwartz
> <merlyn@stonehenge.com> wrote:
>> Linus Torvalds wrote:
>>> You're running this under OS X, aren't you? It's a pig of an OS,
>>> but "almost one hour" vs "25 seconds" is still unreasonable.
>>
>> I agree!
>
> Me too -- but entirely possible. Disk IO is specially painful on
> OSX. Stat calls are horrid. Using Arch (which abused stat calls to
> no end) many ops would take 50x-100x longer on OSX than on Linux. A
> large unpacked repo with git is a real pain -- and packing it can
> take hours.
I've actually also seen filesystem operation latency double or triple
if you start trying to do operations from multiple threads at once.
Suddenly the already dog-slow single-CPU operations start bouncing
caches and the Mac OS X mostly-whole-of-BSD-BKL across CPUs and it
just crawls. I can definitely see the local disk IO taking 100x
longer than the network I/O, especially with an 8-megabit internet link.
> Once you are packed it's sweet, but large repos are a pain to deal
> with. You won't impress anyone with performance over a linux kernel
> repo -- starting up gitk can take a long time. Stat-heavy stuff
> like git-diff is noticeably slower under OSX.
Just as an example, it takes my OS-X-running Quad-2.5GHz G5 ten times
as long to do a "grep -rl foo linux/" as my Linux-running dual-1GHz
G4 with 400MHz system bus. This is disk-cache-hot too. And that's
not even a stat-heavy workload. There's more than one reason I'm
trying to make a Mac OS X ABI emulation layer on top of Linux :-D.
Cheers,
Kyle Moffett
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-22 1:44 ` Kyle Moffett
@ 2006-12-22 1:56 ` Shawn Pearce
2006-12-22 8:04 ` Marco Roeland
1 sibling, 0 replies; 52+ messages in thread
From: Shawn Pearce @ 2006-12-22 1:56 UTC (permalink / raw)
To: Kyle Moffett
Cc: Martin Langhoff, Randal L. Schwartz, Linus Torvalds,
Nicolas Pitre, git
Kyle Moffett <mrmacman_g4@mac.com> wrote:
> Just as an example, it takes my OS-X-running Quad-2.5GHz G5 ten times
> as long to do a "grep -rl foo linux/" as my Linux-running dual-1GHz
> G4 with 400MHz system bus. This is disk-cache-hot too. And that's
> not even a stat-heavy workload. There's more than one reason I'm
> trying to make a Mac OS X ABI emulation layer on top of Linux :-D.
Try 'git grep foo' instead; its very fast if you are working with
a fully packed Git repository.
Doesn't help any other application on Mac OS X however. :-(
--
Shawn.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-22 1:44 ` Kyle Moffett
2006-12-22 1:56 ` Shawn Pearce
@ 2006-12-22 8:04 ` Marco Roeland
1 sibling, 0 replies; 52+ messages in thread
From: Marco Roeland @ 2006-12-22 8:04 UTC (permalink / raw)
To: Kyle Moffett
Cc: Martin Langhoff, Randal L. Schwartz, Linus Torvalds,
Nicolas Pitre, git
On Thursday December 21st 2006 at 20:44 Kyle Moffett wrote:
> I've actually also seen filesystem operation latency double or triple
> if you start trying to do operations from multiple threads at once.
> Suddenly the already dog-slow single-CPU operations start bouncing
> caches and the Mac OS X mostly-whole-of-BSD-BKL across CPUs and it
> just crawls. I can definitely see the local disk IO taking 100x
> longer than the network I/O, especially with an 8-megabit internet link.
The mmap() implementation on Mac OS X ate 85% percent of system time on
the old version for git-index-pack. Of that time nearly all was spent on
some Mach locking function. So yes the BKL like locking inside the Mach
message passing seems to be the big culprit, perhaps Andy Tanenbaum can
explain this on LCA 2007. <ducks>
As most developers now run multicore CPU's we notice these differences
even better now! ;-)
--
Marco Roeland
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: cloning the kernel - why long time in "Resolving 313037 deltas"
2006-12-18 23:02 ` Martin Langhoff
2006-12-22 1:44 ` Kyle Moffett
@ 2007-01-03 13:55 ` Andreas Ericsson
1 sibling, 0 replies; 52+ messages in thread
From: Andreas Ericsson @ 2007-01-03 13:55 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Randal L. Schwartz, Linus Torvalds, Nicolas Pitre, git
Martin Langhoff wrote:
>
> Once you are packed it's sweet, but large repos are a pain to deal
> with. You won't impress anyone with performance over a linux kernel
> repo -- starting up gitk can take a long time.
I'd suggest running qgit instead, and make sure you've run it once on
the repo you're demoing. qgit has a nice little cache where it tucks
away most of the data it uses, which makes it draw things a lot faster.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2007-01-03 13:55 UTC | newest]
Thread overview: 52+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <86y7p57y05.fsf@blue.stonehenge.com>
[not found] ` <Pine.LNX.4.63.0612182154170.19693@wbgn013.biozentrum.uni-wuerzburg.de>
[not found] ` <Pine.LNX.4.63.0612182213020.19693@wbgn013.biozentrum.uni-wuerzburg.de>
[not found] ` <Pine.LNX.4.64.0612181638220.18171@xanadu.home>
2006-12-18 21:55 ` [PATCH] fetch-pack: avoid fixing thin packs when unnecessary Johannes Schindelin
2006-12-18 22:17 ` Nicolas Pitre
[not found] ` <Pine.LNX.4.64.0612181251020.3479@woody.osdl.org>
[not found] ` <86r6uw9azn.fsf@blue.stonehenge.com>
[not found] ` <Pine.LNX.4.64.0612181625140.18171@xanadu.home>
2006-12-18 22:01 ` cloning the kernel - why long time in "Resolving 313037 deltas" Randal L. Schwartz
2006-12-18 22:09 ` Nicolas Pitre
2006-12-18 22:21 ` Randal L. Schwartz
2006-12-18 22:50 ` Nicolas Pitre
2006-12-18 22:22 ` Linus Torvalds
2006-12-18 22:26 ` Randal L. Schwartz
2006-12-18 23:02 ` Martin Langhoff
2006-12-22 1:44 ` Kyle Moffett
2006-12-22 1:56 ` Shawn Pearce
2006-12-22 8:04 ` Marco Roeland
2007-01-03 13:55 ` Andreas Ericsson
2006-12-18 23:28 ` Linus Torvalds
2006-12-19 0:13 ` Nicolas Pitre
2006-12-19 5:11 ` Theodore Tso
2006-12-19 6:39 ` Shawn Pearce
2006-12-19 6:51 ` Linus Torvalds
2006-12-19 7:26 ` Shawn Pearce
2006-12-19 7:52 ` Marco Roeland
2006-12-19 7:58 ` Shawn Pearce
2006-12-19 8:32 ` Shawn Pearce
2006-12-19 8:40 ` Marco Roeland
2006-12-19 8:49 ` Shawn Pearce
2006-12-19 9:13 ` Marco Roeland
2006-12-19 20:28 ` Alex Riesen
2006-12-21 20:35 ` Juergen Ruehle
2006-12-19 16:19 ` Theodore Tso
2006-12-19 16:57 ` Linus Torvalds
2006-12-20 1:54 ` Shawn Pearce
2006-12-20 1:58 ` Shawn Pearce
2006-12-19 6:47 ` Linus Torvalds
2006-12-19 8:32 ` Johannes Schindelin
2006-12-19 9:10 ` Junio C Hamano
2006-12-19 9:47 ` Jeff King
2006-12-19 10:24 ` Andy Whitcroft
2006-12-19 15:53 ` [PATCH] index-pack usage of mmap() is unacceptably slower on many OSes other than Linux Nicolas Pitre
2006-12-19 19:00 ` Junio C Hamano
2006-12-19 19:14 ` Nicolas Pitre
2006-12-19 19:55 ` Linus Torvalds
2006-12-19 19:57 ` Randal L. Schwartz
2006-12-19 20:03 ` Randal L. Schwartz
2006-12-19 20:02 ` Jeff Garzik
2006-12-20 0:30 ` Junio C Hamano
2006-12-20 0:40 ` Linus Torvalds
2006-12-20 0:50 ` Jeff Garzik
2006-12-20 1:12 ` Junio C Hamano
2006-12-20 20:17 ` Junio C Hamano
2006-12-20 20:53 ` Linus Torvalds
2006-12-20 21:52 ` Junio C Hamano
2006-12-20 22:13 ` Nikolai Weibull
2006-12-21 8:41 cloning the kernel - why long time in "Resolving 313037 deltas" linux
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).