* comparing file contents in is_exact_match? @ 2006-07-06 5:57 Martin Waitz 2006-07-06 6:20 ` Junio C Hamano 2006-07-07 16:33 ` Florian Weimer 0 siblings, 2 replies; 32+ messages in thread From: Martin Waitz @ 2006-07-06 5:57 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 708 bytes --] hoi :) I created a git repository for my photo collection and then renamed some photos (about 600). Now git status and commit get hit by the OOM killer. The reason for that is that is_exact_match (in diffcore-rename.c) maps both the source and destination file into memory and then compares them byte for byte. This is a little bit too much for my little machine. When I remove the content comparation and only leave the sha1 comparision, then my renames are correctly found in a second. But unluckily, some other renames in the testcases are not correctly handled any more. So is there an easy solution? Why is the content comparision in is_exact_match needed? -- Martin Waitz [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 5:57 comparing file contents in is_exact_match? Martin Waitz @ 2006-07-06 6:20 ` Junio C Hamano 2006-07-06 7:16 ` Martin Waitz 2006-07-07 16:33 ` Florian Weimer 1 sibling, 1 reply; 32+ messages in thread From: Junio C Hamano @ 2006-07-06 6:20 UTC (permalink / raw) To: Martin Waitz; +Cc: git Martin Waitz <tali@admingilde.org> writes: > The reason for that is that is_exact_match (in diffcore-rename.c) maps > both the source and destination file into memory and then compares them > byte for byte. This is a little bit too much for my little machine. > > When I remove the content comparation and only leave the sha1 > comparision, then my renames are correctly found in a second. But > unluckily, some other renames in the testcases are not correctly > handled any more. > > So is there an easy solution? Why is the content comparision in > is_exact_match needed? Because your working tree can be out of sync with respect to what's in the index, in which case we cannot trust the sha1 while running diff-index (without --cached flag). git-update-index --refresh before doing anything might help. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 6:20 ` Junio C Hamano @ 2006-07-06 7:16 ` Martin Waitz 2006-07-06 7:33 ` Junio C Hamano 0 siblings, 1 reply; 32+ messages in thread From: Martin Waitz @ 2006-07-06 7:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: git [-- Attachment #1: Type: text/plain, Size: 706 bytes --] hoi :) On Wed, Jul 05, 2006 at 11:20:13PM -0700, Junio C Hamano wrote: > Martin Waitz <tali@admingilde.org> writes: > > Why is the content comparision in is_exact_match needed? > > Because your working tree can be out of sync with respect to > what's in the index, in which case we cannot trust the sha1 > while running diff-index (without --cached flag). so perhaps we need three phases instead of two: first sort out all renames that can be detected by the sha1, then compare file contents and finally do the diff. > git-update-index --refresh before doing anything might help. At the moment it is doing (N-1)^2 content compares, even if the index is in sync. -- Martin Waitz [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 7:16 ` Martin Waitz @ 2006-07-06 7:33 ` Junio C Hamano 2006-07-06 7:41 ` Martin Waitz 2006-07-06 17:55 ` Martin Waitz 0 siblings, 2 replies; 32+ messages in thread From: Junio C Hamano @ 2006-07-06 7:33 UTC (permalink / raw) To: Martin Waitz; +Cc: git Martin Waitz <tali@admingilde.org> writes: >> Because your working tree can be out of sync with respect to >> what's in the index, in which case we cannot trust the sha1 >> while running diff-index (without --cached flag). > > so perhaps we need three phases instead of two: > first sort out all renames that can be detected by the sha1, > then compare file contents and finally do the diff. Makes sort-of sense. Although I am not sure how much this would help with a regular workload, maybe something like this untested patch might help your situation? -- >8 -- diffcore-rename: try matching up renames without populating filespec first Signed-off-by: Junio C Hamano <junkio@cox.net> --- diff --git a/diffcore-rename.c b/diffcore-rename.c index d57e865..affff7a 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -96,11 +96,15 @@ static struct diff_rename_src *register_ return &(rename_src[first]); } -static int is_exact_match(struct diff_filespec *src, struct diff_filespec *dst) +static int is_exact_match(struct diff_filespec *src, + struct diff_filespec *dst, + int contents_too) { if (src->sha1_valid && dst->sha1_valid && !memcmp(src->sha1, dst->sha1, 20)) return 1; + if (!contents_too) + return 0; if (diff_populate_filespec(src, 1) || diff_populate_filespec(dst, 1)) return 0; if (src->size != dst->size) @@ -242,7 +246,7 @@ void diffcore_rename(struct diff_options struct diff_queue_struct *q = &diff_queued_diff; struct diff_queue_struct outq; struct diff_score *mx; - int i, j, rename_count; + int i, j, rename_count, contents_too; int num_create, num_src, dst_cnt; if (!minimum_score) @@ -273,16 +277,23 @@ void diffcore_rename(struct diff_options /* We really want to cull the candidates list early * with cheap tests in order to avoid doing deltas. + * The first round matches up the up-to-date entries, + * and then during the second round we try to match + * cache-dirty entries as well. */ - for (i = 0; i < rename_dst_nr; i++) { - struct diff_filespec *two = rename_dst[i].two; - for (j = 0; j < rename_src_nr; j++) { - struct diff_filespec *one = rename_src[j].one; - if (!is_exact_match(one, two)) - continue; - record_rename_pair(i, j, MAX_SCORE); - rename_count++; - break; /* we are done with this entry */ + for (contents_too = 0; contents_too < 2; contents_too++) { + for (i = 0; i < rename_dst_nr; i++) { + struct diff_filespec *two = rename_dst[i].two; + if (rename_dst[i].pair) + continue; /* dealt with an earlier round */ + for (j = 0; j < rename_src_nr; j++) { + struct diff_filespec *one = rename_src[j].one; + if (!is_exact_match(one, two, contents_too)) + continue; + record_rename_pair(i, j, MAX_SCORE); + rename_count++; + break; /* we are done with this entry */ + } } } ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 7:33 ` Junio C Hamano @ 2006-07-06 7:41 ` Martin Waitz 2006-07-06 17:55 ` Martin Waitz 1 sibling, 0 replies; 32+ messages in thread From: Martin Waitz @ 2006-07-06 7:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: git [-- Attachment #1: Type: text/plain, Size: 539 bytes --] On Thu, Jul 06, 2006 at 12:33:33AM -0700, Junio C Hamano wrote: > Martin Waitz <tali@admingilde.org> writes: > > so perhaps we need three phases instead of two: > > first sort out all renames that can be detected by the sha1, > > then compare file contents and finally do the diff. > > Makes sort-of sense. > > Although I am not sure how much this would help with a regular > workload, maybe something like this untested patch might help > your situation? patch looks good, I'll try it in the evening. -- Martin Waitz [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 7:33 ` Junio C Hamano 2006-07-06 7:41 ` Martin Waitz @ 2006-07-06 17:55 ` Martin Waitz 1 sibling, 0 replies; 32+ messages in thread From: Martin Waitz @ 2006-07-06 17:55 UTC (permalink / raw) To: Junio C Hamano; +Cc: git [-- Attachment #1: Type: text/plain, Size: 275 bytes --] hoi :) On Thu, Jul 06, 2006 at 12:33:33AM -0700, Junio C Hamano wrote: > Although I am not sure how much this would help with a regular > workload, maybe something like this untested patch might help > your situation? works perfectly! Thanks! -- Martin Waitz [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-06 5:57 comparing file contents in is_exact_match? Martin Waitz 2006-07-06 6:20 ` Junio C Hamano @ 2006-07-07 16:33 ` Florian Weimer 2006-07-08 2:50 ` Johannes Schindelin 1 sibling, 1 reply; 32+ messages in thread From: Florian Weimer @ 2006-07-07 16:33 UTC (permalink / raw) To: git * Martin Waitz: > I created a git repository for my photo collection and then renamed > some photos (about 600). Now git status and commit get hit by > the OOM killer. > > The reason for that is that is_exact_match (in diffcore-rename.c) maps > both the source and destination file into memory and then compares them > byte for byte. This is a little bit too much for my little machine. Uhm, this shouldn't trigger the OOM killer, really. You already have physical backing storage for both files, so this shouldn't count towards the OOM limit. Ah, diff_populate_filespec has the following: s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); Perhaps the following patch is in order? On some systems, MAP_PRIVATE might guarantee some form of repeatable reads, but I don't think GIT needs this to guard against concurrent modification. -- >8 -- diff_populate_filespec: use shared mapping It seems that on some systems, PROT_READ + MAP_PRIVATE counts towards the OOM limit, even though no additional backing store is required. Requesting MAP_SHARED mapping should fix this. Signed-off-by: Florian Weimer <fw@deneb.enyo.de> --- diff.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/diff.c b/diff.c index 428ff78..2b4367e 100644 --- a/diff.c +++ b/diff.c @@ -1007,7 +1007,7 @@ int diff_populate_filespec(struct diff_f fd = open(s->path, O_RDONLY); if (fd < 0) goto err_empty; - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); + s->data = mmap(NULL, s->size, PROT_READ, MAP_SHARED, fd, 0); close(fd); if (s->data == MAP_FAILED) goto err_empty; -- 1.4.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-07 16:33 ` Florian Weimer @ 2006-07-08 2:50 ` Johannes Schindelin 2006-07-16 9:05 ` Florian Weimer 0 siblings, 1 reply; 32+ messages in thread From: Johannes Schindelin @ 2006-07-08 2:50 UTC (permalink / raw) To: Florian Weimer; +Cc: git Hi, On Fri, 7 Jul 2006, Florian Weimer wrote: > - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); > + s->data = mmap(NULL, s->size, PROT_READ, MAP_SHARED, fd, 0); Be advised that this breaks setups with NO_MMAP, in particular most (all) cygwin setups I know of. Hth, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-08 2:50 ` Johannes Schindelin @ 2006-07-16 9:05 ` Florian Weimer 2006-07-16 14:00 ` Johannes Schindelin 2006-07-16 15:03 ` Yakov Lerner 0 siblings, 2 replies; 32+ messages in thread From: Florian Weimer @ 2006-07-16 9:05 UTC (permalink / raw) To: git * Johannes Schindelin: > Hi, > > On Fri, 7 Jul 2006, Florian Weimer wrote: > >> - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); >> + s->data = mmap(NULL, s->size, PROT_READ, MAP_SHARED, fd, 0); > > Be advised that this breaks setups with NO_MMAP, in particular most (all) > cygwin setups I know of. Are these Cygwin setups running on top of the Windows 95 code base by chance? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-16 9:05 ` Florian Weimer @ 2006-07-16 14:00 ` Johannes Schindelin 2006-07-16 15:03 ` Yakov Lerner 1 sibling, 0 replies; 32+ messages in thread From: Johannes Schindelin @ 2006-07-16 14:00 UTC (permalink / raw) To: Florian Weimer; +Cc: git Hi, On Sun, 16 Jul 2006, Florian Weimer wrote: > * Johannes Schindelin: > > > Hi, > > > > On Fri, 7 Jul 2006, Florian Weimer wrote: > > > >> - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); > >> + s->data = mmap(NULL, s->size, PROT_READ, MAP_SHARED, fd, 0); > > > > Be advised that this breaks setups with NO_MMAP, in particular most (all) > > cygwin setups I know of. > > Are these Cygwin setups running on top of the Windows 95 code base by > chance? No. One is Windows2000, the other WindowsXP, and both need the NO_MMAP set. For obvious reasons, NO_MMAP does not support MAP_SHARED... Ciao, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-16 9:05 ` Florian Weimer 2006-07-16 14:00 ` Johannes Schindelin @ 2006-07-16 15:03 ` Yakov Lerner 2006-07-16 22:36 ` Alex Riesen 1 sibling, 1 reply; 32+ messages in thread From: Yakov Lerner @ 2006-07-16 15:03 UTC (permalink / raw) To: Florian Weimer; +Cc: git On 7/16/06, Florian Weimer <fw@deneb.enyo.de> wrote: > * Johannes Schindelin: > > > Hi, > > > > On Fri, 7 Jul 2006, Florian Weimer wrote: > > > >> - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); > >> + s->data = mmap(NULL, s->size, PROT_READ, MAP_SHARED, fd, 0); > > > > Be advised that this breaks setups with NO_MMAP, in particular most (all) > > cygwin setups I know of. > > Are these Cygwin setups running on top of the Windows 95 code base by > chance? Cygwin has mmap. But cygwin's mmap() not good enough for git. What happens is that git does rename() when target file has active mmap(). In cygwin, this makes rename() to fail. This is what makes cygwin's mmap unusable for git. (BTW for read-only git access, mmap() will work on cygwin, for what I saw. But attempts to modify index will break). Yakov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-16 15:03 ` Yakov Lerner @ 2006-07-16 22:36 ` Alex Riesen 2006-07-17 5:25 ` Florian Weimer 2006-07-17 12:11 ` Yakov Lerner 0 siblings, 2 replies; 32+ messages in thread From: Alex Riesen @ 2006-07-16 22:36 UTC (permalink / raw) To: Yakov Lerner; +Cc: Florian Weimer, git Yakov Lerner, Sun, Jul 16, 2006 17:03:49 +0200: > Cygwin has mmap. But cygwin's mmap() not good enough for git. > What happens is that git does rename() when target file has active mmap(). > In cygwin, this makes rename() to fail. This is what makes cygwin's > mmap unusable for git. (BTW for read-only git access, mmap() will work > on cygwin, for what I saw. But attempts to modify index will break). It is not Cygwin really. It's windows. You can't rename or delete an open or mmapped file in that thing. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-16 22:36 ` Alex Riesen @ 2006-07-17 5:25 ` Florian Weimer 2006-07-17 12:41 ` Johannes Schindelin 2006-07-17 12:11 ` Yakov Lerner 1 sibling, 1 reply; 32+ messages in thread From: Florian Weimer @ 2006-07-17 5:25 UTC (permalink / raw) To: git * Alex Riesen: > It is not Cygwin really. It's windows. You can't rename or delete an > open or mmapped file in that thing. And GIT's workaround is to read the whole file into memory and close it after that? Uh-oh. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 5:25 ` Florian Weimer @ 2006-07-17 12:41 ` Johannes Schindelin 2006-07-17 15:43 ` Linus Torvalds 2006-07-17 16:32 ` Juergen Ruehle 0 siblings, 2 replies; 32+ messages in thread From: Johannes Schindelin @ 2006-07-17 12:41 UTC (permalink / raw) To: Florian Weimer; +Cc: git Hi, On Mon, 17 Jul 2006, Florian Weimer wrote: > * Alex Riesen: > > > It is not Cygwin really. It's windows. You can't rename or delete an > > open or mmapped file in that thing. > > And GIT's workaround is to read the whole file into memory and close > it after that? Uh-oh. If you have a better idea (which does not make git source code ugly), go ahead, write a patch. But note that we usually use the whole contents of the mmap()ed file anyway. Ciao, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 12:41 ` Johannes Schindelin @ 2006-07-17 15:43 ` Linus Torvalds 2006-07-17 16:05 ` Johannes Schindelin 2006-07-17 16:32 ` Juergen Ruehle 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2006-07-17 15:43 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Florian Weimer, git On Mon, 17 Jul 2006, Johannes Schindelin wrote: > > But note that we usually use the whole contents of the mmap()ed file > anyway. Not for pack-files, though. You may have a half-gigabyte pack-file, and only use a small small fraction of it. (You also never really rename them, so windows should be fine for that case) Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 15:43 ` Linus Torvalds @ 2006-07-17 16:05 ` Johannes Schindelin 2006-07-17 17:56 ` Linus Torvalds 2006-07-17 19:37 ` Geert Bosch 0 siblings, 2 replies; 32+ messages in thread From: Johannes Schindelin @ 2006-07-17 16:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: Florian Weimer, git Hi, On Mon, 17 Jul 2006, Linus Torvalds wrote: > On Mon, 17 Jul 2006, Johannes Schindelin wrote: > > > > But note that we usually use the whole contents of the mmap()ed file > > anyway. > > Not for pack-files, though. You may have a half-gigabyte pack-file, and > only use a small small fraction of it. Right. > (You also never really rename them, so windows should be fine for that > case) So we could introduce a second mmap() which is normally the same as mmap(), except for windows, where it is a rename() and unlink() safe version of mmap() by reading the thing into RAM. Not very pretty. Ciao, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 16:05 ` Johannes Schindelin @ 2006-07-17 17:56 ` Linus Torvalds 2006-07-17 18:15 ` Martin Waitz 2006-07-17 20:20 ` Alex Riesen 2006-07-17 19:37 ` Geert Bosch 1 sibling, 2 replies; 32+ messages in thread From: Linus Torvalds @ 2006-07-17 17:56 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Florian Weimer, git On Mon, 17 Jul 2006, Johannes Schindelin wrote: > > So we could introduce a second mmap() which is normally the same as > mmap(), except for windows, where it is a rename() and unlink() safe > version of mmap() by reading the thing into RAM. Not very pretty. Well, not too ugly either. Imagine having a "map_packfile()" interface, and letting different targets just implement their own version. That doesn't sound too bad, does it? Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 17:56 ` Linus Torvalds @ 2006-07-17 18:15 ` Martin Waitz 2006-07-17 20:20 ` Alex Riesen 1 sibling, 0 replies; 32+ messages in thread From: Martin Waitz @ 2006-07-17 18:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, Florian Weimer, git [-- Attachment #1: Type: text/plain, Size: 506 bytes --] hoi :) On Mon, Jul 17, 2006 at 10:56:04AM -0700, Linus Torvalds wrote: > Imagine having a "map_packfile()" interface, and letting different targets > just implement their own version. That doesn't sound too bad, does it? we'd need something like that anyway if we want to support larger pack files. When it is only possible to map a part of the file into memory at a given time then we have to special case all accesses to the pack file, too. Or is there an easier way? -- Martin Waitz [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 17:56 ` Linus Torvalds 2006-07-17 18:15 ` Martin Waitz @ 2006-07-17 20:20 ` Alex Riesen 1 sibling, 0 replies; 32+ messages in thread From: Alex Riesen @ 2006-07-17 20:20 UTC (permalink / raw) To: Linus Torvalds; +Cc: Johannes Schindelin, Florian Weimer, git Linus Torvalds, Mon, Jul 17, 2006 19:56:04 +0200: > > So we could introduce a second mmap() which is normally the same as > > mmap(), except for windows, where it is a rename() and unlink() safe > > version of mmap() by reading the thing into RAM. Not very pretty. > > Well, not too ugly either. > > Imagine having a "map_packfile()" interface, and letting different targets > just implement their own version. That doesn't sound too bad, does it? > Maybe the patch below will at least help to identify the place where the interface could be used. It's a bit more than packfiles. I use it since march (I really meant to post it, but probably forgot). But careful, it has an ugly piece in git-compat-util.h, which probably makes the patch only useful on cygwin and nowhere else (is there any other platform which has NO_MMAP defined?). --- Makefile | 2 +- compat/realmmap.c | 26 ++++++++++++++++++++++++++ diff.c | 4 ++-- git-compat-util.h | 10 ++++++++-- sha1_file.c | 24 ++++++++++++------------ 5 files changed, 49 insertions(+), 17 deletions(-) diff --git a/Makefile b/Makefile index 01fb9cf..f16b466 100644 --- a/Makefile +++ b/Makefile @@ -432,7 +432,7 @@ ifdef NO_SETENV endif ifdef NO_MMAP COMPAT_CFLAGS += -DNO_MMAP - COMPAT_OBJS += compat/mmap.o + COMPAT_OBJS += compat/mmap.o compat/realmmap.o endif ifdef NO_IPV6 ALL_CFLAGS += -DNO_IPV6 diff --git a/compat/realmmap.c b/compat/realmmap.c new file mode 100644 index 0000000..8f26641 --- /dev/null +++ b/compat/realmmap.c @@ -0,0 +1,26 @@ +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <errno.h> +#include <sys/mman.h> +#include "../git-compat-util.h" + +#undef mmap +#undef munmap + +void *realmmap(void *start, size_t length, int prot , int flags, int fd, off_t offset) +{ + if (start != NULL || !(flags & MAP_PRIVATE)) { + errno = ENOTSUP; + return MAP_FAILED; + } + start = mmap(start, length, prot, flags, fd, offset); + return start; +} + +int realmunmap(void *start, size_t length) +{ + return munmap(start, length); +} + + diff --git a/diff.c b/diff.c index 8b44756..831bf82 100644 --- a/diff.c +++ b/diff.c @@ -1094,7 +1094,7 @@ int diff_populate_filespec(struct diff_f fd = open(s->path, O_RDONLY); if (fd < 0) goto err_empty; - s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); + s->data = realmmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (s->data == MAP_FAILED) goto err_empty; @@ -1126,7 +1126,7 @@ void diff_free_filespec_data(struct diff if (s->should_free) free(s->data); else if (s->should_munmap) - munmap(s->data, s->size); + realmunmap(s->data, s->size); s->should_free = s->should_munmap = 0; s->data = NULL; free(s->cnt_data); diff --git a/git-compat-util.h b/git-compat-util.h index 93f5580..503b8e4 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -46,22 +46,28 @@ extern void set_error_routine(void (*rou #ifdef NO_MMAP -#ifndef PROT_READ +#include <sys/mman.h> +/*#ifndef PROT_READ #define PROT_READ 1 #define PROT_WRITE 2 #define MAP_PRIVATE 1 #define MAP_FAILED ((void*)-1) -#endif +#endif*/ #define mmap gitfakemmap #define munmap gitfakemunmap extern void *gitfakemmap(void *start, size_t length, int prot , int flags, int fd, off_t offset); extern int gitfakemunmap(void *start, size_t length); +extern void *realmmap(void *start, size_t length, int prot , int flags, int fd, off_t offset); +extern int realmunmap(void *start, size_t length); + #else /* NO_MMAP */ #include <sys/mman.h> +#define realmmap mmap +#define realmunmap munmap #endif /* NO_MMAP */ #ifdef NO_SETENV diff --git a/sha1_file.c b/sha1_file.c index e666aec..38ab710 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -331,14 +331,14 @@ static void read_info_alternates(const c close(fd); return; } - map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); + map = realmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (map == MAP_FAILED) return; link_alt_odb_entries(map, map + st.st_size, '\n', relative_base, depth); - munmap(map, st.st_size); + realmunmap(map, st.st_size); } void prepare_alt_odb(void) @@ -394,7 +394,7 @@ static int check_packed_git_idx(const ch return -1; } idx_size = st.st_size; - idx_map = mmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0); + idx_map = realmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (idx_map == MAP_FAILED) return -1; @@ -439,7 +439,7 @@ static int unuse_one_packed_git(void) } if (!lru) return 0; - munmap(lru->pack_base, lru->pack_size); + realmunmap(lru->pack_base, lru->pack_size); lru->pack_base = NULL; return 1; } @@ -476,7 +476,7 @@ int use_packed_git(struct packed_git *p) } if (st.st_size != p->pack_size) die("packfile %s size mismatch.", p->pack_name); - map = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, fd, 0); + map = realmmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (map == MAP_FAILED) die("packfile %s cannot be mapped.", p->pack_name); @@ -511,7 +511,7 @@ struct packed_git *add_packed_git(char * /* do we have a corresponding .pack file? */ strcpy(path + path_len - 4, ".pack"); if (stat(path, &st) || !S_ISREG(st.st_mode)) { - munmap(idx_map, idx_size); + realmunmap(idx_map, idx_size); return NULL; } /* ok, it looks sane as far as we can check without @@ -676,7 +676,7 @@ static void *map_sha1_file_internal(cons */ sha1_file_open_flag = 0; } - map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); + map = realmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (map == MAP_FAILED) return NULL; @@ -1220,7 +1220,7 @@ int sha1_object_info(const unsigned char *sizep = size; } inflateEnd(&stream); - munmap(map, mapsize); + realmunmap(map, mapsize); return status; } @@ -1246,7 +1246,7 @@ void * read_sha1_file(const unsigned cha map = map_sha1_file_internal(sha1, &mapsize); if (map) { buf = unpack_sha1_file(map, mapsize, type, size); - munmap(map, mapsize); + realmunmap(map, mapsize); return buf; } reprepare_packed_git(); @@ -1543,7 +1543,7 @@ int write_sha1_to_fd(int fd, const unsig if (buf) { retval = write_buffer(fd, buf, objsize); - munmap(buf, objsize); + realmunmap(buf, objsize); return retval; } @@ -1726,7 +1726,7 @@ int index_fd(unsigned char *sha1, int fd buf = ""; if (size) - buf = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + buf = realmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); if (buf == MAP_FAILED) return -1; @@ -1740,7 +1740,7 @@ int index_fd(unsigned char *sha1, int fd ret = 0; } if (size) - munmap(buf, size); + realmunmap(buf, size); return ret; } -- 1.4.1.gb944 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 16:05 ` Johannes Schindelin 2006-07-17 17:56 ` Linus Torvalds @ 2006-07-17 19:37 ` Geert Bosch 2006-07-17 21:31 ` Linus Torvalds 1 sibling, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-07-17 19:37 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Linus Torvalds, Florian Weimer, git On Jul 17, 2006, at 12:05, Johannes Schindelin wrote: >> Not for pack-files, though. You may have a half-gigabyte pack- >> file, and >> only use a small small fraction of it. > > Right. > >> (You also never really rename them, so windows should be fine for >> that >> case) > > So we could introduce a second mmap() which is normally the same as > mmap(), except for windows, where it is a rename() and unlink() safe > version of mmap() by reading the thing into RAM. Not very pretty. Or we can avoid doing an mmap of the entire pack file, and instead try to be somewhat smart on limiting the size of the mmap's. This might be sufficient to help Windows and also solve the issue of finding contiguous address space for large packs on 32-bit systems. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 19:37 ` Geert Bosch @ 2006-07-17 21:31 ` Linus Torvalds 2006-07-18 9:38 ` Yakov Lerner 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2006-07-17 21:31 UTC (permalink / raw) To: Geert Bosch; +Cc: Johannes Schindelin, Florian Weimer, Git Mailing List On Mon, 17 Jul 2006, Geert Bosch wrote: > > Or we can avoid doing an mmap of the entire pack file, and instead > try to be somewhat smart on limiting the size of the mmap's. > This might be sufficient to help Windows and also solve the > issue of finding contiguous address space for large packs on > 32-bit systems. Well, the thing is, you really _do_ want to mmap as much as possible of the pack-file as possible, if mmap() works. So even with large pack-files, you do want to mmap a huge chunk at a time, even if it turns out that you only need a very small part of it. For example, the commit data is at the very beginning of the pack-file, so if you only look at the history, you only look at a very small part of the pack-file, but you should not have to know how much you'll need ahead of time, so you'd still want the pack-file operations to act in a way that is efficient for the general case (which is to mmap as much as possible). So yes, we'll need to have some chunking layer at some point (when people have gigabyte pack-files) but I think that's a totally separate issue from the fact that we _do_ actually want mmap() (the _real_ kind of mmap) for pack-files. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 21:31 ` Linus Torvalds @ 2006-07-18 9:38 ` Yakov Lerner 2006-07-18 10:20 ` Johannes Schindelin 2006-07-18 15:37 ` Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Yakov Lerner @ 2006-07-18 9:38 UTC (permalink / raw) Cc: Git Mailing List Linus Torvalds <torvalds@osdl.org> wrote: > Well, the thing is, you really _do_ want to mmap as much as possible of > the pack-file as possible, if mmap() works. Juergen Ruehle <j.ruehle@bmiag.de> wrote: > I did some > quick performance check and the NO_MMAP code path seems to be as fast > as the mmap one (even much faster in some cases). So the combination > of windows' memory management and git mmap usage doesn't seem so hot How about making this parameter (do-use-mmap vs not-use-mmap) a *runtime* parameter ? (Env. var. $GIT_MMAP or $GIT_USE_MMAP ?). What do you think ? I see two benefits: (1) much easier to benchmark two methods against each other (2) will always work on cygwin (automatic fallback to working method at runtime; say depending on filesystem) What do you say ? (Only in case when MMAP is enabled at build-time, of course) Yakov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-18 9:38 ` Yakov Lerner @ 2006-07-18 10:20 ` Johannes Schindelin 2006-07-18 15:37 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Johannes Schindelin @ 2006-07-18 10:20 UTC (permalink / raw) To: Yakov Lerner Cc: no To-header on input <""@pop.gmx.net>, Git Mailing List Hi, On Tue, 18 Jul 2006, Yakov Lerner wrote: > How about making this parameter (do-use-mmap vs not-use-mmap) > a *runtime* parameter ? (Env. var. $GIT_MMAP or $GIT_USE_MMAP ?). > What do you think ? I see two benefits: > (1) much easier to benchmark two methods against each other You will benchmark it only once, right? No need to have this in git for one-time use. > (2) will always work on cygwin (automatic fallback to working > method at runtime; say depending on filesystem) I think it is too complicated to depend on the filesystem, and too system specific (but hey, those who submit a patch are more right than others...). But an automatic fallback is not feasible: you would try to mmap(), and it would throw an "Access violation", which is MS speak for "Segmentation fault". Besides, if you would be able to detect a failed mmap(), it would be too late: the rename() or unlink() would already have taken place. Ciao, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-18 9:38 ` Yakov Lerner 2006-07-18 10:20 ` Johannes Schindelin @ 2006-07-18 15:37 ` Linus Torvalds 2006-07-18 18:49 ` Alex Riesen 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2006-07-18 15:37 UTC (permalink / raw) To: Yakov Lerner; +Cc: Git Mailing List On Tue, 18 Jul 2006, Yakov Lerner wrote: > > How about making this parameter (do-use-mmap vs not-use-mmap) > a *runtime* parameter ? The thing is, there are really two different kinds of users, and one of them in particular (the pack-file case) shouldn't have the problems under Windows at all (because the pack-files aren't moved), and have a much bigger reason to use mmap in the first place (because accesses will be sparse, so reading the whole file is wasteful). The other user was the index file, and _that_ one is the one that gets renamed (to replace the previous index file), and that one is also not ever read only partially, so using "read()" instead of mmap is much less of an issue. So I really think we should just basically always mmap the pack-files (it should work everywhere), and make NO_MMAP just trigger on the other cases. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-18 15:37 ` Linus Torvalds @ 2006-07-18 18:49 ` Alex Riesen 0 siblings, 0 replies; 32+ messages in thread From: Alex Riesen @ 2006-07-18 18:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Yakov Lerner, Git Mailing List On 7/18/06, Linus Torvalds <torvalds@osdl.org> wrote: > So I really think we should just basically always mmap the pack-files (it > should work everywhere), and make NO_MMAP just trigger on the other cases. And while we are at it make a mental note about avoiding platforms with broken mmap. For instance, QNX6 mmap with MAP_PRIVATE _always_ makes a copy of the whole! file, the stupid thing (they even documented it so!). The alternative flag, MAP_SHARED, does not work on some filesystems at all. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 12:41 ` Johannes Schindelin 2006-07-17 15:43 ` Linus Torvalds @ 2006-07-17 16:32 ` Juergen Ruehle 2006-07-17 17:10 ` Johannes Schindelin 2006-07-17 20:32 ` Yakov Lerner 1 sibling, 2 replies; 32+ messages in thread From: Juergen Ruehle @ 2006-07-17 16:32 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Florian Weimer, git Johannes Schindelin writes: > > > It is not Cygwin really. It's windows. You can't rename or delete an > > > open or mmapped file in that thing. > > > > And GIT's workaround is to read the whole file into memory and close > > it after that? Uh-oh. > > If you have a better idea (which does not make git source code ugly), go > ahead, write a patch. On several boxes I've tested the mmap code passes the tests on NTFS. It is usable for my simple work even on FAT32 with an unlink before the rename in lockfile.c, but that probably breaks in more involved scenarios. jr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 16:32 ` Juergen Ruehle @ 2006-07-17 17:10 ` Johannes Schindelin 2006-07-17 17:37 ` Juergen Ruehle 2006-07-17 20:32 ` Yakov Lerner 1 sibling, 1 reply; 32+ messages in thread From: Johannes Schindelin @ 2006-07-17 17:10 UTC (permalink / raw) To: Juergen Ruehle; +Cc: Florian Weimer, git Hi, On Mon, 17 Jul 2006, Juergen Ruehle wrote: > Johannes Schindelin writes: > > > > It is not Cygwin really. It's windows. You can't rename or delete an > > > > open or mmapped file in that thing. > > > > > > And GIT's workaround is to read the whole file into memory and close > > > it after that? Uh-oh. > > > > If you have a better idea (which does not make git source code ugly), go > > ahead, write a patch. > > On several boxes I've tested the mmap code passes the tests on NTFS. > > It is usable for my simple work even on FAT32 with an unlink before > the rename in lockfile.c, but that probably breaks in more involved > scenarios. In my not-so-simple tests, it failed. Reproducibly. If I am not mistaken, your test just does not hit the problem, namely fork()ing after rename()ing a mmap()ed file. Now, with the diff changes a few months ago, there are way less fork()s in the code, that might be one reason you do not hit the wall, but I could imagine that a "git-read-tree -m -u" still has problems. Note: I did not test what I just said, but I _did_ test that there are problems with mmap() when you fork() with the map in use. Ciao, Dscho ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 17:10 ` Johannes Schindelin @ 2006-07-17 17:37 ` Juergen Ruehle 0 siblings, 0 replies; 32+ messages in thread From: Juergen Ruehle @ 2006-07-17 17:37 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Florian Weimer, git Johannes Schindelin writes: > In my not-so-simple tests, it failed. Reproducibly. If I am not mistaken, > your test just does not hit the problem, namely fork()ing after > rename()ing a mmap()ed file. No doubt about that. But until you post additional test cases, I can continue to claim my version passes all tests:-) jr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 16:32 ` Juergen Ruehle 2006-07-17 17:10 ` Johannes Schindelin @ 2006-07-17 20:32 ` Yakov Lerner 2006-07-17 22:43 ` Juergen Ruehle 1 sibling, 1 reply; 32+ messages in thread From: Yakov Lerner @ 2006-07-17 20:32 UTC (permalink / raw) Cc: git On 7/17/06, Juergen Ruehle <j.ruehle@bmiag.de> wrote: > Johannes Schindelin writes: > > > > It is not Cygwin really. It's windows. You can't rename or delete an > > > > open or mmapped file in that thing. > > > > > > And GIT's workaround is to read the whole file into memory and close > > > it after that? Uh-oh. > > > > If you have a better idea (which does not make git source code ugly), go > > ahead, write a patch. > > On several boxes I've tested the mmap code passes the tests on NTFS. On me, it failed me on git-apply with more than 1 patches on the commandline. git-apply with 1 patch on the commandline passed, with two, failed. git-apply with two patches on commandline is the simplest testcase that exposes this problem, AFAIK. Yakov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 20:32 ` Yakov Lerner @ 2006-07-17 22:43 ` Juergen Ruehle 2006-07-18 9:15 ` Yakov Lerner 0 siblings, 1 reply; 32+ messages in thread From: Juergen Ruehle @ 2006-07-17 22:43 UTC (permalink / raw) To: Yakov Lerner Yakov Lerner writes: > On me, it failed me on git-apply with more than 1 patches on > the commandline. > > git-apply with 1 patch on the commandline passed, with two, failed. > > git-apply with two patches on commandline is the simplest > testcase that exposes this problem, AFAIK. In that case tests 4109 and 4110 should fail, shouldn't they? They succed for me on NTFS (and fail on other FS). But anyway, I did some quick performance check and the NO_MMAP code path seems to be as fast as the mmap one (even much faster in some cases). So the combination of windows' memory management and git mmap usage doesn't seem so hot (especially considering the fact that git runs about twice as fast on FAT32 for me). jr ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-17 22:43 ` Juergen Ruehle @ 2006-07-18 9:15 ` Yakov Lerner 0 siblings, 0 replies; 32+ messages in thread From: Yakov Lerner @ 2006-07-18 9:15 UTC (permalink / raw) Cc: git On 7/17/06, Juergen Ruehle <j.ruehle@bmiag.de> wrote: > Yakov Lerner writes: > > On me, it failed me on git-apply with more than 1 patches on > > the commandline. > > > > git-apply with 1 patch on the commandline passed, with two, failed. > > > > git-apply with two patches on commandline is the simplest > > testcase that exposes this problem, AFAIK. > > In that case tests 4109 and 4110 should fail, shouldn't they? They > succed for me on NTFS (and fail on other FS). Mine FAT32 FS. Yakov > But anyway, I did some > quick performance check and the NO_MMAP code path seems to be as fast > as the mmap one (even much faster in some cases). So the combination > of windows' memory management and git mmap usage doesn't seem so hot > (especially considering the fact that git runs about twice as fast on > FAT32 for me). Yakov ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: comparing file contents in is_exact_match? 2006-07-16 22:36 ` Alex Riesen 2006-07-17 5:25 ` Florian Weimer @ 2006-07-17 12:11 ` Yakov Lerner 1 sibling, 0 replies; 32+ messages in thread From: Yakov Lerner @ 2006-07-17 12:11 UTC (permalink / raw) To: Alex Riesen; +Cc: Florian Weimer, git Florian Weimer wrote: > And GIT's workaround is to read the whole file into memory and close > it after that? Uh-oh. On 7/16/06, Alex Riesen <fork0@t-online.de> wrote: > Yakov Lerner, Sun, Jul 16, 2006 17:03:49 +0200: > > Cygwin has mmap. But cygwin's mmap() not good enough for git. > > What happens is that git does rename() when target file has active mmap(). > > In cygwin, this makes rename() to fail. This is what makes cygwin's > > mmap unusable for git. (BTW for read-only git access, mmap() will work > > on cygwin, for what I saw. But attempts to modify index will break). > > It is not Cygwin really. It's windows. You can't rename or delete an > open or mmapped file in that thing. We have right to expect cygwin to do some effort to emulate the unix mmap() semantics on top of different semantics of windows memory-mapped files. Why doesn't cygwin do some effort to detect active mmap()s on the file before rename and do something like (1) unmap (2) rename the target to temp name (3) remap. (4) take care of temp file cleanup ? Probably too messy even if possible. Is it reasonable to expect git to do this mess ? It probably belongs to the generic cygwin layer anyway. I did try another kind of workaround, in which I unmapped the target file before rename. But it didn't work in all situations and was too intrusive to the git-apply logic. Didn't work. Regarding the existing workaround (NO_MMAP), well, cygwin is not exactly a speed champion in many areas. I didn't really notice speed loss with NO_MMAP, maybe because cygwin loses time here and there anyway ? Yakov ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2006-07-18 18:49 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-07-06 5:57 comparing file contents in is_exact_match? Martin Waitz 2006-07-06 6:20 ` Junio C Hamano 2006-07-06 7:16 ` Martin Waitz 2006-07-06 7:33 ` Junio C Hamano 2006-07-06 7:41 ` Martin Waitz 2006-07-06 17:55 ` Martin Waitz 2006-07-07 16:33 ` Florian Weimer 2006-07-08 2:50 ` Johannes Schindelin 2006-07-16 9:05 ` Florian Weimer 2006-07-16 14:00 ` Johannes Schindelin 2006-07-16 15:03 ` Yakov Lerner 2006-07-16 22:36 ` Alex Riesen 2006-07-17 5:25 ` Florian Weimer 2006-07-17 12:41 ` Johannes Schindelin 2006-07-17 15:43 ` Linus Torvalds 2006-07-17 16:05 ` Johannes Schindelin 2006-07-17 17:56 ` Linus Torvalds 2006-07-17 18:15 ` Martin Waitz 2006-07-17 20:20 ` Alex Riesen 2006-07-17 19:37 ` Geert Bosch 2006-07-17 21:31 ` Linus Torvalds 2006-07-18 9:38 ` Yakov Lerner 2006-07-18 10:20 ` Johannes Schindelin 2006-07-18 15:37 ` Linus Torvalds 2006-07-18 18:49 ` Alex Riesen 2006-07-17 16:32 ` Juergen Ruehle 2006-07-17 17:10 ` Johannes Schindelin 2006-07-17 17:37 ` Juergen Ruehle 2006-07-17 20:32 ` Yakov Lerner 2006-07-17 22:43 ` Juergen Ruehle 2006-07-18 9:15 ` Yakov Lerner 2006-07-17 12:11 ` Yakov Lerner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).