* [PATCH 1/17] Replace unpack_entry_gently with unpack_entry.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
@ 2006-12-23 7:33 ` Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 2/17] Introduce new config option for mmap limit Shawn O. Pearce
` (15 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The unpack_entry_gently function currently has only two callers:
the delta base resolution in sha1_file.c and the main loop of
pack-check.c. Both of these must change to using unpack_entry
directly when we implement sliding window mmap logic, so I'm doing
it earlier to help break down the change set.
This may cause a slight performance decrease for delta base
resolution as well as for pack-check.c's verify_packfile(), as
the pack use counter will be incremented and decremented for every
object that is unpacked.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 2 +-
pack-check.c | 2 +-
sha1_file.c | 33 ++++++++++++---------------------
3 files changed, 14 insertions(+), 23 deletions(-)
diff --git a/cache.h b/cache.h
index 4943056..38db1bf 100644
--- a/cache.h
+++ b/cache.h
@@ -394,7 +394,7 @@ extern struct packed_git *add_packed_git(char *, int, int);
extern int num_packed_objects(const struct packed_git *p);
extern int nth_packed_object_sha1(const struct packed_git *, int, unsigned char*);
extern unsigned long find_pack_entry_one(const unsigned char *, struct packed_git *);
-extern void *unpack_entry_gently(struct packed_git *, unsigned long, char *, unsigned long *);
+extern void *unpack_entry(struct packed_git *, unsigned long, char *, unsigned long *);
extern unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
extern void packed_object_info_detail(struct packed_git *, unsigned long, char *, unsigned long *, unsigned long *, unsigned int *, unsigned char *);
diff --git a/pack-check.c b/pack-check.c
index c0caaee..491bad2 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -51,7 +51,7 @@ static int verify_packfile(struct packed_git *p)
offset = find_pack_entry_one(sha1, p);
if (!offset)
die("internal error pack-check find-pack-entry-one");
- data = unpack_entry_gently(p, offset, type, &size);
+ data = unpack_entry(p, offset, type, &size);
if (!data) {
err = error("cannot unpack %s from %s",
sha1_to_hex(sha1), p->pack_name);
diff --git a/sha1_file.c b/sha1_file.c
index 1c4df5b..4824a5d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1110,7 +1110,7 @@ static void *unpack_delta_entry(struct packed_git *p,
unsigned long result_size, base_size, base_offset;
offset = get_delta_base(p, offset, kind, obj_offset, &base_offset);
- base = unpack_entry_gently(p, base_offset, type, &base_size);
+ base = unpack_entry(p, base_offset, type, &base_size);
if (!base)
die("failed to read delta base object at %lu from %s",
base_offset, p->pack_name);
@@ -1127,43 +1127,34 @@ static void *unpack_delta_entry(struct packed_git *p,
return result;
}
-static void *unpack_entry(struct pack_entry *entry,
+void *unpack_entry(struct packed_git *p, unsigned long offset,
char *type, unsigned long *sizep)
{
- struct packed_git *p = entry->p;
+ unsigned long size, obj_offset = offset;
+ enum object_type kind;
void *retval;
if (use_packed_git(p))
die("cannot map packed file");
- retval = unpack_entry_gently(p, entry->offset, type, sizep);
- unuse_packed_git(p);
- if (!retval)
- die("corrupted pack file %s", p->pack_name);
- return retval;
-}
-
-/* The caller is responsible for use_packed_git()/unuse_packed_git() pair */
-void *unpack_entry_gently(struct packed_git *p, unsigned long offset,
- char *type, unsigned long *sizep)
-{
- unsigned long size, obj_offset = offset;
- enum object_type kind;
-
offset = unpack_object_header(p, offset, &kind, &size);
switch (kind) {
case OBJ_OFS_DELTA:
case OBJ_REF_DELTA:
- return unpack_delta_entry(p, offset, size, kind, obj_offset, type, sizep);
+ retval = unpack_delta_entry(p, offset, size, kind, obj_offset, type, sizep);
+ break;
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
strcpy(type, type_names[kind]);
*sizep = size;
- return unpack_compressed_entry(p, offset, size);
+ retval = unpack_compressed_entry(p, offset, size);
+ break;
default:
- return NULL;
+ die("unknown object type %i in %s", kind, p->pack_name);
}
+ unuse_packed_git(p);
+ return retval;
}
int num_packed_objects(const struct packed_git *p)
@@ -1312,7 +1303,7 @@ static void *read_packed_sha1(const unsigned char *sha1, char *type, unsigned lo
error("cannot read sha1_file for %s", sha1_to_hex(sha1));
return NULL;
}
- return unpack_entry(&e, type, size);
+ return unpack_entry(e.p, e.offset, type, size);
}
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/17] Introduce new config option for mmap limit.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
2006-12-23 7:33 ` [PATCH 1/17] Replace unpack_entry_gently with unpack_entry Shawn O. Pearce
@ 2006-12-23 7:33 ` Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 3/17] Refactor packed_git to prepare for sliding mmap windows Shawn O. Pearce
` (14 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Rather than hardcoding the maximum number of bytes which can be
mmapped from pack files we should make this value configurable,
allowing the end user to increase or decrease this limit on a
per-repository basis depending on the size of the repository
and the capabilities of their operating system.
In general users should not need to manually tune such a low-level
setting within the core code, but being able to artifically limit
the number of bytes which we can mmap at once from pack files will
make it easier to craft test cases for the new mmap sliding window
implementation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
Documentation/config.txt | 9 +++++++++
cache.h | 1 +
config.c | 5 +++++
environment.c | 1 +
sha1_file.c | 3 +--
5 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 22482d6..4e93066 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -118,6 +118,15 @@ core.legacyheaders::
database directly (where the "http://" and "rsync://" protocols
count as direct access).
+core.packedGitLimit::
+ Maximum number of bytes to map simultaneously into memory
+ from pack files. If Git needs to access more than this many
+ bytes at once to complete an operation it will unmap existing
+ regions to reclaim virtual address space within the process.
+ Default is 256 MiB, which should be reasonable for all
+ users/operating systems, except on largest Git projects.
+ You probably do not need to adjust this value.
+
alias.*::
Command aliases for the gitlink:git[1] command wrapper - e.g.
after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 38db1bf..ad94c3f 100644
--- a/cache.h
+++ b/cache.h
@@ -196,6 +196,7 @@ extern int warn_ambiguous_refs;
extern int shared_repository;
extern const char *apply_default_whitespace;
extern int zlib_compression_level;
+extern size_t packed_git_limit;
#define GIT_REPO_VERSION 0
extern int repository_format_version;
diff --git a/config.c b/config.c
index 1662a46..1e79f09 100644
--- a/config.c
+++ b/config.c
@@ -298,6 +298,11 @@ int git_default_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.packedgitlimit")) {
+ packed_git_limit = git_config_int(var, value);
+ return 0;
+ }
+
if (!strcmp(var, "user.name")) {
strlcpy(git_default_name, value, sizeof(git_default_name));
return 0;
diff --git a/environment.c b/environment.c
index f8c7dbc..8a09df2 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
int shared_repository = PERM_UMASK;
const char *apply_default_whitespace;
int zlib_compression_level = Z_DEFAULT_COMPRESSION;
+size_t packed_git_limit = 256 * 1024 * 1024;
int pager_in_use;
int pager_use_color = 1;
diff --git a/sha1_file.c b/sha1_file.c
index 4824a5d..4183f59 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -397,7 +397,6 @@ static char *find_sha1_file(const unsigned char *sha1, struct stat *st)
return NULL;
}
-#define PACK_MAX_SZ (1<<26)
static int pack_used_ctr;
static unsigned long pack_mapped;
struct packed_git *packed_git;
@@ -490,7 +489,7 @@ int use_packed_git(struct packed_git *p)
struct pack_header *hdr;
pack_mapped += p->pack_size;
- while (PACK_MAX_SZ < pack_mapped && unuse_one_packed_git())
+ while (packed_git_limit < pack_mapped && unuse_one_packed_git())
; /* nothing */
fd = open(p->pack_name, O_RDONLY);
if (fd < 0)
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/17] Refactor packed_git to prepare for sliding mmap windows.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
2006-12-23 7:33 ` [PATCH 1/17] Replace unpack_entry_gently with unpack_entry Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 2/17] Introduce new config option for mmap limit Shawn O. Pearce
@ 2006-12-23 7:33 ` Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 4/17] Use off_t for index and pack file lengths Shawn O. Pearce
` (13 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The idea behind the sliding mmap window pack reader implementation
is to have multiple mmap regions active against the same pack file,
thereby allowing the process to mmap in only the active/hot sections
of the pack and reduce overall virtual address space usage.
To implement this we need to refactor the mmap related data
(pack_base, pack_use_cnt) out of struct packed_git and move them
into a new struct pack_window.
We are refactoring the code to support a single struct pack_window
per packfile, thereby emulating the prior behavior of mmap'ing the
entire pack file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
builtin-pack-objects.c | 4 +-
cache.h | 13 +++++++++--
pack-check.c | 6 ++--
sha1_file.c | 53 ++++++++++++++++++++++-------------------------
4 files changed, 40 insertions(+), 36 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9e15beb..4a00a12 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -438,7 +438,7 @@ static unsigned long write_object(struct sha1file *f,
}
use_packed_git(p);
- buf = (char *) p->pack_base
+ buf = p->windows->base
+ entry->in_pack_offset
+ entry->in_pack_header_size;
datalen = find_packed_object_size(p, entry->in_pack_offset)
@@ -943,7 +943,7 @@ static void check_object(struct object_entry *entry)
struct object_entry *base_entry = NULL;
use_packed_git(p);
- buf = p->pack_base;
+ buf = p->windows->base;
buf += entry->in_pack_offset;
/* We want in_pack_type even if we do not reuse delta.
diff --git a/cache.h b/cache.h
index ad94c3f..bf1d776 100644
--- a/cache.h
+++ b/cache.h
@@ -336,14 +336,21 @@ extern struct alternate_object_database {
} *alt_odb_list;
extern void prepare_alt_odb(void);
+struct pack_window {
+ struct pack_window *next;
+ unsigned char *base;
+ off_t offset;
+ size_t len;
+ unsigned int last_used;
+ unsigned int inuse_cnt;
+};
+
extern struct packed_git {
struct packed_git *next;
unsigned long index_size;
unsigned long pack_size;
+ struct pack_window *windows;
unsigned int *index_base;
- void *pack_base;
- unsigned int pack_last_used;
- unsigned int pack_use_cnt;
int pack_local;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/pack-check.c b/pack-check.c
index 491bad2..761cc85 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -13,7 +13,8 @@ static int verify_packfile(struct packed_git *p)
int nr_objects, err, i;
/* Header consistency check */
- hdr = p->pack_base;
+ pack_base = p->windows->base;
+ hdr = (struct pack_header*)pack_base;
if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
return error("Packfile %s signature mismatch", p->pack_name);
if (!pack_version_ok(hdr->hdr_version))
@@ -26,7 +27,6 @@ static int verify_packfile(struct packed_git *p)
num_packed_objects(p));
SHA1_Init(&ctx);
- pack_base = p->pack_base;
SHA1_Update(&ctx, pack_base, pack_size - 20);
SHA1_Final(sha1, &ctx);
if (hashcmp(sha1, (unsigned char *)pack_base + pack_size - 20))
@@ -78,7 +78,7 @@ static void show_pack_info(struct packed_git *p)
int nr_objects, i;
unsigned int chain_histogram[MAX_CHAIN];
- hdr = p->pack_base;
+ hdr = (struct pack_header*)p->windows->base;
nr_objects = ntohl(hdr->hdr_entries);
memset(chain_histogram, 0, sizeof(chain_histogram));
diff --git a/sha1_file.c b/sha1_file.c
index 4183f59..8377874 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -455,21 +455,23 @@ static int unuse_one_packed_git(void)
struct packed_git *p, *lru = NULL;
for (p = packed_git; p; p = p->next) {
- if (p->pack_use_cnt || !p->pack_base)
+ if (!p->windows || p->windows->inuse_cnt)
continue;
- if (!lru || p->pack_last_used < lru->pack_last_used)
+ if (!lru || p->windows->last_used < lru->windows->last_used)
lru = p;
}
if (!lru)
return 0;
- munmap(lru->pack_base, lru->pack_size);
- lru->pack_base = NULL;
+ munmap(lru->windows->base, lru->windows->len);
+ free(lru->windows);
+ lru->windows = NULL;
return 1;
}
void unuse_packed_git(struct packed_git *p)
{
- p->pack_use_cnt--;
+ if (p->windows)
+ p->windows->inuse_cnt--;
}
int use_packed_git(struct packed_git *p)
@@ -482,10 +484,10 @@ int use_packed_git(struct packed_git *p)
die("packfile %s not a regular file", p->pack_name);
p->pack_size = st.st_size;
}
- if (!p->pack_base) {
+ if (!p->windows) {
int fd;
struct stat st;
- void *map;
+ struct pack_window *win;
struct pack_header *hdr;
pack_mapped += p->pack_size;
@@ -500,16 +502,18 @@ int use_packed_git(struct packed_git *p)
}
if (st.st_size != p->pack_size)
die("packfile %s size mismatch.", p->pack_name);
- map = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ win = xcalloc(1, sizeof(*win));
+ win->len = p->pack_size;
+ win->base = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
- if (map == MAP_FAILED)
+ if (win->base == MAP_FAILED)
die("packfile %s cannot be mapped.", p->pack_name);
- p->pack_base = map;
+ p->windows = win;
/* Check if we understand this pack file. If we don't we're
* likely too old to handle it.
*/
- hdr = map;
+ hdr = (struct pack_header*)win->base;
if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
die("packfile %s isn't actually a pack.", p->pack_name);
if (!pack_version_ok(hdr->hdr_version))
@@ -522,13 +526,13 @@ int use_packed_git(struct packed_git *p)
*/
if (hashcmp((unsigned char *)(p->index_base) +
p->index_size - 40,
- (unsigned char *)p->pack_base +
+ p->windows->base +
p->pack_size - 20)) {
die("packfile %s does not match index.", p->pack_name);
}
}
- p->pack_last_used = pack_used_ctr++;
- p->pack_use_cnt++;
+ p->windows->last_used = pack_used_ctr++;
+ p->windows->inuse_cnt++;
return 0;
}
@@ -557,10 +561,7 @@ struct packed_git *add_packed_git(char *path, int path_len, int local)
p->index_size = idx_size;
p->pack_size = st.st_size;
p->index_base = idx_map;
- p->next = NULL;
- p->pack_base = NULL;
- p->pack_last_used = 0;
- p->pack_use_cnt = 0;
+ p->windows = NULL;
p->pack_local = local;
if ((path_len > 44) && !get_sha1_hex(path + path_len - 44, sha1))
hashcpy(p->sha1, sha1);
@@ -590,10 +591,7 @@ struct packed_git *parse_pack_index_file(const unsigned char *sha1, char *idx_pa
p->index_size = idx_size;
p->pack_size = 0;
p->index_base = idx_map;
- p->next = NULL;
- p->pack_base = NULL;
- p->pack_last_used = 0;
- p->pack_use_cnt = 0;
+ p->windows = NULL;
hashcpy(p->sha1, sha1);
return p;
}
@@ -882,7 +880,7 @@ static unsigned long get_delta_base(struct packed_git *p,
unsigned long delta_obj_offset,
unsigned long *base_obj_offset)
{
- unsigned char *base_info = (unsigned char *) p->pack_base + offset;
+ unsigned char *base_info = p->windows->base + offset;
unsigned long base_offset;
/* there must be at least 20 bytes left regardless of delta type */
@@ -949,7 +947,7 @@ static int packed_delta_info(struct packed_git *p,
memset(&stream, 0, sizeof(stream));
- stream.next_in = (unsigned char *) p->pack_base + offset;
+ stream.next_in = p->windows->base + offset;
stream.avail_in = p->pack_size - offset;
stream.next_out = delta_head;
stream.avail_out = sizeof(delta_head);
@@ -984,8 +982,7 @@ static unsigned long unpack_object_header(struct packed_git *p, unsigned long of
if (p->pack_size <= offset)
die("object offset outside of pack file");
- used = unpack_object_header_gently((unsigned char *)p->pack_base +
- offset,
+ used = unpack_object_header_gently(p->windows->base + offset,
p->pack_size - offset, type, sizep);
if (!used)
die("object offset outside of pack file");
@@ -1031,7 +1028,7 @@ void packed_object_info_detail(struct packed_git *p,
if (p->pack_size <= offset + 20)
die("pack file %s records an incomplete delta base",
p->pack_name);
- next_sha1 = (unsigned char *) p->pack_base + offset;
+ next_sha1 = p->windows->base + offset;
if (*delta_chain_length == 0)
hashcpy(base_sha1, next_sha1);
offset = find_pack_entry_one(next_sha1, p);
@@ -1081,7 +1078,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
buffer = xmalloc(size + 1);
buffer[size] = 0;
memset(&stream, 0, sizeof(stream));
- stream.next_in = (unsigned char*)p->pack_base + offset;
+ stream.next_in = p->windows->base + offset;
stream.avail_in = p->pack_size - offset;
stream.next_out = buffer;
stream.avail_out = size;
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 4/17] Use off_t for index and pack file lengths.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (2 preceding siblings ...)
2006-12-23 7:33 ` [PATCH 3/17] Refactor packed_git to prepare for sliding mmap windows Shawn O. Pearce
@ 2006-12-23 7:33 ` Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 5/17] Create read_or_die utility routine Shawn O. Pearce
` (12 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Since the index_size and pack_size members of struct packed_git
are the lengths of those corresponding files we should use the
off_t size of the operating system to store these file lengths,
rather than an unsigned long. This would help in the future should
we ever resurrect Junio's 64 bit index implementation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/cache.h b/cache.h
index bf1d776..936555c 100644
--- a/cache.h
+++ b/cache.h
@@ -347,10 +347,10 @@ struct pack_window {
extern struct packed_git {
struct packed_git *next;
- unsigned long index_size;
- unsigned long pack_size;
struct pack_window *windows;
unsigned int *index_base;
+ off_t index_size;
+ off_t pack_size;
int pack_local;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 5/17] Create read_or_die utility routine.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (3 preceding siblings ...)
2006-12-23 7:33 ` [PATCH 4/17] Use off_t for index and pack file lengths Shawn O. Pearce
@ 2006-12-23 7:33 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 6/17] Refactor how we open pack files to prepare for multiple windows Shawn O. Pearce
` (11 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Like write_or_die read_or_die reads the entire length requested
or it kills the current process with a die call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 1 +
write_or_die.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/cache.h b/cache.h
index 936555c..77f2338 100644
--- a/cache.h
+++ b/cache.h
@@ -428,6 +428,7 @@ extern char git_default_name[MAX_GITNAME];
extern char git_commit_encoding[MAX_ENCODING_LENGTH];
extern int copy_fd(int ifd, int ofd);
+extern void read_or_die(int fd, void *buf, size_t count);
extern void write_or_die(int fd, const void *buf, size_t count);
extern int write_or_whine(int fd, const void *buf, size_t count, const char *msg);
diff --git a/write_or_die.c b/write_or_die.c
index bfe4eeb..a56d992 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,24 @@
#include "cache.h"
+void read_or_die(int fd, void *buf, size_t count)
+{
+ char *p = buf;
+ ssize_t loaded;
+
+ while (count > 0) {
+ loaded = xread(fd, p, count);
+ if (loaded == 0)
+ die("unexpected end of file");
+ else if (loaded < 0) {
+ if (errno == EPIPE)
+ exit(0);
+ die("read error (%s)", strerror(errno));
+ }
+ count -= loaded;
+ p += loaded;
+ }
+}
+
void write_or_die(int fd, const void *buf, size_t count)
{
const char *p = buf;
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 6/17] Refactor how we open pack files to prepare for multiple windows.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (4 preceding siblings ...)
2006-12-23 7:33 ` [PATCH 5/17] Create read_or_die utility routine Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 7/17] Replace use_packed_git with window cursors Shawn O. Pearce
` (10 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
To efficiently support mmaping of multiple regions of the same pack
file we want to keep the pack's file descriptor open while we are
actively working with that pack. So we are now keeping that file
descriptor in packed_git.pack_fd and closing it only after we unmap
the last window.
This is going to increase the number of file descriptors that are
in use at once, however that will be bounded by the total number of
pack files present and therefore should not be very high. It is
a small tradeoff which we may need to revisit after some testing
can be done on various repositories and systems.
For code clarity we also want to seperate out the implementation
of how we open a pack file from the implementation which locates
a suitable window (or makes a new one) from the given pack file.
Since this is a rather large delta I'm taking advantage of doing
it now, in a fairly isolated change.
When we open a pack file we need to examine the header and trailer
without having a mmap in place, as we may only need to mmap
the middle section of this particular pack. Consequently the
verification code has been refactored to make use of the new
read_or_die function.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 1 +
sha1_file.c | 86 +++++++++++++++++++++++++++++++---------------------------
2 files changed, 47 insertions(+), 40 deletions(-)
diff --git a/cache.h b/cache.h
index 77f2338..0afc55e 100644
--- a/cache.h
+++ b/cache.h
@@ -351,6 +351,7 @@ extern struct packed_git {
unsigned int *index_base;
off_t index_size;
off_t pack_size;
+ int pack_fd;
int pack_local;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 8377874..cc68a8b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -465,6 +465,8 @@ static int unuse_one_packed_git(void)
munmap(lru->windows->base, lru->windows->len);
free(lru->windows);
lru->windows = NULL;
+ close(p->pack_fd);
+ p->pack_fd = -1;
return 1;
}
@@ -474,62 +476,64 @@ void unuse_packed_git(struct packed_git *p)
p->windows->inuse_cnt--;
}
-int use_packed_git(struct packed_git *p)
+static void open_packed_git(struct packed_git *p)
{
+ struct stat st;
+ struct pack_header hdr;
+ unsigned char sha1[20];
+ unsigned char *idx_sha1;
+
+ p->pack_fd = open(p->pack_name, O_RDONLY);
+ if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
+ die("packfile %s cannot be opened", p->pack_name);
+
+ /* If we created the struct before we had the pack we lack size. */
if (!p->pack_size) {
- struct stat st;
- /* We created the struct before we had the pack */
- stat(p->pack_name, &st);
if (!S_ISREG(st.st_mode))
die("packfile %s not a regular file", p->pack_name);
p->pack_size = st.st_size;
- }
+ } else if (p->pack_size != st.st_size)
+ die("packfile %s size changed", p->pack_name);
+
+ /* Verify we recognize this pack file format. */
+ read_or_die(p->pack_fd, &hdr, sizeof(hdr));
+ if (hdr.hdr_signature != htonl(PACK_SIGNATURE))
+ die("file %s is not a GIT packfile", p->pack_name);
+ if (!pack_version_ok(hdr.hdr_version))
+ die("packfile %s is version %u and not supported"
+ " (try upgrading GIT to a newer version)",
+ p->pack_name, ntohl(hdr.hdr_version));
+
+ /* Verify the pack matches its index. */
+ if (num_packed_objects(p) != ntohl(hdr.hdr_entries))
+ die("packfile %s claims to have %u objects"
+ " while index size indicates %u objects",
+ p->pack_name, ntohl(hdr.hdr_entries),
+ num_packed_objects(p));
+ if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
+ die("end of packfile %s is unavailable", p->pack_name);
+ read_or_die(p->pack_fd, sha1, sizeof(sha1));
+ idx_sha1 = ((unsigned char *)p->index_base) + p->index_size - 40;
+ if (hashcmp(sha1, idx_sha1))
+ die("packfile %s does not match index", p->pack_name);
+}
+
+int use_packed_git(struct packed_git *p)
+{
+ if (p->pack_fd == -1)
+ open_packed_git(p);
if (!p->windows) {
- int fd;
- struct stat st;
struct pack_window *win;
- struct pack_header *hdr;
pack_mapped += p->pack_size;
while (packed_git_limit < pack_mapped && unuse_one_packed_git())
; /* nothing */
- fd = open(p->pack_name, O_RDONLY);
- if (fd < 0)
- die("packfile %s cannot be opened", p->pack_name);
- if (fstat(fd, &st)) {
- close(fd);
- die("packfile %s cannot be opened", p->pack_name);
- }
- if (st.st_size != p->pack_size)
- die("packfile %s size mismatch.", p->pack_name);
win = xcalloc(1, sizeof(*win));
win->len = p->pack_size;
- win->base = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, fd, 0);
- close(fd);
+ win->base = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, p->pack_fd, 0);
if (win->base == MAP_FAILED)
die("packfile %s cannot be mapped.", p->pack_name);
p->windows = win;
-
- /* Check if we understand this pack file. If we don't we're
- * likely too old to handle it.
- */
- hdr = (struct pack_header*)win->base;
- if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
- die("packfile %s isn't actually a pack.", p->pack_name);
- if (!pack_version_ok(hdr->hdr_version))
- die("packfile %s is version %i and not supported"
- " (try upgrading GIT to a newer version)",
- p->pack_name, ntohl(hdr->hdr_version));
-
- /* Check if the pack file matches with the index file.
- * this is cheap.
- */
- if (hashcmp((unsigned char *)(p->index_base) +
- p->index_size - 40,
- p->windows->base +
- p->pack_size - 20)) {
- die("packfile %s does not match index.", p->pack_name);
- }
}
p->windows->last_used = pack_used_ctr++;
p->windows->inuse_cnt++;
@@ -562,6 +566,7 @@ struct packed_git *add_packed_git(char *path, int path_len, int local)
p->pack_size = st.st_size;
p->index_base = idx_map;
p->windows = NULL;
+ p->pack_fd = -1;
p->pack_local = local;
if ((path_len > 44) && !get_sha1_hex(path + path_len - 44, sha1))
hashcpy(p->sha1, sha1);
@@ -592,6 +597,7 @@ struct packed_git *parse_pack_index_file(const unsigned char *sha1, char *idx_pa
p->pack_size = 0;
p->index_base = idx_map;
p->windows = NULL;
+ p->pack_fd = -1;
hashcpy(p->sha1, sha1);
return p;
}
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 7/17] Replace use_packed_git with window cursors.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (5 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 6/17] Refactor how we open pack files to prepare for multiple windows Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 8/17] Loop over pack_windows when inflating/accessing data Shawn O. Pearce
` (9 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Part of the implementation concept of the sliding mmap window for
pack access is to permit multiple windows per pack to be mapped
independently. Since the inuse_cnt is associated with the mmap and
not with the file, this value is in struct pack_window and needs to
be incremented/decremented for each pack_window accessed by any code.
To faciliate that implementation we need to replace all uses of
use_packed_git() and unuse_packed_git() with a different API that
follows struct pack_window objects rather than struct packed_git.
The way this works is when we need to start accessing a pack for
the first time we should setup a new window 'cursor' by declaring
a local and setting it to NULL:
struct pack_windows *w_curs = NULL;
To obtain the memory region which contains a specific section of
the pack file we invoke use_pack(), supplying the address of our
current window cursor:
unsigned int len;
unsigned char *addr = use_pack(p, &w_curs, offset, &len);
the returned address `addr` will be the first byte at `offset`
within the pack file. The optional variable len will also be
updated with the number of bytes remaining following the address.
Multiple calls to use_pack() with the same window cursor will
update the window cursor, moving it from one window to another
when necessary. In this way each window cursor variable maintains
only one struct pack_window inuse at a time.
Finally before exiting the scope which originally declared the window
cursor we must invoke unuse_pack() to unuse the current window (which
may be different from the one that was first obtained from use_pack):
unuse_pack(&w_curs);
This implementation is still not complete with regards to multiple
windows, as only one window per pack file is supported right now.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
builtin-pack-objects.c | 16 +++----
cache.h | 4 +-
pack-check.c | 22 +++++----
sha1_file.c | 110 ++++++++++++++++++++++++++++--------------------
4 files changed, 85 insertions(+), 67 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4a00a12..6d7ae7f 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -416,6 +416,7 @@ static unsigned long write_object(struct sha1file *f,
}
else {
struct packed_git *p = entry->in_pack;
+ struct pack_window *w_curs = NULL;
if (entry->delta) {
obj_type = (allow_ofs_delta && entry->delta->offset) ?
@@ -437,16 +438,14 @@ static unsigned long write_object(struct sha1file *f,
hdrlen += 20;
}
- use_packed_git(p);
- buf = p->windows->base
- + entry->in_pack_offset
- + entry->in_pack_header_size;
+ buf = use_pack(p, &w_curs, entry->in_pack_offset
+ + entry->in_pack_header_size, NULL);
datalen = find_packed_object_size(p, entry->in_pack_offset)
- entry->in_pack_header_size;
if (!pack_to_stdout && check_inflate(buf, datalen, entry->size))
die("corrupt delta in pack %s", sha1_to_hex(entry->sha1));
sha1write(f, buf, datalen);
- unuse_packed_git(p);
+ unuse_pack(&w_curs);
reused++;
}
if (entry->delta)
@@ -937,14 +936,13 @@ static void check_object(struct object_entry *entry)
if (entry->in_pack && !entry->preferred_base) {
struct packed_git *p = entry->in_pack;
+ struct pack_window *w_curs = NULL;
unsigned long left = p->pack_size - entry->in_pack_offset;
unsigned long size, used;
unsigned char *buf;
struct object_entry *base_entry = NULL;
- use_packed_git(p);
- buf = p->windows->base;
- buf += entry->in_pack_offset;
+ buf = use_pack(p, &w_curs, entry->in_pack_offset, NULL);
/* We want in_pack_type even if we do not reuse delta.
* There is no point not reusing non-delta representations.
@@ -990,7 +988,7 @@ static void check_object(struct object_entry *entry)
if (base_name)
base_entry = locate_object_entry(base_name);
}
- unuse_packed_git(p);
+ unuse_pack(&w_curs);
entry->in_pack_header_size = used;
if (base_entry) {
diff --git a/cache.h b/cache.h
index 0afc55e..b294bbf 100644
--- a/cache.h
+++ b/cache.h
@@ -397,8 +397,8 @@ extern void install_packed_git(struct packed_git *pack);
extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
struct packed_git *packs);
-extern int use_packed_git(struct packed_git *);
-extern void unuse_packed_git(struct packed_git *);
+extern unsigned char* use_pack(struct packed_git *, struct pack_window **, unsigned long, unsigned int *);
+extern void unuse_pack(struct pack_window **);
extern struct packed_git *add_packed_git(char *, int, int);
extern int num_packed_objects(const struct packed_git *p);
extern int nth_packed_object_sha1(const struct packed_git *, int, unsigned char*);
diff --git a/pack-check.c b/pack-check.c
index 761cc85..972916f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -1,7 +1,8 @@
#include "cache.h"
#include "pack.h"
-static int verify_packfile(struct packed_git *p)
+static int verify_packfile(struct packed_git *p,
+ struct pack_window **w_curs)
{
unsigned long index_size = p->index_size;
void *index_base = p->index_base;
@@ -13,7 +14,7 @@ static int verify_packfile(struct packed_git *p)
int nr_objects, err, i;
/* Header consistency check */
- pack_base = p->windows->base;
+ pack_base = use_pack(p, w_curs, 0, NULL);
hdr = (struct pack_header*)pack_base;
if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
return error("Packfile %s signature mismatch", p->pack_name);
@@ -72,13 +73,14 @@ static int verify_packfile(struct packed_git *p)
#define MAX_CHAIN 40
-static void show_pack_info(struct packed_git *p)
+static void show_pack_info(struct packed_git *p,
+ struct pack_window **w_curs)
{
struct pack_header *hdr;
int nr_objects, i;
unsigned int chain_histogram[MAX_CHAIN];
- hdr = (struct pack_header*)p->windows->base;
+ hdr = (struct pack_header*)use_pack(p, w_curs, 0, NULL);
nr_objects = ntohl(hdr->hdr_entries);
memset(chain_histogram, 0, sizeof(chain_histogram));
@@ -142,18 +144,18 @@ int verify_pack(struct packed_git *p, int verbose)
if (!ret) {
/* Verify pack file */
- use_packed_git(p);
- ret = verify_packfile(p);
- unuse_packed_git(p);
+ struct pack_window *w_curs = NULL;
+ ret = verify_packfile(p, &w_curs);
+ unuse_pack(&w_curs);
}
if (verbose) {
if (ret)
printf("%s: bad\n", p->pack_name);
else {
- use_packed_git(p);
- show_pack_info(p);
- unuse_packed_git(p);
+ struct pack_window *w_curs = NULL;
+ show_pack_info(p, &w_curs);
+ unuse_pack(&w_curs);
printf("%s: ok\n", p->pack_name);
}
}
diff --git a/sha1_file.c b/sha1_file.c
index cc68a8b..ef3f056 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -470,10 +470,13 @@ static int unuse_one_packed_git(void)
return 1;
}
-void unuse_packed_git(struct packed_git *p)
+void unuse_pack(struct pack_window **w_cursor)
{
- if (p->windows)
- p->windows->inuse_cnt--;
+ struct pack_window *w = *w_cursor;
+ if (w) {
+ w->inuse_cnt--;
+ *w_cursor = NULL;
+ }
}
static void open_packed_git(struct packed_git *p)
@@ -518,13 +521,16 @@ static void open_packed_git(struct packed_git *p)
die("packfile %s does not match index", p->pack_name);
}
-int use_packed_git(struct packed_git *p)
+unsigned char* use_pack(struct packed_git *p,
+ struct pack_window **w_cursor,
+ unsigned long offset,
+ unsigned int *left)
{
+ struct pack_window *win = p->windows;
+
if (p->pack_fd == -1)
open_packed_git(p);
- if (!p->windows) {
- struct pack_window *win;
-
+ if (!win) {
pack_mapped += p->pack_size;
while (packed_git_limit < pack_mapped && unuse_one_packed_git())
; /* nothing */
@@ -535,9 +541,14 @@ int use_packed_git(struct packed_git *p)
die("packfile %s cannot be mapped.", p->pack_name);
p->windows = win;
}
- p->windows->last_used = pack_used_ctr++;
- p->windows->inuse_cnt++;
- return 0;
+ if (win != *w_cursor) {
+ win->last_used = pack_used_ctr++;
+ win->inuse_cnt++;
+ *w_cursor = win;
+ }
+ if (left)
+ *left = win->len - offset;
+ return win->base + offset;
}
struct packed_git *add_packed_git(char *path, int path_len, int local)
@@ -881,12 +892,13 @@ void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned l
}
static unsigned long get_delta_base(struct packed_git *p,
+ struct pack_window **w_curs,
unsigned long offset,
enum object_type kind,
unsigned long delta_obj_offset,
unsigned long *base_obj_offset)
{
- unsigned char *base_info = p->windows->base + offset;
+ unsigned char *base_info = use_pack(p, w_curs, offset, NULL);
unsigned long base_offset;
/* there must be at least 20 bytes left regardless of delta type */
@@ -926,6 +938,7 @@ static int packed_object_info(struct packed_git *p, unsigned long offset,
char *type, unsigned long *sizep);
static int packed_delta_info(struct packed_git *p,
+ struct pack_window **w_curs,
unsigned long offset,
enum object_type kind,
unsigned long obj_offset,
@@ -934,7 +947,8 @@ static int packed_delta_info(struct packed_git *p,
{
unsigned long base_offset;
- offset = get_delta_base(p, offset, kind, obj_offset, &base_offset);
+ offset = get_delta_base(p, w_curs, offset, kind,
+ obj_offset, &base_offset);
/* We choose to only get the type of the base object and
* ignore potentially corrupt pack file that expects the delta
@@ -953,8 +967,7 @@ static int packed_delta_info(struct packed_git *p,
memset(&stream, 0, sizeof(stream));
- stream.next_in = p->windows->base + offset;
- stream.avail_in = p->pack_size - offset;
+ stream.next_in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_out = delta_head;
stream.avail_out = sizeof(delta_head);
@@ -980,16 +993,18 @@ static int packed_delta_info(struct packed_git *p,
return 0;
}
-static unsigned long unpack_object_header(struct packed_git *p, unsigned long offset,
- enum object_type *type, unsigned long *sizep)
+static unsigned long unpack_object_header(struct packed_git *p,
+ struct pack_window **w_curs,
+ unsigned long offset,
+ enum object_type *type,
+ unsigned long *sizep)
{
+ unsigned char *base;
+ unsigned int left;
unsigned long used;
- if (p->pack_size <= offset)
- die("object offset outside of pack file");
-
- used = unpack_object_header_gently(p->windows->base + offset,
- p->pack_size - offset, type, sizep);
+ base = use_pack(p, w_curs, offset, &left);
+ used = unpack_object_header_gently(base, left, type, sizep);
if (!used)
die("object offset outside of pack file");
@@ -1004,13 +1019,14 @@ void packed_object_info_detail(struct packed_git *p,
unsigned int *delta_chain_length,
unsigned char *base_sha1)
{
+ struct pack_window *w_curs = NULL;
unsigned long obj_offset, val;
unsigned char *next_sha1;
enum object_type kind;
*delta_chain_length = 0;
obj_offset = offset;
- offset = unpack_object_header(p, offset, &kind, size);
+ offset = unpack_object_header(p, &w_curs, offset, &kind, size);
for (;;) {
switch (kind) {
@@ -1023,25 +1039,24 @@ void packed_object_info_detail(struct packed_git *p,
case OBJ_TAG:
strcpy(type, type_names[kind]);
*store_size = 0; /* notyet */
+ unuse_pack(&w_curs);
return;
case OBJ_OFS_DELTA:
- get_delta_base(p, offset, kind, obj_offset, &offset);
+ get_delta_base(p, &w_curs, offset, kind,
+ obj_offset, &offset);
if (*delta_chain_length == 0) {
/* TODO: find base_sha1 as pointed by offset */
}
break;
case OBJ_REF_DELTA:
- if (p->pack_size <= offset + 20)
- die("pack file %s records an incomplete delta base",
- p->pack_name);
- next_sha1 = p->windows->base + offset;
+ next_sha1 = use_pack(p, &w_curs, offset, NULL);
if (*delta_chain_length == 0)
hashcpy(base_sha1, next_sha1);
offset = find_pack_entry_one(next_sha1, p);
break;
}
obj_offset = offset;
- offset = unpack_object_header(p, offset, &kind, &val);
+ offset = unpack_object_header(p, &w_curs, offset, &kind, &val);
(*delta_chain_length)++;
}
}
@@ -1049,20 +1064,26 @@ void packed_object_info_detail(struct packed_git *p,
static int packed_object_info(struct packed_git *p, unsigned long offset,
char *type, unsigned long *sizep)
{
+ struct pack_window *w_curs = NULL;
unsigned long size, obj_offset = offset;
enum object_type kind;
+ int r;
- offset = unpack_object_header(p, offset, &kind, &size);
+ offset = unpack_object_header(p, &w_curs, offset, &kind, &size);
switch (kind) {
case OBJ_OFS_DELTA:
case OBJ_REF_DELTA:
- return packed_delta_info(p, offset, kind, obj_offset, type, sizep);
+ r = packed_delta_info(p, &w_curs, offset, kind,
+ obj_offset, type, sizep);
+ unuse_pack(&w_curs);
+ return r;
case OBJ_COMMIT:
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
strcpy(type, type_names[kind]);
+ unuse_pack(&w_curs);
break;
default:
die("pack %s contains unknown object type %d",
@@ -1074,6 +1095,7 @@ static int packed_object_info(struct packed_git *p, unsigned long offset,
}
static void *unpack_compressed_entry(struct packed_git *p,
+ struct pack_window **w_curs,
unsigned long offset,
unsigned long size)
{
@@ -1084,8 +1106,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
buffer = xmalloc(size + 1);
buffer[size] = 0;
memset(&stream, 0, sizeof(stream));
- stream.next_in = p->windows->base + offset;
- stream.avail_in = p->pack_size - offset;
+ stream.next_in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_out = buffer;
stream.avail_out = size;
@@ -1101,6 +1122,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
}
static void *unpack_delta_entry(struct packed_git *p,
+ struct pack_window **w_curs,
unsigned long offset,
unsigned long delta_size,
enum object_type kind,
@@ -1111,13 +1133,14 @@ static void *unpack_delta_entry(struct packed_git *p,
void *delta_data, *result, *base;
unsigned long result_size, base_size, base_offset;
- offset = get_delta_base(p, offset, kind, obj_offset, &base_offset);
+ offset = get_delta_base(p, w_curs, offset, kind,
+ obj_offset, &base_offset);
base = unpack_entry(p, base_offset, type, &base_size);
if (!base)
die("failed to read delta base object at %lu from %s",
base_offset, p->pack_name);
- delta_data = unpack_compressed_entry(p, offset, delta_size);
+ delta_data = unpack_compressed_entry(p, w_curs, offset, delta_size);
result = patch_delta(base, base_size,
delta_data, delta_size,
&result_size);
@@ -1132,17 +1155,17 @@ static void *unpack_delta_entry(struct packed_git *p,
void *unpack_entry(struct packed_git *p, unsigned long offset,
char *type, unsigned long *sizep)
{
+ struct pack_window *w_curs = NULL;
unsigned long size, obj_offset = offset;
enum object_type kind;
void *retval;
- if (use_packed_git(p))
- die("cannot map packed file");
- offset = unpack_object_header(p, offset, &kind, &size);
+ offset = unpack_object_header(p, &w_curs, offset, &kind, &size);
switch (kind) {
case OBJ_OFS_DELTA:
case OBJ_REF_DELTA:
- retval = unpack_delta_entry(p, offset, size, kind, obj_offset, type, sizep);
+ retval = unpack_delta_entry(p, &w_curs, offset, size,
+ kind, obj_offset, type, sizep);
break;
case OBJ_COMMIT:
case OBJ_TREE:
@@ -1150,12 +1173,12 @@ void *unpack_entry(struct packed_git *p, unsigned long offset,
case OBJ_TAG:
strcpy(type, type_names[kind]);
*sizep = size;
- retval = unpack_compressed_entry(p, offset, size);
+ retval = unpack_compressed_entry(p, &w_curs, offset, size);
break;
default:
die("unknown object type %i in %s", kind, p->pack_name);
}
- unuse_packed_git(p);
+ unuse_pack(&w_curs);
return retval;
}
@@ -1282,7 +1305,6 @@ static int sha1_loose_object_info(const unsigned char *sha1, char *type, unsigne
int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep)
{
- int status;
struct pack_entry e;
if (!find_pack_entry(sha1, &e, NULL)) {
@@ -1290,11 +1312,7 @@ int sha1_object_info(const unsigned char *sha1, char *type, unsigned long *sizep
if (!find_pack_entry(sha1, &e, NULL))
return sha1_loose_object_info(sha1, type, sizep);
}
- if (use_packed_git(e.p))
- die("cannot map packed file");
- status = packed_object_info(e.p, e.offset, type, sizep);
- unuse_packed_git(e.p);
- return status;
+ return packed_object_info(e.p, e.offset, type, sizep);
}
static void *read_packed_sha1(const unsigned char *sha1, char *type, unsigned long *size)
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 8/17] Loop over pack_windows when inflating/accessing data.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (6 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 7/17] Replace use_packed_git with window cursors Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 9/17] Document why header parsing won't exceed a window Shawn O. Pearce
` (8 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
When multiple mmaps start getting used for all pack file access it
is not possible to get all data associated with a specific object
in one contiguous memory region. This limitation prevents simply
passing a single address and length to SHA1_Update or to inflate.
Instead we need to loop until we have processed all data of interest.
As we loop over the data we are always interested in reusing the same
window 'cursor', as the prior window will no longer be of any use
to us. This allows the use_pack() call to automatically decrement
the use count of the prior window before setting up access for us
to the next window.
Within each loop we need to make use of the available length output
parameter of use_pack() to tell us how many bytes are available in
the current memory region, as we cannot tell otherwise.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
builtin-pack-objects.c | 58 +++++++++++++++++++++++++++++++++++++++++++-----
pack-check.c | 46 ++++++++++++++++---------------------
sha1_file.c | 22 ++++++++++++-----
3 files changed, 87 insertions(+), 39 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 6d7ae7f..afb926a 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -276,7 +276,52 @@ static int encode_header(enum object_type type, unsigned long size, unsigned cha
* we are going to reuse the existing object data as is. make
* sure it is not corrupt.
*/
-static int check_inflate(unsigned char *data, unsigned long len, unsigned long expect)
+static int check_pack_inflate(struct packed_git *p,
+ struct pack_window **w_curs,
+ unsigned long offset,
+ unsigned long len,
+ unsigned long expect)
+{
+ z_stream stream;
+ unsigned char fakebuf[4096], *in;
+ int st;
+
+ memset(&stream, 0, sizeof(stream));
+ inflateInit(&stream);
+ do {
+ in = use_pack(p, w_curs, offset, &stream.avail_in);
+ stream.next_in = in;
+ stream.next_out = fakebuf;
+ stream.avail_out = sizeof(fakebuf);
+ st = inflate(&stream, Z_FINISH);
+ offset += stream.next_in - in;
+ } while (st == Z_OK || st == Z_BUF_ERROR);
+ inflateEnd(&stream);
+ return (st == Z_STREAM_END &&
+ stream.total_out == expect &&
+ stream.total_in == len) ? 0 : -1;
+}
+
+static void copy_pack_data(struct sha1file *f,
+ struct packed_git *p,
+ struct pack_window **w_curs,
+ unsigned long offset,
+ unsigned long len)
+{
+ unsigned char *in;
+ unsigned int avail;
+
+ while (len) {
+ in = use_pack(p, w_curs, offset, &avail);
+ if (avail > len)
+ avail = len;
+ sha1write(f, in, avail);
+ offset += avail;
+ len -= avail;
+ }
+}
+
+static int check_loose_inflate(unsigned char *data, unsigned long len, unsigned long expect)
{
z_stream stream;
unsigned char fakebuf[4096];
@@ -323,7 +368,7 @@ static int revalidate_loose_object(struct object_entry *entry,
return -1;
map += used;
mapsize -= used;
- return check_inflate(map, mapsize, size);
+ return check_loose_inflate(map, mapsize, size);
}
static unsigned long write_object(struct sha1file *f,
@@ -417,6 +462,7 @@ static unsigned long write_object(struct sha1file *f,
else {
struct packed_git *p = entry->in_pack;
struct pack_window *w_curs = NULL;
+ unsigned long offset;
if (entry->delta) {
obj_type = (allow_ofs_delta && entry->delta->offset) ?
@@ -438,13 +484,13 @@ static unsigned long write_object(struct sha1file *f,
hdrlen += 20;
}
- buf = use_pack(p, &w_curs, entry->in_pack_offset
- + entry->in_pack_header_size, NULL);
+ offset = entry->in_pack_offset + entry->in_pack_header_size;
datalen = find_packed_object_size(p, entry->in_pack_offset)
- entry->in_pack_header_size;
- if (!pack_to_stdout && check_inflate(buf, datalen, entry->size))
+ if (!pack_to_stdout && check_pack_inflate(p, &w_curs,
+ offset, datalen, entry->size))
die("corrupt delta in pack %s", sha1_to_hex(entry->sha1));
- sha1write(f, buf, datalen);
+ copy_pack_data(f, p, &w_curs, offset, datalen);
unuse_pack(&w_curs);
reused++;
}
diff --git a/pack-check.c b/pack-check.c
index 972916f..08a9fd8 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -8,39 +8,38 @@ static int verify_packfile(struct packed_git *p,
void *index_base = p->index_base;
SHA_CTX ctx;
unsigned char sha1[20];
- unsigned long pack_size = p->pack_size;
- void *pack_base;
- struct pack_header *hdr;
+ unsigned long offset = 0, pack_sig = p->pack_size - 20;
int nr_objects, err, i;
- /* Header consistency check */
- pack_base = use_pack(p, w_curs, 0, NULL);
- hdr = (struct pack_header*)pack_base;
- if (hdr->hdr_signature != htonl(PACK_SIGNATURE))
- return error("Packfile %s signature mismatch", p->pack_name);
- if (!pack_version_ok(hdr->hdr_version))
- return error("Packfile version %d unsupported",
- ntohl(hdr->hdr_version));
- nr_objects = ntohl(hdr->hdr_entries);
- if (num_packed_objects(p) != nr_objects)
- return error("Packfile claims to have %d objects, "
- "while idx size expects %d", nr_objects,
- num_packed_objects(p));
+ /* Note that the pack header checks are actually performed by
+ * use_pack when it first opens the pack file. If anything
+ * goes wrong during those checks then the call will die out
+ * immediately.
+ */
SHA1_Init(&ctx);
- SHA1_Update(&ctx, pack_base, pack_size - 20);
+ while (offset < pack_sig) {
+ unsigned int remaining;
+ unsigned char *in = use_pack(p, w_curs, offset, &remaining);
+ offset += remaining;
+ if (offset > pack_sig)
+ remaining -= offset - pack_sig;
+ SHA1_Update(&ctx, in, remaining);
+ }
SHA1_Final(sha1, &ctx);
- if (hashcmp(sha1, (unsigned char *)pack_base + pack_size - 20))
+ if (hashcmp(sha1, use_pack(p, w_curs, pack_sig, NULL)))
return error("Packfile %s SHA1 mismatch with itself",
p->pack_name);
if (hashcmp(sha1, (unsigned char *)index_base + index_size - 40))
return error("Packfile %s SHA1 mismatch with idx",
p->pack_name);
+ unuse_pack(w_curs);
/* Make sure everything reachable from idx is valid. Since we
* have verified that nr_objects matches between idx and pack,
* we do not do scan-streaming check on the pack file.
*/
+ nr_objects = num_packed_objects(p);
for (i = err = 0; i < nr_objects; i++) {
unsigned char sha1[20];
void *data;
@@ -73,15 +72,12 @@ static int verify_packfile(struct packed_git *p,
#define MAX_CHAIN 40
-static void show_pack_info(struct packed_git *p,
- struct pack_window **w_curs)
+static void show_pack_info(struct packed_git *p)
{
- struct pack_header *hdr;
int nr_objects, i;
unsigned int chain_histogram[MAX_CHAIN];
- hdr = (struct pack_header*)use_pack(p, w_curs, 0, NULL);
- nr_objects = ntohl(hdr->hdr_entries);
+ nr_objects = num_packed_objects(p);
memset(chain_histogram, 0, sizeof(chain_histogram));
for (i = 0; i < nr_objects; i++) {
@@ -153,9 +149,7 @@ int verify_pack(struct packed_git *p, int verbose)
if (ret)
printf("%s: bad\n", p->pack_name);
else {
- struct pack_window *w_curs = NULL;
- show_pack_info(p, &w_curs);
- unuse_pack(&w_curs);
+ show_pack_info(p);
printf("%s: ok\n", p->pack_name);
}
}
diff --git a/sha1_file.c b/sha1_file.c
index ef3f056..886e3b5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -960,19 +960,23 @@ static int packed_delta_info(struct packed_git *p,
if (sizep) {
const unsigned char *data;
- unsigned char delta_head[20];
+ unsigned char delta_head[20], *in;
unsigned long result_size;
z_stream stream;
int st;
memset(&stream, 0, sizeof(stream));
-
- stream.next_in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_out = delta_head;
stream.avail_out = sizeof(delta_head);
inflateInit(&stream);
- st = inflate(&stream, Z_FINISH);
+ do {
+ in = use_pack(p, w_curs, offset, &stream.avail_in);
+ stream.next_in = in;
+ st = inflate(&stream, Z_FINISH);
+ offset += stream.next_in - in;
+ } while ((st == Z_OK || st == Z_BUF_ERROR)
+ && stream.total_out < sizeof(delta_head));
inflateEnd(&stream);
if ((st != Z_STREAM_END) &&
stream.total_out != sizeof(delta_head))
@@ -1101,17 +1105,21 @@ static void *unpack_compressed_entry(struct packed_git *p,
{
int st;
z_stream stream;
- unsigned char *buffer;
+ unsigned char *buffer, *in;
buffer = xmalloc(size + 1);
buffer[size] = 0;
memset(&stream, 0, sizeof(stream));
- stream.next_in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_out = buffer;
stream.avail_out = size;
inflateInit(&stream);
- st = inflate(&stream, Z_FINISH);
+ do {
+ in = use_pack(p, w_curs, offset, &stream.avail_in);
+ stream.next_in = in;
+ st = inflate(&stream, Z_FINISH);
+ offset += stream.next_in - in;
+ } while (st == Z_OK || st == Z_BUF_ERROR);
inflateEnd(&stream);
if ((st != Z_STREAM_END) || stream.total_out != size) {
free(buffer);
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 9/17] Document why header parsing won't exceed a window.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (7 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 8/17] Loop over pack_windows when inflating/accessing data Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 10/17] Unmap individual windows rather than entire files Shawn O. Pearce
` (7 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
When we parse the object header or the delta base reference we
don't bother to loop over use_pack() calls. The reason we don't
need to bother with calling use_pack for each byte accessed is that
use_pack will always promise us at least 20 bytes (really the hash
size) after the offset. This promise from use_pack simplifies a
lot of code in the header parsing logic, as well as helps out the
zlib library by ensuring there's always some data for it to consume
during an inflate call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 886e3b5..6c34482 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -901,10 +901,12 @@ static unsigned long get_delta_base(struct packed_git *p,
unsigned char *base_info = use_pack(p, w_curs, offset, NULL);
unsigned long base_offset;
- /* there must be at least 20 bytes left regardless of delta type */
- if (p->pack_size <= offset + 20)
- die("truncated pack file");
-
+ /* use_pack() assured us we have [base_info, base_info + 20)
+ * as a range that we can look at without walking off the
+ * end of the mapped window. Its actually the hash size
+ * that is assured. An OFS_DELTA longer than the hash size
+ * is stupid, as then a REF_DELTA would be smaller to store.
+ */
if (kind == OBJ_OFS_DELTA) {
unsigned used = 0;
unsigned char c = base_info[used++];
@@ -1007,6 +1009,12 @@ static unsigned long unpack_object_header(struct packed_git *p,
unsigned int left;
unsigned long used;
+ /* use_pack() assures us we have [base, base + 20) available
+ * as a range that we can look at at. (Its actually the hash
+ * size that is assurred.) With our object header encoding
+ * the maximum deflated object size is 2^137, which is just
+ * insane, so we know won't exceed what we have been given.
+ */
base = use_pack(p, w_curs, offset, &left);
used = unpack_object_header_gently(base, left, type, sizep);
if (!used)
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 10/17] Unmap individual windows rather than entire files.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (8 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 9/17] Document why header parsing won't exceed a window Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 11/17] Fully activate the sliding window pack access Shawn O. Pearce
` (6 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
To support multiple windows per packfile we need to unmap only one
window at a time from that packfile, leaving any other windows in
place and available for reference.
We treat all windows from all packfiles equally; the least recently
used, not-in-use window across all packfiles will always be closed
first.
If we have unmapped all windows in a packfile then we can also close
the packfile's file descriptor as its possible we won't need to map
any window from that file in the near future. This decision about
when to close the pack file descriptor may need to be revisited in
the future after additional testing on several different platforms
can be performed.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 45 ++++++++++++++++++++++++++++++---------------
1 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 6c34482..49dd4b7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -450,24 +450,39 @@ static int check_packed_git_idx(const char *path, unsigned long *idx_size_,
return 0;
}
-static int unuse_one_packed_git(void)
+static int unuse_one_window(void)
{
- struct packed_git *p, *lru = NULL;
+ struct packed_git *p, *lru_p = NULL;
+ struct pack_window *w, *w_l, *lru_w = NULL, *lru_l = NULL;
for (p = packed_git; p; p = p->next) {
- if (!p->windows || p->windows->inuse_cnt)
- continue;
- if (!lru || p->windows->last_used < lru->windows->last_used)
- lru = p;
+ for (w_l = NULL, w = p->windows; w; w = w->next) {
+ if (!w->inuse_cnt) {
+ if (!lru_w || w->last_used < lru_w->last_used) {
+ lru_p = p;
+ lru_w = w;
+ lru_l = w_l;
+ }
+ }
+ w_l = w;
+ }
}
- if (!lru)
- return 0;
- munmap(lru->windows->base, lru->windows->len);
- free(lru->windows);
- lru->windows = NULL;
- close(p->pack_fd);
- p->pack_fd = -1;
- return 1;
+ if (lru_p) {
+ munmap(lru_w->base, lru_w->len);
+ pack_mapped -= lru_w->len;
+ if (lru_l)
+ lru_l->next = lru_w->next;
+ else {
+ lru_p->windows = lru_w->next;
+ if (!lru_p->windows) {
+ close(lru_p->pack_fd);
+ lru_p->pack_fd = -1;
+ }
+ }
+ free(lru_w);
+ return 1;
+ }
+ return 0;
}
void unuse_pack(struct pack_window **w_cursor)
@@ -532,7 +547,7 @@ unsigned char* use_pack(struct packed_git *p,
open_packed_git(p);
if (!win) {
pack_mapped += p->pack_size;
- while (packed_git_limit < pack_mapped && unuse_one_packed_git())
+ while (packed_git_limit < pack_mapped && unuse_one_window())
; /* nothing */
win = xcalloc(1, sizeof(*win));
win->len = p->pack_size;
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 11/17] Fully activate the sliding window pack access.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (9 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 10/17] Unmap individual windows rather than entire files Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 18:44 ` Linus Torvalds
2006-12-23 7:34 ` [PATCH 12/17] Load core configuration in git-verify-pack Shawn O. Pearce
` (5 subsequent siblings)
16 siblings, 1 reply; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
This finally turns on the sliding window behavior for packfile data
access by mapping limited size windows and chaining them under the
packed_git->windows list.
We consider a given byte offset to be within the window only if there
would be at least 20 bytes (one hash worth of data) accessible after
the requested offset. This range selection relates to the contract
that use_pack() makes with its callers, allowing them to access
one hash or one object header without needing to call use_pack()
for every byte of data obtained.
In the worst case scenario we will map the same page of data twice
into memory: once at the end of one window and once again at the
start of the next window. This duplicate page mapping will happen
only when an object header or a delta base reference is spanned
over the end of a window and is always limited to just one page of
duplication, as no sane operating system will ever have a page size
smaller than a hash.
I am assuming that the possible wasted page of virtual address
space is going to perform faster than the alternatives, which
would be to copy the object header or ref delta into a temporary
buffer prior to parsing, or to check the window range on every byte
during header parsing. We may decide to revisit this decision in
the future since this is just a gut instinct decision and has not
actually been proven out by experimental testing.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
Documentation/config.txt | 11 +++++++
cache.h | 1 +
config.c | 10 +++++++
environment.c | 1 +
sha1_file.c | 66 +++++++++++++++++++++++++++++++++++++---------
5 files changed, 76 insertions(+), 13 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4e93066..f8775f1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -118,6 +118,17 @@ core.legacyheaders::
database directly (where the "http://" and "rsync://" protocols
count as direct access).
+core.packedGitWindowSize::
+ Number of bytes of a pack file to map into memory in a
+ single mapping operation. Larger window sizes may allow
+ your system to process a smaller number of large pack files
+ more quickly. Smaller window sizes will negatively affect
+ performance due to increased calls to the opreating system's
+ memory manager, but may improve performance when accessing
+ a large number of large pack files. Default is 32 MiB,
+ which should be reasonable for all users/operating systems.
+ You probably do not need to adjust this value.
+
core.packedGitLimit::
Maximum number of bytes to map simultaneously into memory
from pack files. If Git needs to access more than this many
diff --git a/cache.h b/cache.h
index b294bbf..b7855ef 100644
--- a/cache.h
+++ b/cache.h
@@ -196,6 +196,7 @@ extern int warn_ambiguous_refs;
extern int shared_repository;
extern const char *apply_default_whitespace;
extern int zlib_compression_level;
+extern size_t packed_git_window_size;
extern size_t packed_git_limit;
#define GIT_REPO_VERSION 0
diff --git a/config.c b/config.c
index 1e79f09..a8ea063 100644
--- a/config.c
+++ b/config.c
@@ -298,6 +298,16 @@ int git_default_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.packedgitwindowsize")) {
+ int pgsz = getpagesize();
+ packed_git_window_size = git_config_int(var, value);
+ packed_git_window_size /= pgsz;
+ if (!packed_git_window_size)
+ packed_git_window_size = 1;
+ packed_git_window_size *= pgsz;
+ return 0;
+ }
+
if (!strcmp(var, "core.packedgitlimit")) {
packed_git_limit = git_config_int(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 8a09df2..289fc84 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ char git_commit_encoding[MAX_ENCODING_LENGTH] = "utf-8";
int shared_repository = PERM_UMASK;
const char *apply_default_whitespace;
int zlib_compression_level = Z_DEFAULT_COMPRESSION;
+size_t packed_git_window_size = 32 * 1024 * 1024;
size_t packed_git_limit = 256 * 1024 * 1024;
int pager_in_use;
int pager_use_color = 1;
diff --git a/sha1_file.c b/sha1_file.c
index 49dd4b7..fab2ab0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -397,8 +397,9 @@ static char *find_sha1_file(const unsigned char *sha1, struct stat *st)
return NULL;
}
-static int pack_used_ctr;
-static unsigned long pack_mapped;
+static unsigned int pack_used_ctr;
+static size_t pack_mapped;
+static size_t page_size;
struct packed_git *packed_git;
static int check_packed_git_idx(const char *path, unsigned long *idx_size_,
@@ -536,31 +537,70 @@ static void open_packed_git(struct packed_git *p)
die("packfile %s does not match index", p->pack_name);
}
+static int in_window(struct pack_window *win, unsigned long offset)
+{
+ /* We must promise at least 20 bytes (one hash) after the
+ * offset is available from this window, otherwise the offset
+ * is not actually in this window and a different window (which
+ * has that one hash excess) must be used. This is to support
+ * the object header and delta base parsing routines below.
+ */
+ off_t win_off = win->offset;
+ return win_off <= offset
+ && (offset + 20) <= (win_off + win->len);
+}
+
unsigned char* use_pack(struct packed_git *p,
struct pack_window **w_cursor,
unsigned long offset,
unsigned int *left)
{
- struct pack_window *win = p->windows;
+ struct pack_window *win = *w_cursor;
if (p->pack_fd == -1)
open_packed_git(p);
- if (!win) {
- pack_mapped += p->pack_size;
- while (packed_git_limit < pack_mapped && unuse_one_window())
- ; /* nothing */
- win = xcalloc(1, sizeof(*win));
- win->len = p->pack_size;
- win->base = mmap(NULL, p->pack_size, PROT_READ, MAP_PRIVATE, p->pack_fd, 0);
- if (win->base == MAP_FAILED)
- die("packfile %s cannot be mapped.", p->pack_name);
- p->windows = win;
+
+ /* Since packfiles end in a hash of their content and its
+ * pointless to ask for an offset into the middle of that
+ * hash, and the in_window function above wouldn't match
+ * don't allow an offset too close to the end of the file.
+ */
+ if (offset > (p->pack_size - 20))
+ die("offset beyond end of packfile (truncated pack?)");
+
+ if (!win || !in_window(win, offset)) {
+ if (win)
+ win->inuse_cnt--;
+ for (win = p->windows; win; win = win->next) {
+ if (in_window(win, offset))
+ break;
+ }
+ if (!win) {
+ if (!page_size)
+ page_size = getpagesize();
+ win = xcalloc(1, sizeof(*win));
+ win->offset = (offset / page_size) * page_size;
+ win->len = p->pack_size - win->offset;
+ if (win->len > packed_git_window_size)
+ win->len = packed_git_window_size;
+ pack_mapped += win->len;
+ while (packed_git_limit < pack_mapped && unuse_one_window())
+ ; /* nothing */
+ win->base = mmap(NULL, win->len,
+ PROT_READ, MAP_PRIVATE,
+ p->pack_fd, win->offset);
+ if (win->base == MAP_FAILED)
+ die("packfile %s cannot be mapped.", p->pack_name);
+ win->next = p->windows;
+ p->windows = win;
+ }
}
if (win != *w_cursor) {
win->last_used = pack_used_ctr++;
win->inuse_cnt++;
*w_cursor = win;
}
+ offset -= win->offset;
if (left)
*left = win->len - offset;
return win->base + offset;
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 12/17] Load core configuration in git-verify-pack.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (10 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 11/17] Fully activate the sliding window pack access Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 13/17] Ensure core.packedGitWindowSize cannot be less than 2 pages Shawn O. Pearce
` (4 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Now that our pack access code's behavior may be altered by the
setting of core.packedGitWindowSize or core.packedGitLimit we need
to make sure these values are set as configured in the repository's
configuration file rather than to their defaults.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
builtin-verify-pack.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/builtin-verify-pack.c b/builtin-verify-pack.c
index 7d39d9b..4e31c27 100644
--- a/builtin-verify-pack.c
+++ b/builtin-verify-pack.c
@@ -55,6 +55,7 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
int no_more_options = 0;
int nothing_done = 1;
+ git_config(git_default_config);
while (1 < argc) {
if (!no_more_options && argv[1][0] == '-') {
if (!strcmp("-v", argv[1]))
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 13/17] Ensure core.packedGitWindowSize cannot be less than 2 pages.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (11 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 12/17] Load core configuration in git-verify-pack Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 14/17] Improve error message when packfile mmap fails Shawn O. Pearce
` (3 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
We cannot allow a window to be smaller than 2 system pages.
This limitation is necessary to support the feature of use_pack()
where we always supply at least 20 bytes after the offset to help
the object header and delta base parsing routines.
If packedGitWindowSize were allowed to be as small as 1 system page
then we would be completely unable to access an object header which
spanned over a page as we would never be able to arrange a mapping
such that the header was contiguous in virtual memory.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
config.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/config.c b/config.c
index a8ea063..edc42f4 100644
--- a/config.c
+++ b/config.c
@@ -302,8 +302,8 @@ int git_default_config(const char *var, const char *value)
int pgsz = getpagesize();
packed_git_window_size = git_config_int(var, value);
packed_git_window_size /= pgsz;
- if (!packed_git_window_size)
- packed_git_window_size = 1;
+ if (packed_git_window_size < 2)
+ packed_git_window_size = 2;
packed_git_window_size *= pgsz;
return 0;
}
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 14/17] Improve error message when packfile mmap fails.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (12 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 13/17] Ensure core.packedGitWindowSize cannot be less than 2 pages Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 15/17] Support unmapping windows on 'temporary' packfiles Shawn O. Pearce
` (2 subsequent siblings)
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
If we are unable to mmap the a region of the packfile with the mmap()
system call there may be a good reason why, such as a closed file
descriptor or out of address space. Reporting the system level
error message can help to debug such problems.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index fab2ab0..7872c2d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -590,7 +590,9 @@ unsigned char* use_pack(struct packed_git *p,
PROT_READ, MAP_PRIVATE,
p->pack_fd, win->offset);
if (win->base == MAP_FAILED)
- die("packfile %s cannot be mapped.", p->pack_name);
+ die("packfile %s cannot be mapped: %s",
+ p->pack_name,
+ strerror(errno));
win->next = p->windows;
p->windows = win;
}
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 15/17] Support unmapping windows on 'temporary' packfiles.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (13 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 14/17] Improve error message when packfile mmap fails Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 16/17] Create pack_report() as a debugging aid Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 17/17] Test suite for sliding window mmap implementation Shawn O. Pearce
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
If a command opens a packfile for only temporary access and does not
install the struct packed_git* into the global packed_git list then
we are unable to unmap any inactive windows within that packed_git,
causing the overall process to exceed core.packedGitLimit.
We cannot force the callers to install their temporary packfile
into the packed_git chain as doing so would allow that (possibly
corrupt but currently being verified) temporary packfile to become
part of the local ODB, which may allow it to be considered for
object resolution when it may not actually be a valid packfile.
So to support unmapping the windows of these temporary packfiles we
also scan the windows of the struct packed_git which was supplied
to use_pack(). Since commands only work with one temporary packfile
at a time scanning the one supplied to use_pack() and all packs
installed into packed_git should cover everything available in
memory.
We also have to be careful to not close the file descriptor of
the packed_git which was handed to use_pack() when all of that
packfile's windows have been unmapped, as we are already past the
open call that would open the packfile and need the file descriptor
to be ready for mmap() after unuse_one_window returns.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
sha1_file.c | 44 ++++++++++++++++++++++++++++----------------
1 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 7872c2d..a8a6c10 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -451,23 +451,34 @@ static int check_packed_git_idx(const char *path, unsigned long *idx_size_,
return 0;
}
-static int unuse_one_window(void)
-{
- struct packed_git *p, *lru_p = NULL;
- struct pack_window *w, *w_l, *lru_w = NULL, *lru_l = NULL;
-
- for (p = packed_git; p; p = p->next) {
- for (w_l = NULL, w = p->windows; w; w = w->next) {
- if (!w->inuse_cnt) {
- if (!lru_w || w->last_used < lru_w->last_used) {
- lru_p = p;
- lru_w = w;
- lru_l = w_l;
- }
+static void scan_windows(struct packed_git *p,
+ struct packed_git **lru_p,
+ struct pack_window **lru_w,
+ struct pack_window **lru_l)
+{
+ struct pack_window *w, *w_l;
+
+ for (w_l = NULL, w = p->windows; w; w = w->next) {
+ if (!w->inuse_cnt) {
+ if (!*lru_w || w->last_used < (*lru_w)->last_used) {
+ *lru_p = p;
+ *lru_w = w;
+ *lru_l = w_l;
}
- w_l = w;
}
+ w_l = w;
}
+}
+
+static int unuse_one_window(struct packed_git *current)
+{
+ struct packed_git *p, *lru_p = NULL;
+ struct pack_window *lru_w = NULL, *lru_l = NULL;
+
+ if (current)
+ scan_windows(current, &lru_p, &lru_w, &lru_l);
+ for (p = packed_git; p; p = p->next)
+ scan_windows(p, &lru_p, &lru_w, &lru_l);
if (lru_p) {
munmap(lru_w->base, lru_w->len);
pack_mapped -= lru_w->len;
@@ -475,7 +486,7 @@ static int unuse_one_window(void)
lru_l->next = lru_w->next;
else {
lru_p->windows = lru_w->next;
- if (!lru_p->windows) {
+ if (!lru_p->windows && lru_p != current) {
close(lru_p->pack_fd);
lru_p->pack_fd = -1;
}
@@ -584,7 +595,8 @@ unsigned char* use_pack(struct packed_git *p,
if (win->len > packed_git_window_size)
win->len = packed_git_window_size;
pack_mapped += win->len;
- while (packed_git_limit < pack_mapped && unuse_one_window())
+ while (packed_git_limit < pack_mapped
+ && unuse_one_window(p))
; /* nothing */
win->base = mmap(NULL, win->len,
PROT_READ, MAP_PRIVATE,
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 16/17] Create pack_report() as a debugging aid.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (14 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 15/17] Support unmapping windows on 'temporary' packfiles Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 17/17] Test suite for sliding window mmap implementation Shawn O. Pearce
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Much like the alloc_report() function can be useful to report on
object allocation statistics while debugging the new pack_report()
function can be useful to report on the behavior of the mmap window
code used for packfile access.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
cache.h | 1 +
sha1_file.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/cache.h b/cache.h
index b7855ef..bd73aab 100644
--- a/cache.h
+++ b/cache.h
@@ -398,6 +398,7 @@ extern void install_packed_git(struct packed_git *pack);
extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
struct packed_git *packs);
+extern void pack_report();
extern unsigned char* use_pack(struct packed_git *, struct pack_window **, unsigned long, unsigned int *);
extern void unuse_pack(struct pack_window **);
extern struct packed_git *add_packed_git(char *, int, int);
diff --git a/sha1_file.c b/sha1_file.c
index a8a6c10..1a87f95 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -398,10 +398,34 @@ static char *find_sha1_file(const unsigned char *sha1, struct stat *st)
}
static unsigned int pack_used_ctr;
+static unsigned int pack_mmap_calls;
+static unsigned int peak_pack_open_windows;
+static unsigned int pack_open_windows;
+static size_t peak_pack_mapped;
static size_t pack_mapped;
static size_t page_size;
struct packed_git *packed_git;
+void pack_report()
+{
+ fprintf(stderr,
+ "pack_report: getpagesize() = %10lu\n"
+ "pack_report: core.packedGitWindowSize = %10lu\n"
+ "pack_report: core.packedGitLimit = %10lu\n",
+ page_size,
+ packed_git_window_size,
+ packed_git_limit);
+ fprintf(stderr,
+ "pack_report: pack_used_ctr = %10u\n"
+ "pack_report: pack_mmap_calls = %10u\n"
+ "pack_report: pack_open_windows = %10u / %10u\n"
+ "pack_report: pack_mapped = %10lu / %10lu\n",
+ pack_used_ctr,
+ pack_mmap_calls,
+ pack_open_windows, peak_pack_open_windows,
+ pack_mapped, peak_pack_mapped);
+}
+
static int check_packed_git_idx(const char *path, unsigned long *idx_size_,
void **idx_map_)
{
@@ -492,6 +516,7 @@ static int unuse_one_window(struct packed_git *current)
}
}
free(lru_w);
+ pack_open_windows--;
return 1;
}
return 0;
@@ -605,6 +630,12 @@ unsigned char* use_pack(struct packed_git *p,
die("packfile %s cannot be mapped: %s",
p->pack_name,
strerror(errno));
+ pack_mmap_calls++;
+ pack_open_windows++;
+ if (pack_mapped > peak_pack_mapped)
+ peak_pack_mapped = pack_mapped;
+ if (pack_open_windows > peak_pack_open_windows)
+ peak_pack_open_windows = pack_open_windows;
win->next = p->windows;
p->windows = win;
}
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 17/17] Test suite for sliding window mmap implementation.
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
` (15 preceding siblings ...)
2006-12-23 7:34 ` [PATCH 16/17] Create pack_report() as a debugging aid Shawn O. Pearce
@ 2006-12-23 7:34 ` Shawn O. Pearce
16 siblings, 0 replies; 25+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
This is a basic set of tests for the sliding window mmap. We mostly
focus on the verify-pack and pack-objects implementations (including
delta reuse) as these commands appear to cover the bulk of the
affected portions of sha1_file.c.
The test cases don't verify the virtual memory size used, as
this can differ from system to system. Instead it just verifies
that we can run with very low values for core.packedGitLimit and
core.packedGitWindowSize.
Adding pack_report() to the end of both builtin-verify-pack.c and
builtin-pack-objects.c and manually inspecting the statistics output
can help to verify that the total virtual memory size attributed
to pack mmap usage is what one might expect on the current system.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
t/t5301-sliding-window.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 60 insertions(+), 0 deletions(-)
diff --git a/t/t5301-sliding-window.sh b/t/t5301-sliding-window.sh
new file mode 100755
index 0000000..5a7232a
--- /dev/null
+++ b/t/t5301-sliding-window.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Shawn Pearce
+#
+
+test_description='mmap sliding window tests'
+. ./test-lib.sh
+
+test_expect_success \
+ 'setup' \
+ 'rm -f .git/index*
+ for i in a b c
+ do
+ echo $i >$i &&
+ dd if=/dev/urandom bs=32k count=1 >>$i &&
+ git-update-index --add $i || return 1
+ done &&
+ echo d >d && cat c >>d && git-update-index --add d &&
+ tree=`git-write-tree` &&
+ commit1=`git-commit-tree $tree </dev/null` &&
+ git-update-ref HEAD $commit1 &&
+ git-repack -a -d &&
+ test "`git-count-objects`" = "0 objects, 0 kilobytes" &&
+ pack1=`ls .git/objects/pack/*.pack` &&
+ test -f "$pack1"'
+
+test_expect_success \
+ 'verify-pack -v, defaults' \
+ 'git-verify-pack -v "$pack1"'
+
+test_expect_success \
+ 'verify-pack -v, packedGitWindowSize == 1 page' \
+ 'git-repo-config core.packedGitWindowSize 512 &&
+ git-verify-pack -v "$pack1"'
+
+test_expect_success \
+ 'verify-pack -v, packedGit{WindowSize,Limit} == 1 page' \
+ 'git-repo-config core.packedGitWindowSize 512 &&
+ git-repo-config core.packedGitLimit 512 &&
+ git-verify-pack -v "$pack1"'
+
+test_expect_success \
+ 'repack -a -d, packedGit{WindowSize,Limit} == 1 page' \
+ 'git-repo-config core.packedGitWindowSize 512 &&
+ git-repo-config core.packedGitLimit 512 &&
+ commit2=`git-commit-tree $tree -p $commit1 </dev/null` &&
+ git-update-ref HEAD $commit2 &&
+ git-repack -a -d &&
+ test "`git-count-objects`" = "0 objects, 0 kilobytes" &&
+ pack2=`ls .git/objects/pack/*.pack` &&
+ test -f "$pack2"
+ test "$pack1" \!= "$pack2"'
+
+test_expect_success \
+ 'verify-pack -v, defaults' \
+ 'git-repo-config --unset core.packedGitWindowSize &&
+ git-repo-config --unset core.packedGitLimit &&
+ git-verify-pack -v "$pack2"'
+
+test_done
--
1.4.4.3.g87d8
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 7:34 ` [PATCH 11/17] Fully activate the sliding window pack access Shawn O. Pearce
@ 2006-12-23 18:44 ` Linus Torvalds
2006-12-23 19:34 ` Eric Blake
2006-12-23 19:45 ` Junio C Hamano
0 siblings, 2 replies; 25+ messages in thread
From: Linus Torvalds @ 2006-12-23 18:44 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Junio C Hamano, git
On Sat, 23 Dec 2006, Shawn O. Pearce wrote:
>
> This finally turns on the sliding window behavior for packfile data
> access by mapping limited size windows and chaining them under the
> packed_git->windows list.
Hmm. You seem to default the window size to 32MB.
Maybe I'm reading that code wrong, but I think that's a bit sad.
In particular, in any setup that doesn't like mmap() at all (eg Cygwin),
and uses the "malloc+read" emulation, 32MB is actually likely much too
big. It's probably better to use something like a 1MB slice instead, since
otherwise you'll often be reading much too much.
So I'd argue that if you fall back to read() (or pread) instead of mmap,
the 32MB thing is way too big.
So maybe you should make the default depend on NO_MMAP (although it would
seem that the default Makefile makes Cygwin actually default to using mmap
these days, so maybe it's not a big deal).
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 18:44 ` Linus Torvalds
@ 2006-12-23 19:34 ` Eric Blake
2006-12-24 0:58 ` Johannes Schindelin
2006-12-23 19:45 ` Junio C Hamano
1 sibling, 1 reply; 25+ messages in thread
From: Eric Blake @ 2006-12-23 19:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Linus Torvalds on 12/23/2006 11:44 AM:
>
> In particular, in any setup that doesn't like mmap() at all (eg Cygwin),
...
> So maybe you should make the default depend on NO_MMAP (although it would
> seem that the default Makefile makes Cygwin actually default to using mmap
> these days, so maybe it's not a big deal).
Indeed, on cygwin, using the 1.4.4.3 Makefile setting where NO_MMAP is
commented out, I have not seemed to have any mmap problems.
- --
Life is short - so eat dessert first!
Eric Blake ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFjYS684KuGfSFAYARAu4hAKDFfGVrgH0dnxkPHiUdrkAxz8waDQCgwGTo
jbUOvu8uI372BYxK2zHftjs=
=5Rwv
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 18:44 ` Linus Torvalds
2006-12-23 19:34 ` Eric Blake
@ 2006-12-23 19:45 ` Junio C Hamano
2006-12-23 20:10 ` Linus Torvalds
` (2 more replies)
1 sibling, 3 replies; 25+ messages in thread
From: Junio C Hamano @ 2006-12-23 19:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git, Shawn O. Pearce
Linus Torvalds <torvalds@osdl.org> writes:
> On Sat, 23 Dec 2006, Shawn O. Pearce wrote:
>>
>> This finally turns on the sliding window behavior for packfile data
>> access by mapping limited size windows and chaining them under the
>> packed_git->windows list.
>
> Hmm. You seem to default the window size to 32MB.
>
> Maybe I'm reading that code wrong, but I think that's a bit sad.
>
> In particular, in any setup that doesn't like mmap() at all (eg Cygwin),
> and uses the "malloc+read" emulation, 32MB is actually likely much too
> big. It's probably better to use something like a 1MB slice instead, since
> otherwise you'll often be reading much too much.
>
> So I'd argue that if you fall back to read() (or pread) instead of mmap,
> the 32MB thing is way too big.
>
> So maybe you should make the default depend on NO_MMAP (although it would
> seem that the default Makefile makes Cygwin actually default to using mmap
> these days, so maybe it's not a big deal).
I have been thinking that we should perhaps change Cygwin
default to NO_MMAP. Safer but slower would be better than not
working correctly for people on FAT32.
I agree that 32MB is too big for emulated mmap(). We might want
to further enhance the new use_pack() API so that the caller can
say how much it expects to consume, to help pread() based
emulation avoid reading unnecessary data.
Also the patch makes the maximum mapped from (1<<26) bytes to
256MB which is four-fold; I think tweaking such settings should
be done as a separate step.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 19:45 ` Junio C Hamano
@ 2006-12-23 20:10 ` Linus Torvalds
2006-12-24 1:23 ` Johannes Schindelin
2006-12-24 2:23 ` Shawn Pearce
2006-12-24 2:35 ` Shawn Pearce
2 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2006-12-23 20:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Shawn O. Pearce
On Sat, 23 Dec 2006, Junio C Hamano wrote:
>
> I have been thinking that we should perhaps change Cygwin
> default to NO_MMAP. Safer but slower would be better than not
> working correctly for people on FAT32.
The thing is, with a smaller pack access window, it might not even be
slower. I don't know just _how_ many hoops cygwin jumps through for mmap,
and maybe mmap under cygwin is actually perfectly fine, but at the same
time I do suspect that UNIX mmap semantics are a lot harder to emulate
than just a regular "pread()", so it's quite possible that by avoiding
mmap you could avoid a lot of complex cygwin code.
And yes, making it all work on FAT32 would obviously be a good thing.
Only testing (or somebody who knows cygwin well) can tell.
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 19:34 ` Eric Blake
@ 2006-12-24 0:58 ` Johannes Schindelin
0 siblings, 0 replies; 25+ messages in thread
From: Johannes Schindelin @ 2006-12-24 0:58 UTC (permalink / raw)
To: Eric Blake; +Cc: git
Hi,
On Sat, 23 Dec 2006, Eric Blake wrote:
> Indeed, on cygwin, using the 1.4.4.3 Makefile setting where NO_MMAP is
> commented out, I have not seemed to have any mmap problems.
Strictly speaking, it is not commented out. It is not set. And it has been
pointed out in another mail as well as in another thread that the problems
without NO_MMAP arise on FAT32 when committing.
Which proves again that "works for me" often is not good enough.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 20:10 ` Linus Torvalds
@ 2006-12-24 1:23 ` Johannes Schindelin
0 siblings, 0 replies; 25+ messages in thread
From: Johannes Schindelin @ 2006-12-24 1:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git, Shawn O. Pearce
Hi,
On Sat, 23 Dec 2006, Linus Torvalds wrote:
> On Sat, 23 Dec 2006, Junio C Hamano wrote:
> >
> > I have been thinking that we should perhaps change Cygwin
> > default to NO_MMAP. Safer but slower would be better than not
> > working correctly for people on FAT32.
>
> The thing is, with a smaller pack access window, it might not even be
> slower. I don't know just _how_ many hoops cygwin jumps through for mmap,
AFAIK Windows API _does_ have an equivalent to mmap(), which just is not
as easy to use.
[/me asks my friend Google]
Yes. The function is called CreateFile(), and it takes a _filename_, not a
file descriptor. Which does not matter much, since Windows is braindead
enough not to be able to have files deleted which are still opened. Mind
you, renaming is no problem...
So, Cygwin has to keep record of the file name (and AFAICT it necessarily
fails if mmap() is called with a file descriptor of a file which has been
renamed _externally_ after opening) of the file descriptor.
Cygwin also has to reopen after a fork(), since Windows does not have
fork(), and it has to be simulated by a CreateProcess() and reestablishing
existing resources. That is the reason a fork() is really, really
expensive on Windows.
> and maybe mmap under cygwin is actually perfectly fine, but at the same
> time I do suspect that UNIX mmap semantics are a lot harder to emulate
> than just a regular "pread()", so it's quite possible that by avoiding
> mmap you could avoid a lot of complex cygwin code.
But should we really cater for a braindead API?
At the time, I _hated_ the fact that we had to change HEAD into a regular
file, just because people feared that Windows could not handle symbolic
links (and mind you, Cygwin handles them quite gracefully, using .lnk
files).
Besides, IIRC you said yourself that mmap() is better than pread() when
seeking a lot. And pack file accesses _are_ the perfect example for such a
use case, no?
> And yes, making it all work on FAT32 would obviously be a good thing.
If really needed, we could make a hack for Windows, which just adds a new
config option automagically set by init-db (on Windows only), and which is
checked in gitfakemmap() and gitfakeunmmap(). (BTW I don't like the naming
anymore; a good cleanup would be to name them git_mmap() and
git_unmmap()).
Having said that, if somebody is really dedicated to working on Windows,
she could overcome the mentioned problems. There's got to be a way to
avoid the error on FAT32 without using NO_MMAP. Don't look at me, I hate
Windows, I will not fix it.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 19:45 ` Junio C Hamano
2006-12-23 20:10 ` Linus Torvalds
@ 2006-12-24 2:23 ` Shawn Pearce
2006-12-24 2:35 ` Shawn Pearce
2 siblings, 0 replies; 25+ messages in thread
From: Shawn Pearce @ 2006-12-24 2:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
Junio C Hamano <junkio@cox.net> wrote:
> Also the patch makes the maximum mapped from (1<<26) bytes to
> 256MB which is four-fold; I think tweaking such settings should
> be done as a separate step.
Sure, I don't have any objection to that, but I'm also not going
to write a patch to drop it back down to 128 MiB.
The old code was somewhat flawed. It did not always honor the upper
limit of PACK_MAX. For example it always overshoots that limit
anytime we have a packfile larger than PACK_MAX.
The only way you will hit the new 256 MiB limit is if you actually
have a packfile (or a set of packfiles) that large. If that is the
case then you were probably overshooting PACK_MAX before. With this
series you will never overshoot core.packedGitLimit (256 MiB), no
matter how big your packfiles are.
So the new 256 MiB limit, although a four-fold increase, is actually
smaller virtual memory footprint for those users who may have been
hitting PACK_MAX before.
--
Shawn.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 11/17] Fully activate the sliding window pack access.
2006-12-23 19:45 ` Junio C Hamano
2006-12-23 20:10 ` Linus Torvalds
2006-12-24 2:23 ` Shawn Pearce
@ 2006-12-24 2:35 ` Shawn Pearce
2 siblings, 0 replies; 25+ messages in thread
From: Shawn Pearce @ 2006-12-24 2:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
Junio C Hamano <junkio@cox.net> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> > Hmm. You seem to default the window size to 32MB.
> >
> > Maybe I'm reading that code wrong, but I think that's a bit sad.
> >
> > So I'd argue that if you fall back to read() (or pread) instead of mmap,
> > the 32MB thing is way too big.
> >
> > So maybe you should make the default depend on NO_MMAP (although it would
> > seem that the default Makefile makes Cygwin actually default to using mmap
> > these days, so maybe it's not a big deal).
>
> I agree that 32MB is too big for emulated mmap().
Yes, I agree too. I'll submit some additional patches on top of
the existing 17 patch series (which I see is already in 'pu').
> We might want
> to further enhance the new use_pack() API so that the caller can
> say how much it expects to consume, to help pread() based
> emulation avoid reading unnecessary data.
I'm not sure that is worthwhile right now.
The only two callers who know how many bytes they expect is
pack-objects (during delta/whole reuse) and verify-pack (during the
SHA1 check of the packfile itself). Both are maintenance commands
where preventing a read overshoot in the case of NO_MMAP is probably
not worthwhile, especially as both need to read every byte anyway.
--
Shawn.
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2006-12-24 2:35 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <53b67707929c7f051f6d384c5d96e653bfa8419c.1166857884.git.spearce@spearce.org>
2006-12-23 7:33 ` [PATCH 1/17] Replace unpack_entry_gently with unpack_entry Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 2/17] Introduce new config option for mmap limit Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 3/17] Refactor packed_git to prepare for sliding mmap windows Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 4/17] Use off_t for index and pack file lengths Shawn O. Pearce
2006-12-23 7:33 ` [PATCH 5/17] Create read_or_die utility routine Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 6/17] Refactor how we open pack files to prepare for multiple windows Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 7/17] Replace use_packed_git with window cursors Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 8/17] Loop over pack_windows when inflating/accessing data Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 9/17] Document why header parsing won't exceed a window Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 10/17] Unmap individual windows rather than entire files Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 11/17] Fully activate the sliding window pack access Shawn O. Pearce
2006-12-23 18:44 ` Linus Torvalds
2006-12-23 19:34 ` Eric Blake
2006-12-24 0:58 ` Johannes Schindelin
2006-12-23 19:45 ` Junio C Hamano
2006-12-23 20:10 ` Linus Torvalds
2006-12-24 1:23 ` Johannes Schindelin
2006-12-24 2:23 ` Shawn Pearce
2006-12-24 2:35 ` Shawn Pearce
2006-12-23 7:34 ` [PATCH 12/17] Load core configuration in git-verify-pack Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 13/17] Ensure core.packedGitWindowSize cannot be less than 2 pages Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 14/17] Improve error message when packfile mmap fails Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 15/17] Support unmapping windows on 'temporary' packfiles Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 16/17] Create pack_report() as a debugging aid Shawn O. Pearce
2006-12-23 7:34 ` [PATCH 17/17] Test suite for sliding window mmap implementation Shawn O. Pearce
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).