* [PATCH] index: be careful when handling long names
@ 2008-07-20 0:51 Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-07-20 0:51 UTC (permalink / raw)
To: git; +Cc: Petr Baudis
This is a follow-up to 7fec10b (index: be careful when handling long
names, 2008-01-18) where we fixed handling of index entries with long
names. There still remained a few remaining unsafe codepaths.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin-apply.c | 2 +-
builtin-update-index.c | 2 +-
read-cache.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/builtin-apply.c b/builtin-apply.c
index e15471b..f2377fe 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2758,7 +2758,7 @@ static void add_index_file(const char *path, unsigned mode, void *buf, unsigned
ce = xcalloc(1, ce_size);
memcpy(ce->name, path, namelen);
ce->ce_mode = create_ce_mode(mode);
- ce->ce_flags = namelen;
+ ce->ce_flags = create_ce_flags(namelen, 0);
if (S_ISGITLINK(mode)) {
const char *s = buf;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 38eb53c..003cb98 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -95,7 +95,7 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
size = cache_entry_size(len);
ce = xcalloc(1, size);
memcpy(ce->name, path, len);
- ce->ce_flags = len;
+ ce->ce_flags = create_ce_flags(len, 0);
fill_stat_cache_info(ce, st);
ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
diff --git a/read-cache.c b/read-cache.c
index 1648428..2dd8131 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -498,7 +498,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
size = cache_entry_size(namelen);
ce = xcalloc(1, size);
memcpy(ce->name, path, namelen);
- ce->ce_flags = namelen;
+ ce->ce_flags = create_ce_flags(namelen, 0);
fill_stat_cache_info(ce, st);
if (trust_executable_bit && has_symlinks)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Updated in-memory index cleanup
@ 2008-01-19 3:25 Linus Torvalds
2008-01-19 7:42 ` [PATCH] index: be careful when handling long names Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-01-19 3:25 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
This is a slightly updated version of the patch I sent out a few days ago,
which make the index use separate in-memory and on-disk representations
(with the in-memory format being host-endian and having an extended flags
field).
Not only has the patch been updated for current 'master', it also now
handles the "remove entry" condition with a ce_flags value (CE_REMOVE) to
mirror our old use of "update entry" (CE_UPDATE).
It also makes both CE_REMOVE and CE_UPDATE be in the "in-memory only" part
of ce_flags, so that they don't use up precios bits from the on-disk
format.
I have been using some version of this patch since I sent it out
originally, so it has actually gotten some testing. The CE_REMOVE parts
are new, but are pretty obvious cleanups and not subtle. It passes the
test-suite and yadda-yadda.
The patch isn't trivial, but the bulk of it is (still) of the obvious
cleanup kind, ie things like the removal of ntohl/htonl (since it's now
done by the disk-format conversion at read/write time):
- if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+ if (S_ISGITLINK(ce->ce_mode)) {
and using CE_REMOVE instead of setting mode and stage to 0:
- ce->ce_mode = 0;
- ce->ce_flags &= ~htons(CE_STAGEMASK);
+ ce->ce_flags |= CE_REMOVE;
with the fact that there are about 45 more lines added than removed coming
mainly from the new definition of the on-disk/in-memory structures and the
conversion routines to convert from one to the other.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
builtin-apply.c | 10 +-
builtin-blame.c | 2 +-
builtin-commit.c | 2 +-
builtin-fsck.c | 2 +-
builtin-grep.c | 4 +-
builtin-ls-files.c | 10 +-
builtin-read-tree.c | 3 +-
builtin-rerere.c | 4 +-
builtin-update-index.c | 18 ++--
cache-tree.c | 4 +-
cache.h | 46 ++++++++----
diff-lib.c | 32 ++++-----
dir.c | 2 +-
entry.c | 6 +-
merge-index.c | 2 +-
merge-recursive.c | 2 +-
reachable.c | 2 +-
read-cache.c | 195 ++++++++++++++++++++++++++++--------------------
sha1_name.c | 2 +-
tree.c | 4 +-
unpack-trees.c | 29 ++++----
21 files changed, 214 insertions(+), 167 deletions(-)
diff --git a/builtin-apply.c b/builtin-apply.c
index 15432b6..30d86f2 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1946,7 +1946,7 @@ static int read_file_or_gitlink(struct cache_entry *ce, struct strbuf *buf)
if (!ce)
return 0;
- if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+ if (S_ISGITLINK(ce->ce_mode)) {
strbuf_grow(buf, 100);
strbuf_addf(buf, "Subproject commit %s\n", sha1_to_hex(ce->sha1));
} else {
@@ -2023,7 +2023,7 @@ static int check_to_create_blob(const char *new_name, int ok_if_exists)
static int verify_index_match(struct cache_entry *ce, struct stat *st)
{
- if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+ if (S_ISGITLINK(ce->ce_mode)) {
if (!S_ISDIR(st->st_mode))
return -1;
return 0;
@@ -2082,12 +2082,12 @@ static int check_patch(struct patch *patch, struct patch *prev_patch)
return error("%s: does not match index",
old_name);
if (cached)
- st_mode = ntohl(ce->ce_mode);
+ st_mode = ce->ce_mode;
} else if (stat_ret < 0)
return error("%s: %s", old_name, strerror(errno));
if (!cached)
- st_mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+ st_mode = ce_mode_from_stat(ce, st.st_mode);
if (patch->is_new < 0)
patch->is_new = 0;
@@ -2388,7 +2388,7 @@ static void add_index_file(const char *path, unsigned mode, void *buf, unsigned
ce = xcalloc(1, ce_size);
memcpy(ce->name, path, namelen);
ce->ce_mode = create_ce_mode(mode);
- ce->ce_flags = htons(namelen);
+ ce->ce_flags = namelen;
if (S_ISGITLINK(mode)) {
const char *s = buf;
diff --git a/builtin-blame.c b/builtin-blame.c
index 9b4c02e..c7e6887 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -2092,7 +2092,7 @@ static struct commit *fake_working_tree_commit(const char *path, const char *con
if (!mode) {
int pos = cache_name_pos(path, len);
if (0 <= pos)
- mode = ntohl(active_cache[pos]->ce_mode);
+ mode = active_cache[pos]->ce_mode;
else
/* Let's not bother reading from HEAD tree */
mode = S_IFREG | 0644;
diff --git a/builtin-commit.c b/builtin-commit.c
index 0227936..c63ff82 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -156,7 +156,7 @@ static int list_paths(struct path_list *list, const char *with_tree,
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];
- if (ce->ce_flags & htons(CE_UPDATE))
+ if (ce->ce_flags & CE_UPDATE)
continue;
if (!pathspec_match(pattern, m, ce->name, 0))
continue;
diff --git a/builtin-fsck.c b/builtin-fsck.c
index 2a6e94d..6fc9525 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -765,7 +765,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
struct blob *blob;
struct object *obj;
- mode = ntohl(active_cache[i]->ce_mode);
+ mode = active_cache[i]->ce_mode;
if (S_ISGITLINK(mode))
continue;
blob = lookup_blob(active_cache[i]->sha1);
diff --git a/builtin-grep.c b/builtin-grep.c
index 0d6cc73..9180b39 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -331,7 +331,7 @@ static int external_grep(struct grep_opt *opt, const char **paths, int cached)
struct cache_entry *ce = active_cache[i];
char *name;
int kept;
- if (!S_ISREG(ntohl(ce->ce_mode)))
+ if (!S_ISREG(ce->ce_mode))
continue;
if (!pathspec_matches(paths, ce->name))
continue;
@@ -387,7 +387,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
for (nr = 0; nr < active_nr; nr++) {
struct cache_entry *ce = active_cache[nr];
- if (!S_ISREG(ntohl(ce->ce_mode)))
+ if (!S_ISREG(ce->ce_mode))
continue;
if (!pathspec_matches(paths, ce->name))
continue;
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index 0f0ab2d..d56e33e 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -189,7 +189,7 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
return;
if (tag && *tag && show_valid_bit &&
- (ce->ce_flags & htons(CE_VALID))) {
+ (ce->ce_flags & CE_VALID)) {
static char alttag[4];
memcpy(alttag, tag, 3);
if (isalpha(tag[0]))
@@ -210,7 +210,7 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
} else {
printf("%s%06o %s %d\t",
tag,
- ntohl(ce->ce_mode),
+ ce->ce_mode,
abbrev ? find_unique_abbrev(ce->sha1,abbrev)
: sha1_to_hex(ce->sha1),
ce_stage(ce));
@@ -242,7 +242,7 @@ static void show_files(struct dir_struct *dir, const char *prefix)
continue;
if (show_unmerged && !ce_stage(ce))
continue;
- if (ce->ce_flags & htons(CE_UPDATE))
+ if (ce->ce_flags & CE_UPDATE)
continue;
show_ce_entry(ce_stage(ce) ? tag_unmerged : tag_cached, ce);
}
@@ -350,7 +350,7 @@ void overlay_tree_on_cache(const char *tree_name, const char *prefix)
struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
continue;
- ce->ce_flags |= htons(CE_STAGEMASK);
+ ce->ce_flags |= CE_STAGEMASK;
}
if (prefix) {
@@ -379,7 +379,7 @@ void overlay_tree_on_cache(const char *tree_name, const char *prefix)
*/
if (last_stage0 &&
!strcmp(last_stage0->name, ce->name))
- ce->ce_flags |= htons(CE_UPDATE);
+ ce->ce_flags |= CE_UPDATE;
}
}
}
diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index c0ea034..5785401 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -45,8 +45,7 @@ static int read_cache_unmerged(void)
continue;
cache_tree_invalidate_path(active_cache_tree, ce->name);
last = ce;
- ce->ce_mode = 0;
- ce->ce_flags &= ~htons(CE_STAGEMASK);
+ ce->ce_flags |= CE_REMOVE;
}
*dst++ = ce;
}
diff --git a/builtin-rerere.c b/builtin-rerere.c
index a9e3ebc..b0c17bd 100644
--- a/builtin-rerere.c
+++ b/builtin-rerere.c
@@ -149,8 +149,8 @@ static int find_conflict(struct path_list *conflict)
if (ce_stage(e2) == 2 &&
ce_stage(e3) == 3 &&
ce_same_name(e2, e3) &&
- S_ISREG(ntohl(e2->ce_mode)) &&
- S_ISREG(ntohl(e3->ce_mode))) {
+ S_ISREG(e2->ce_mode) &&
+ S_ISREG(e3->ce_mode)) {
path_list_insert((const char *)e2->name, conflict);
i++; /* skip over both #2 and #3 */
}
diff --git a/builtin-update-index.c b/builtin-update-index.c
index c3a14c7..a8795d3 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -47,10 +47,10 @@ static int mark_valid(const char *path)
if (0 <= pos) {
switch (mark_valid_only) {
case MARK_VALID:
- active_cache[pos]->ce_flags |= htons(CE_VALID);
+ active_cache[pos]->ce_flags |= CE_VALID;
break;
case UNMARK_VALID:
- active_cache[pos]->ce_flags &= ~htons(CE_VALID);
+ active_cache[pos]->ce_flags &= ~CE_VALID;
break;
}
cache_tree_invalidate_path(active_cache_tree, path);
@@ -95,7 +95,7 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
size = cache_entry_size(len);
ce = xcalloc(1, size);
memcpy(ce->name, path, len);
- ce->ce_flags = htons(len);
+ ce->ce_flags = len;
fill_stat_cache_info(ce, st);
ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
@@ -139,7 +139,7 @@ static int process_directory(const char *path, int len, struct stat *st)
/* Exact match: file or existing gitlink */
if (pos >= 0) {
struct cache_entry *ce = active_cache[pos];
- if (S_ISGITLINK(ntohl(ce->ce_mode))) {
+ if (S_ISGITLINK(ce->ce_mode)) {
/* Do nothing to the index if there is no HEAD! */
if (resolve_gitlink_ref(path, "HEAD", sha1) < 0)
@@ -183,7 +183,7 @@ static int process_file(const char *path, int len, struct stat *st)
int pos = cache_name_pos(path, len);
struct cache_entry *ce = pos < 0 ? NULL : active_cache[pos];
- if (ce && S_ISGITLINK(ntohl(ce->ce_mode)))
+ if (ce && S_ISGITLINK(ce->ce_mode))
return error("%s is already a gitlink, not replacing", path);
return add_one_path(ce, path, len, st);
@@ -226,7 +226,7 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
ce->ce_flags = create_ce_flags(len, stage);
ce->ce_mode = create_ce_mode(mode);
if (assume_unchanged)
- ce->ce_flags |= htons(CE_VALID);
+ ce->ce_flags |= CE_VALID;
option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
if (add_cache_entry(ce, option))
@@ -246,14 +246,14 @@ static void chmod_path(int flip, const char *path)
if (pos < 0)
goto fail;
ce = active_cache[pos];
- mode = ntohl(ce->ce_mode);
+ mode = ce->ce_mode;
if (!S_ISREG(mode))
goto fail;
switch (flip) {
case '+':
- ce->ce_mode |= htonl(0111); break;
+ ce->ce_mode |= 0111; break;
case '-':
- ce->ce_mode &= htonl(~0111); break;
+ ce->ce_mode &= ~0111; break;
default:
goto fail;
}
diff --git a/cache-tree.c b/cache-tree.c
index 50b3526..bfc95d2 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -320,13 +320,13 @@ static int update_one(struct cache_tree *it,
}
else {
sha1 = ce->sha1;
- mode = ntohl(ce->ce_mode);
+ mode = ce->ce_mode;
entlen = pathlen - baselen;
}
if (mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1))
return error("invalid object %s", sha1_to_hex(sha1));
- if (!ce->ce_mode)
+ if (ce->ce_flags & CE_REMOVE)
continue; /* entry being removed */
strbuf_grow(&buffer, entlen + 100);
diff --git a/cache.h b/cache.h
index 549f4bb..4a054c5 100644
--- a/cache.h
+++ b/cache.h
@@ -94,48 +94,66 @@ struct cache_time {
* We save the fields in big-endian order to allow using the
* index file over NFS transparently.
*/
+struct ondisk_cache_entry {
+ struct cache_time ctime;
+ struct cache_time mtime;
+ unsigned int dev;
+ unsigned int ino;
+ unsigned int mode;
+ unsigned int uid;
+ unsigned int gid;
+ unsigned int size;
+ unsigned char sha1[20];
+ unsigned short flags;
+ char name[FLEX_ARRAY]; /* more */
+};
+
struct cache_entry {
- struct cache_time ce_ctime;
- struct cache_time ce_mtime;
+ unsigned int ce_ctime;
+ unsigned int ce_mtime;
unsigned int ce_dev;
unsigned int ce_ino;
unsigned int ce_mode;
unsigned int ce_uid;
unsigned int ce_gid;
unsigned int ce_size;
+ unsigned int ce_flags;
unsigned char sha1[20];
- unsigned short ce_flags;
char name[FLEX_ARRAY]; /* more */
};
#define CE_NAMEMASK (0x0fff)
#define CE_STAGEMASK (0x3000)
-#define CE_UPDATE (0x4000)
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12
-#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+/* In-memory only */
+#define CE_UPDATE (0x10000)
+#define CE_REMOVE (0x20000)
+
+#define create_ce_flags(len, stage) ((len) | ((stage) << CE_STAGESHIFT))
+#define ce_namelen(ce) (CE_NAMEMASK & (ce)->ce_flags)
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
-#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
+#define ondisk_ce_size(ce) ondisk_cache_entry_size(ce_namelen(ce))
+#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
#define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
static inline unsigned int create_ce_mode(unsigned int mode)
{
if (S_ISLNK(mode))
- return htonl(S_IFLNK);
+ return S_IFLNK;
if (S_ISDIR(mode) || S_ISGITLINK(mode))
- return htonl(S_IFGITLINK);
- return htonl(S_IFREG | ce_permissions(mode));
+ return S_IFGITLINK;
+ return S_IFREG | ce_permissions(mode);
}
static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned int mode)
{
extern int trust_executable_bit, has_symlinks;
if (!has_symlinks && S_ISREG(mode) &&
- ce && S_ISLNK(ntohl(ce->ce_mode)))
+ ce && S_ISLNK(ce->ce_mode))
return ce->ce_mode;
if (!trust_executable_bit && S_ISREG(mode)) {
- if (ce && S_ISREG(ntohl(ce->ce_mode)))
+ if (ce && S_ISREG(ce->ce_mode))
return ce->ce_mode;
return create_ce_mode(0666);
}
@@ -146,14 +164,14 @@ static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned in
S_ISLNK(mode) ? S_IFLNK : S_ISDIR(mode) ? S_IFDIR : S_IFGITLINK)
#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
+#define ondisk_cache_entry_size(len) ((offsetof(struct ondisk_cache_entry,name) + (len) + 8) & ~7)
struct index_state {
struct cache_entry **cache;
unsigned int cache_nr, cache_alloc, cache_changed;
struct cache_tree *cache_tree;
time_t timestamp;
- void *mmap;
- size_t mmap_size;
+ void *alloc;
};
extern struct index_state the_index;
diff --git a/diff-lib.c b/diff-lib.c
index d85d8f3..4a05b02 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -37,7 +37,7 @@ static int get_mode(const char *path, int *mode)
if (!path || !strcmp(path, "/dev/null"))
*mode = 0;
else if (!strcmp(path, "-"))
- *mode = ntohl(create_ce_mode(0666));
+ *mode = create_ce_mode(0666);
else if (stat(path, &st))
return error("Could not access '%s'", path);
else
@@ -384,7 +384,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
continue;
}
else
- dpath->mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+ dpath->mode = ce_mode_from_stat(ce, st.st_mode);
while (i < entries) {
struct cache_entry *nce = active_cache[i];
@@ -398,10 +398,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
*/
stage = ce_stage(nce);
if (2 <= stage) {
- int mode = ntohl(nce->ce_mode);
+ int mode = nce->ce_mode;
num_compare_stages++;
hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
- dpath->parent[stage-2].mode = ntohl(ce_mode_from_stat(nce, mode));
+ dpath->parent[stage-2].mode = ce_mode_from_stat(nce, mode);
dpath->parent[stage-2].status =
DIFF_STATUS_MODIFIED;
}
@@ -442,15 +442,15 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
}
if (silent_on_removed)
continue;
- diff_addremove(&revs->diffopt, '-', ntohl(ce->ce_mode),
+ diff_addremove(&revs->diffopt, '-', ce->ce_mode,
ce->sha1, ce->name, NULL);
continue;
}
changed = ce_match_stat(ce, &st, ce_option);
if (!changed && !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
continue;
- oldmode = ntohl(ce->ce_mode);
- newmode = ntohl(ce_mode_from_stat(ce, st.st_mode));
+ oldmode = ce->ce_mode;
+ newmode = ce_mode_from_stat(ce, st.st_mode);
diff_change(&revs->diffopt, oldmode, newmode,
ce->sha1, (changed ? null_sha1 : ce->sha1),
ce->name, NULL);
@@ -471,7 +471,7 @@ static void diff_index_show_file(struct rev_info *revs,
struct cache_entry *ce,
unsigned char *sha1, unsigned int mode)
{
- diff_addremove(&revs->diffopt, prefix[0], ntohl(mode),
+ diff_addremove(&revs->diffopt, prefix[0], mode,
sha1, ce->name, NULL);
}
@@ -550,14 +550,14 @@ static int show_modified(struct rev_info *revs,
p->len = pathlen;
memcpy(p->path, new->name, pathlen);
p->path[pathlen] = 0;
- p->mode = ntohl(mode);
+ p->mode = mode;
hashclr(p->sha1);
memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
p->parent[0].status = DIFF_STATUS_MODIFIED;
- p->parent[0].mode = ntohl(new->ce_mode);
+ p->parent[0].mode = new->ce_mode;
hashcpy(p->parent[0].sha1, new->sha1);
p->parent[1].status = DIFF_STATUS_MODIFIED;
- p->parent[1].mode = ntohl(old->ce_mode);
+ p->parent[1].mode = old->ce_mode;
hashcpy(p->parent[1].sha1, old->sha1);
show_combined_diff(p, 2, revs->dense_combined_merges, revs);
free(p);
@@ -569,9 +569,6 @@ static int show_modified(struct rev_info *revs,
!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
return 0;
- mode = ntohl(mode);
- oldmode = ntohl(oldmode);
-
diff_change(&revs->diffopt, oldmode, mode,
old->sha1, sha1, old->name, NULL);
return 0;
@@ -628,7 +625,7 @@ static int diff_cache(struct rev_info *revs,
cached, match_missing))
break;
diff_unmerge(&revs->diffopt, ce->name,
- ntohl(ce->ce_mode), ce->sha1);
+ ce->ce_mode, ce->sha1);
break;
case 3:
diff_unmerge(&revs->diffopt, ce->name,
@@ -664,7 +661,7 @@ static void mark_merge_entries(void)
struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
continue;
- ce->ce_flags |= htons(CE_STAGEMASK);
+ ce->ce_flags |= CE_STAGEMASK;
}
}
@@ -722,8 +719,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
cache_tree_invalidate_path(active_cache_tree,
ce->name);
last = ce;
- ce->ce_mode = 0;
- ce->ce_flags &= ~htons(CE_STAGEMASK);
+ ce->ce_flags |= CE_REMOVE;
}
*dst++ = ce;
}
diff --git a/dir.c b/dir.c
index 3e345c2..1b9cc7a 100644
--- a/dir.c
+++ b/dir.c
@@ -391,7 +391,7 @@ static enum exist_status directory_exists_in_index(const char *dirname, int len)
break;
if (endchar == '/')
return index_directory;
- if (!endchar && S_ISGITLINK(ntohl(ce->ce_mode)))
+ if (!endchar && S_ISGITLINK(ce->ce_mode))
return index_gitdir;
}
return index_nonexistent;
diff --git a/entry.c b/entry.c
index 257ab46..44f4b89 100644
--- a/entry.c
+++ b/entry.c
@@ -103,7 +103,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
int fd;
long wrote;
- switch (ntohl(ce->ce_mode) & S_IFMT) {
+ switch (ce->ce_mode & S_IFMT) {
char *new;
struct strbuf buf;
unsigned long size;
@@ -129,7 +129,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
strcpy(path, ".merge_file_XXXXXX");
fd = mkstemp(path);
} else
- fd = create_file(path, ntohl(ce->ce_mode));
+ fd = create_file(path, ce->ce_mode);
if (fd < 0) {
free(new);
return error("git-checkout-index: unable to create file %s (%s)",
@@ -221,7 +221,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
unlink(path);
if (S_ISDIR(st.st_mode)) {
/* If it is a gitlink, leave it alone! */
- if (S_ISGITLINK(ntohl(ce->ce_mode)))
+ if (S_ISGITLINK(ce->ce_mode))
return 0;
if (!state->force)
return error("%s is a directory", path);
diff --git a/merge-index.c b/merge-index.c
index fa719cb..bbb700b 100644
--- a/merge-index.c
+++ b/merge-index.c
@@ -48,7 +48,7 @@ static int merge_entry(int pos, const char *path)
break;
found++;
strcpy(hexbuf[stage], sha1_to_hex(ce->sha1));
- sprintf(ownbuf[stage], "%o", ntohl(ce->ce_mode));
+ sprintf(ownbuf[stage], "%o", ce->ce_mode);
arguments[stage] = hexbuf[stage];
arguments[stage + 4] = ownbuf[stage];
} while (++pos < active_nr);
diff --git a/merge-recursive.c b/merge-recursive.c
index c292a77..bdf03b1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -333,7 +333,7 @@ static struct path_list *get_unmerged(void)
item->util = xcalloc(1, sizeof(struct stage_data));
}
e = item->util;
- e->stages[ce_stage(ce)].mode = ntohl(ce->ce_mode);
+ e->stages[ce_stage(ce)].mode = ce->ce_mode;
hashcpy(e->stages[ce_stage(ce)].sha, ce->sha1);
}
diff --git a/reachable.c b/reachable.c
index 6383401..00f289f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -176,7 +176,7 @@ static void add_cache_refs(struct rev_info *revs)
* lookup_blob() on them, to avoid populating the hash table
* with invalid information
*/
- if (S_ISGITLINK(ntohl(active_cache[i]->ce_mode)))
+ if (S_ISGITLINK(active_cache[i]->ce_mode))
continue;
lookup_blob(active_cache[i]->sha1);
diff --git a/read-cache.c b/read-cache.c
index 7db5588..f5f9c3d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -30,20 +30,16 @@ struct index_state the_index;
*/
void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
{
- ce->ce_ctime.sec = htonl(st->st_ctime);
- ce->ce_mtime.sec = htonl(st->st_mtime);
-#ifdef USE_NSEC
- ce->ce_ctime.nsec = htonl(st->st_ctim.tv_nsec);
- ce->ce_mtime.nsec = htonl(st->st_mtim.tv_nsec);
-#endif
- ce->ce_dev = htonl(st->st_dev);
- ce->ce_ino = htonl(st->st_ino);
- ce->ce_uid = htonl(st->st_uid);
- ce->ce_gid = htonl(st->st_gid);
- ce->ce_size = htonl(st->st_size);
+ ce->ce_ctime = st->st_ctime;
+ ce->ce_mtime = st->st_mtime;
+ ce->ce_dev = st->st_dev;
+ ce->ce_ino = st->st_ino;
+ ce->ce_uid = st->st_uid;
+ ce->ce_gid = st->st_gid;
+ ce->ce_size = st->st_size;
if (assume_unchanged)
- ce->ce_flags |= htons(CE_VALID);
+ ce->ce_flags |= CE_VALID;
}
static int ce_compare_data(struct cache_entry *ce, struct stat *st)
@@ -116,7 +112,7 @@ static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
return DATA_CHANGED;
break;
case S_IFDIR:
- if (S_ISGITLINK(ntohl(ce->ce_mode)))
+ if (S_ISGITLINK(ce->ce_mode))
return 0;
default:
return TYPE_CHANGED;
@@ -128,14 +124,14 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
{
unsigned int changed = 0;
- switch (ntohl(ce->ce_mode) & S_IFMT) {
+ switch (ce->ce_mode & S_IFMT) {
case S_IFREG:
changed |= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;
/* We consider only the owner x bit to be relevant for
* "mode changes"
*/
if (trust_executable_bit &&
- (0100 & (ntohl(ce->ce_mode) ^ st->st_mode)))
+ (0100 & (ce->ce_mode ^ st->st_mode)))
changed |= MODE_CHANGED;
break;
case S_IFLNK:
@@ -152,29 +148,17 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
case 0: /* Special case: unmerged file in index */
return MODE_CHANGED | DATA_CHANGED | TYPE_CHANGED;
default:
- die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
+ die("internal error: ce_mode is %o", ce->ce_mode);
}
- if (ce->ce_mtime.sec != htonl(st->st_mtime))
+ if (ce->ce_mtime != (unsigned int) st->st_mtime)
changed |= MTIME_CHANGED;
- if (ce->ce_ctime.sec != htonl(st->st_ctime))
+ if (ce->ce_ctime != (unsigned int) st->st_ctime)
changed |= CTIME_CHANGED;
-#ifdef USE_NSEC
- /*
- * nsec seems unreliable - not all filesystems support it, so
- * as long as it is in the inode cache you get right nsec
- * but after it gets flushed, you get zero nsec.
- */
- if (ce->ce_mtime.nsec != htonl(st->st_mtim.tv_nsec))
- changed |= MTIME_CHANGED;
- if (ce->ce_ctime.nsec != htonl(st->st_ctim.tv_nsec))
- changed |= CTIME_CHANGED;
-#endif
-
- if (ce->ce_uid != htonl(st->st_uid) ||
- ce->ce_gid != htonl(st->st_gid))
+ if (ce->ce_uid != (unsigned int) st->st_uid ||
+ ce->ce_gid != (unsigned int) st->st_gid)
changed |= OWNER_CHANGED;
- if (ce->ce_ino != htonl(st->st_ino))
+ if (ce->ce_ino != (unsigned int) st->st_ino)
changed |= INODE_CHANGED;
#ifdef USE_STDEV
@@ -183,11 +167,11 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
* clients will have different views of what "device"
* the filesystem is on
*/
- if (ce->ce_dev != htonl(st->st_dev))
+ if (ce->ce_dev != (unsigned int) st->st_dev)
changed |= INODE_CHANGED;
#endif
- if (ce->ce_size != htonl(st->st_size))
+ if (ce->ce_size != (unsigned int) st->st_size)
changed |= DATA_CHANGED;
return changed;
@@ -205,7 +189,7 @@ int ie_match_stat(struct index_state *istate,
* If it's marked as always valid in the index, it's
* valid whatever the checked-out copy says.
*/
- if (!ignore_valid && (ce->ce_flags & htons(CE_VALID)))
+ if (!ignore_valid && (ce->ce_flags & CE_VALID))
return 0;
changed = ce_match_stat_basic(ce, st);
@@ -228,7 +212,7 @@ int ie_match_stat(struct index_state *istate,
*/
if (!changed &&
istate->timestamp &&
- istate->timestamp <= ntohl(ce->ce_mtime.sec)) {
+ istate->timestamp <= ce->ce_mtime) {
if (assume_racy_is_modified)
changed |= DATA_CHANGED;
else
@@ -257,7 +241,7 @@ int ie_modified(struct index_state *istate,
* the length field is zero. For other cases the ce_size
* should match the SHA1 recorded in the index entry.
*/
- if ((changed & DATA_CHANGED) && ce->ce_size != htonl(0))
+ if ((changed & DATA_CHANGED) && ce->ce_size != 0)
return changed;
changed_fs = ce_modified_check_fs(ce, st);
@@ -320,7 +304,7 @@ int index_name_pos(struct index_state *istate, const char *name, int namelen)
while (last > first) {
int next = (last + first) >> 1;
struct cache_entry *ce = istate->cache[next];
- int cmp = cache_name_compare(name, namelen, ce->name, ntohs(ce->ce_flags));
+ int cmp = cache_name_compare(name, namelen, ce->name, ce->ce_flags);
if (!cmp)
return next;
if (cmp < 0) {
@@ -405,7 +389,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
size = cache_entry_size(namelen);
ce = xcalloc(1, size);
memcpy(ce->name, path, namelen);
- ce->ce_flags = htons(namelen);
+ ce->ce_flags = namelen;
fill_stat_cache_info(ce, &st);
if (trust_executable_bit && has_symlinks)
@@ -583,7 +567,7 @@ static int has_file_name(struct index_state *istate,
continue;
if (p->name[len] != '/')
continue;
- if (!ce_stage(p) && !p->ce_mode)
+ if (p->ce_flags & CE_REMOVE)
continue;
retval = -1;
if (!ok_to_replace)
@@ -616,7 +600,7 @@ static int has_dir_name(struct index_state *istate,
}
len = slash - name;
- pos = index_name_pos(istate, name, ntohs(create_ce_flags(len, stage)));
+ pos = index_name_pos(istate, name, create_ce_flags(len, stage));
if (pos >= 0) {
/*
* Found one, but not so fast. This could
@@ -679,7 +663,7 @@ static int check_file_directory_conflict(struct index_state *istate,
/*
* When ce is an "I am going away" entry, we allow it to be added
*/
- if (!ce_stage(ce) && !ce->ce_mode)
+ if (ce->ce_flags & CE_REMOVE)
return 0;
/*
@@ -704,7 +688,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
cache_tree_invalidate_path(istate->cache_tree, ce->name);
- pos = index_name_pos(istate, ce->name, ntohs(ce->ce_flags));
+ pos = index_name_pos(istate, ce->name, ce->ce_flags);
/* existing match? Just replace it. */
if (pos >= 0) {
@@ -736,7 +720,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
if (!ok_to_replace)
return error("'%s' appears as both a file and as a directory",
ce->name);
- pos = index_name_pos(istate, ce->name, ntohs(ce->ce_flags));
+ pos = index_name_pos(istate, ce->name, ce->ce_flags);
pos = -pos-1;
}
return pos + 1;
@@ -810,7 +794,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
* valid again, under "assume unchanged" mode.
*/
if (ignore_valid && assume_unchanged &&
- !(ce->ce_flags & htons(CE_VALID)))
+ !(ce->ce_flags & CE_VALID))
; /* mark this one VALID again */
else
return ce;
@@ -826,7 +810,6 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
updated = xmalloc(size);
memcpy(updated, ce, size);
fill_stat_cache_info(updated, &st);
-
/*
* If ignore_valid is not set, we should leave CE_VALID bit
* alone. Otherwise, paths marked with --no-assume-unchanged
@@ -834,8 +817,8 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
* automatically, which is not really what we want.
*/
if (!ignore_valid && assume_unchanged &&
- !(ce->ce_flags & htons(CE_VALID)))
- updated->ce_flags &= ~htons(CE_VALID);
+ !(ce->ce_flags & CE_VALID))
+ updated->ce_flags &= ~CE_VALID;
return updated;
}
@@ -880,7 +863,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
/* If we are doing --really-refresh that
* means the index is not valid anymore.
*/
- ce->ce_flags &= ~htons(CE_VALID);
+ ce->ce_flags &= ~CE_VALID;
istate->cache_changed = 1;
}
if (quiet)
@@ -942,16 +925,34 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file());
}
+static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce)
+{
+ ce->ce_ctime = ntohl(ondisk->ctime.sec);
+ ce->ce_mtime = ntohl(ondisk->mtime.sec);
+ ce->ce_dev = ntohl(ondisk->dev);
+ ce->ce_ino = ntohl(ondisk->ino);
+ ce->ce_mode = ntohl(ondisk->mode);
+ ce->ce_uid = ntohl(ondisk->uid);
+ ce->ce_gid = ntohl(ondisk->gid);
+ ce->ce_size = ntohl(ondisk->size);
+ /* On-disk flags are just 16 bits */
+ ce->ce_flags = ntohs(ondisk->flags);
+ hashcpy(ce->sha1, ondisk->sha1);
+ memcpy(ce->name, ondisk->name, ce_namelen(ce)+1);
+}
+
/* remember to discard_cache() before reading a different cache! */
int read_index_from(struct index_state *istate, const char *path)
{
int fd, i;
struct stat st;
- unsigned long offset;
+ unsigned long src_offset, dst_offset;
struct cache_header *hdr;
+ void *mmap;
+ size_t mmap_size;
errno = EBUSY;
- if (istate->mmap)
+ if (istate->alloc)
return istate->cache_nr;
errno = ENOENT;
@@ -967,31 +968,47 @@ int read_index_from(struct index_state *istate, const char *path)
die("cannot stat the open index (%s)", strerror(errno));
errno = EINVAL;
- istate->mmap_size = xsize_t(st.st_size);
- if (istate->mmap_size < sizeof(struct cache_header) + 20)
+ mmap_size = xsize_t(st.st_size);
+ if (mmap_size < sizeof(struct cache_header) + 20)
die("index file smaller than expected");
- istate->mmap = xmmap(NULL, istate->mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+ mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
close(fd);
+ if (mmap == MAP_FAILED)
+ die("unable to map index file");
- hdr = istate->mmap;
- if (verify_hdr(hdr, istate->mmap_size) < 0)
+ hdr = mmap;
+ if (verify_hdr(hdr, mmap_size) < 0)
goto unmap;
istate->cache_nr = ntohl(hdr->hdr_entries);
istate->cache_alloc = alloc_nr(istate->cache_nr);
istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
- offset = sizeof(*hdr);
+ /*
+ * The disk format is actually larger than the in-memory format,
+ * due to space for nsec etc, so even though the in-memory one
+ * has room for a few more flags, we can allocate using the same
+ * index size
+ */
+ istate->alloc = xmalloc(mmap_size);
+
+ src_offset = sizeof(*hdr);
+ dst_offset = 0;
for (i = 0; i < istate->cache_nr; i++) {
+ struct ondisk_cache_entry *disk_ce;
struct cache_entry *ce;
- ce = (struct cache_entry *)((char *)(istate->mmap) + offset);
- offset = offset + ce_size(ce);
+ disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+ ce = (struct cache_entry *)((char *)istate->alloc + dst_offset);
+ convert_from_disk(disk_ce, ce);
istate->cache[i] = ce;
+
+ src_offset += ondisk_ce_size(ce);
+ dst_offset += ce_size(ce);
}
istate->timestamp = st.st_mtime;
- while (offset <= istate->mmap_size - 20 - 8) {
+ while (src_offset <= mmap_size - 20 - 8) {
/* After an array of active_nr index entries,
* there can be arbitrary number of extended
* sections, each of which is prefixed with
@@ -999,40 +1016,36 @@ int read_index_from(struct index_state *istate, const char *path)
* in 4-byte network byte order.
*/
unsigned long extsize;
- memcpy(&extsize, (char *)(istate->mmap) + offset + 4, 4);
+ memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
extsize = ntohl(extsize);
if (read_index_extension(istate,
- ((const char *) (istate->mmap)) + offset,
- (char *) (istate->mmap) + offset + 8,
+ (const char *) mmap + src_offset,
+ (char *) mmap + src_offset + 8,
extsize) < 0)
goto unmap;
- offset += 8;
- offset += extsize;
+ src_offset += 8;
+ src_offset += extsize;
}
+ munmap(mmap, mmap_size);
return istate->cache_nr;
unmap:
- munmap(istate->mmap, istate->mmap_size);
+ munmap(mmap, mmap_size);
errno = EINVAL;
die("index file corrupt");
}
int discard_index(struct index_state *istate)
{
- int ret;
-
istate->cache_nr = 0;
istate->cache_changed = 0;
istate->timestamp = 0;
cache_tree_free(&(istate->cache_tree));
- if (istate->mmap == NULL)
- return 0;
- ret = munmap(istate->mmap, istate->mmap_size);
- istate->mmap = NULL;
- istate->mmap_size = 0;
+ free(istate->alloc);
+ istate->alloc = NULL;
/* no need to throw away allocated active_cache */
- return ret;
+ return 0;
}
#define WRITE_BUFFER_SIZE 8192
@@ -1144,10 +1157,32 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
* file, and never calls us, so the cached size information
* for "frotz" stays 6 which does not match the filesystem.
*/
- ce->ce_size = htonl(0);
+ ce->ce_size = 0;
}
}
+static int ce_write_entry(SHA_CTX *c, int fd, struct cache_entry *ce)
+{
+ int size = ondisk_ce_size(ce);
+ struct ondisk_cache_entry *ondisk = xcalloc(1, size);
+
+ ondisk->ctime.sec = htonl(ce->ce_ctime);
+ ondisk->ctime.nsec = 0;
+ ondisk->mtime.sec = htonl(ce->ce_mtime);
+ ondisk->mtime.nsec = 0;
+ ondisk->dev = htonl(ce->ce_dev);
+ ondisk->ino = htonl(ce->ce_ino);
+ ondisk->mode = htonl(ce->ce_mode);
+ ondisk->uid = htonl(ce->ce_uid);
+ ondisk->gid = htonl(ce->ce_gid);
+ ondisk->size = htonl(ce->ce_size);
+ hashcpy(ondisk->sha1, ce->sha1);
+ ondisk->flags = htons(ce->ce_flags);
+ memcpy(ondisk->name, ce->name, ce_namelen(ce));
+
+ return ce_write(c, fd, ondisk, size);
+}
+
int write_index(struct index_state *istate, int newfd)
{
SHA_CTX c;
@@ -1157,7 +1192,7 @@ int write_index(struct index_state *istate, int newfd)
int entries = istate->cache_nr;
for (i = removed = 0; i < entries; i++)
- if (!cache[i]->ce_mode)
+ if (cache[i]->ce_flags & CE_REMOVE)
removed++;
hdr.hdr_signature = htonl(CACHE_SIGNATURE);
@@ -1170,12 +1205,12 @@ int write_index(struct index_state *istate, int newfd)
for (i = 0; i < entries; i++) {
struct cache_entry *ce = cache[i];
- if (!ce->ce_mode)
+ if (ce->ce_flags & CE_REMOVE)
continue;
if (istate->timestamp &&
- istate->timestamp <= ntohl(ce->ce_mtime.sec))
+ istate->timestamp <= ce->ce_mtime)
ce_smudge_racily_clean_entry(ce);
- if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
+ if (ce_write_entry(&c, newfd, ce) < 0)
return -1;
}
diff --git a/sha1_name.c b/sha1_name.c
index 13e1164..be8489e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -695,7 +695,7 @@ int get_sha1_with_mode(const char *name, unsigned char *sha1, unsigned *mode)
break;
if (ce_stage(ce) == stage) {
hashcpy(sha1, ce->sha1);
- *mode = ntohl(ce->ce_mode);
+ *mode = ce->ce_mode;
return 0;
}
pos++;
diff --git a/tree.c b/tree.c
index 8c0819f..87708ef 100644
--- a/tree.c
+++ b/tree.c
@@ -142,8 +142,8 @@ static int cmp_cache_name_compare(const void *a_, const void *b_)
ce1 = *((const struct cache_entry **)a_);
ce2 = *((const struct cache_entry **)b_);
- return cache_name_compare(ce1->name, ntohs(ce1->ce_flags),
- ce2->name, ntohs(ce2->ce_flags));
+ return cache_name_compare(ce1->name, ce1->ce_flags,
+ ce2->name, ce2->ce_flags);
}
int read_tree(struct tree *tree, int stage, const char **match)
diff --git a/unpack-trees.c b/unpack-trees.c
index aa2513e..ff46fd6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -289,7 +289,6 @@ static struct checkout state;
static void check_updates(struct cache_entry **src, int nr,
struct unpack_trees_options *o)
{
- unsigned short mask = htons(CE_UPDATE);
unsigned cnt = 0, total = 0;
struct progress *progress = NULL;
char last_symlink[PATH_MAX];
@@ -297,7 +296,7 @@ static void check_updates(struct cache_entry **src, int nr,
if (o->update && o->verbose_update) {
for (total = cnt = 0; cnt < nr; cnt++) {
struct cache_entry *ce = src[cnt];
- if (!ce->ce_mode || ce->ce_flags & mask)
+ if (ce->ce_flags & (CE_UPDATE | CE_REMOVE))
total++;
}
@@ -310,15 +309,15 @@ static void check_updates(struct cache_entry **src, int nr,
while (nr--) {
struct cache_entry *ce = *src++;
- if (!ce->ce_mode || ce->ce_flags & mask)
+ if (ce->ce_flags & (CE_UPDATE | CE_REMOVE))
display_progress(progress, ++cnt);
- if (!ce->ce_mode) {
+ if (ce->ce_flags & CE_REMOVE) {
if (o->update)
unlink_entry(ce->name, last_symlink);
continue;
}
- if (ce->ce_flags & mask) {
- ce->ce_flags &= ~mask;
+ if (ce->ce_flags & CE_UPDATE) {
+ ce->ce_flags &= ~CE_UPDATE;
if (o->update) {
checkout_entry(ce, &state, NULL);
*last_symlink = '\0';
@@ -408,7 +407,7 @@ static void verify_uptodate(struct cache_entry *ce,
* submodules that are marked to be automatically
* checked out.
*/
- if (S_ISGITLINK(ntohl(ce->ce_mode)))
+ if (S_ISGITLINK(ce->ce_mode))
return;
errno = 0;
}
@@ -450,7 +449,7 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
int cnt = 0;
unsigned char sha1[20];
- if (S_ISGITLINK(ntohl(ce->ce_mode)) &&
+ if (S_ISGITLINK(ce->ce_mode) &&
resolve_gitlink_ref(ce->name, "HEAD", sha1) == 0) {
/* If we are not going to update the submodule, then
* we don't care.
@@ -481,7 +480,7 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
*/
if (!ce_stage(ce)) {
verify_uptodate(ce, o);
- ce->ce_mode = 0;
+ ce->ce_flags |= CE_REMOVE;
}
cnt++;
}
@@ -568,7 +567,7 @@ static void verify_absent(struct cache_entry *ce, const char *action,
cnt = cache_name_pos(ce->name, strlen(ce->name));
if (0 <= cnt) {
struct cache_entry *ce = active_cache[cnt];
- if (!ce_stage(ce) && !ce->ce_mode)
+ if (ce->ce_flags & CE_REMOVE)
return;
}
@@ -580,7 +579,7 @@ static void verify_absent(struct cache_entry *ce, const char *action,
static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
struct unpack_trees_options *o)
{
- merge->ce_flags |= htons(CE_UPDATE);
+ merge->ce_flags |= CE_UPDATE;
if (old) {
/*
* See if we can re-use the old CE directly?
@@ -601,7 +600,7 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
invalidate_ce_path(merge);
}
- merge->ce_flags &= ~htons(CE_STAGEMASK);
+ merge->ce_flags &= ~CE_STAGEMASK;
add_cache_entry(merge, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
return 1;
}
@@ -613,7 +612,7 @@ static int deleted_entry(struct cache_entry *ce, struct cache_entry *old,
verify_uptodate(old, o);
else
verify_absent(ce, "removed", o);
- ce->ce_mode = 0;
+ ce->ce_flags |= CE_REMOVE;
add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
invalidate_ce_path(ce);
return 1;
@@ -634,7 +633,7 @@ static void show_stage_entry(FILE *o,
else
fprintf(o, "%s%06o %s %d\t%s\n",
label,
- ntohl(ce->ce_mode),
+ ce->ce_mode,
sha1_to_hex(ce->sha1),
ce_stage(ce),
ce->name);
@@ -920,7 +919,7 @@ int oneway_merge(struct cache_entry **src,
struct stat st;
if (lstat(old->name, &st) ||
ce_match_stat(old, &st, CE_MATCH_IGNORE_VALID))
- old->ce_flags |= htons(CE_UPDATE);
+ old->ce_flags |= CE_UPDATE;
}
return keep_entry(old, o);
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH] index: be careful when handling long names
2008-01-19 3:25 Updated in-memory index cleanup Linus Torvalds
@ 2008-01-19 7:42 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-19 7:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
We currently use lower 12-bit (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags(). This can make us store incorrect length.
Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen. Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.
This redefines the meaning of the name length stored in the
cache_entry. A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* This is rebased on your "Updated in-memory index" patch.
When we read on-disk index, I fear that we trust the name
length a bit too much. With or without this patch, you could
craft a malicious index file that records a namelen that is
longer than the real string name length for the last entry to
cause us to go beyond the mmaped area. Maybe we would want
to make sure that (1) the name lengths are sane; (2) names
have NUL at the place we expect them to be; and (3) names are
sorted in the proper cache order, while we iterate over the
ondisk structure and convert it to incore structure.
cache.h | 17 +++++++++++++++--
read-cache.c | 12 +++++++++++-
t/t0000-basic.sh | 20 ++++++++++++++++++++
3 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/cache.h b/cache.h
index 4a054c5..9eaffde 100644
--- a/cache.h
+++ b/cache.h
@@ -131,8 +131,21 @@ struct cache_entry {
#define CE_UPDATE (0x10000)
#define CE_REMOVE (0x20000)
-#define create_ce_flags(len, stage) ((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & (ce)->ce_flags)
+static inline unsigned create_ce_flags(size_t len, unsigned stage)
+{
+ if (len >= CE_NAMEMASK)
+ len = CE_NAMEMASK;
+ return (len | (stage << CE_STAGESHIFT));
+}
+
+static inline size_t ce_namelen(const struct cache_entry *ce)
+{
+ size_t len = ce->ce_flags & CE_NAMEMASK;
+ if (len < CE_NAMEMASK)
+ return len;
+ return strlen(ce->name + CE_NAMEMASK) + CE_NAMEMASK;
+}
+
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
#define ondisk_ce_size(ce) ondisk_cache_entry_size(ce_namelen(ce))
#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
diff --git a/read-cache.c b/read-cache.c
index f5f9c3d..f98f57c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -927,6 +927,8 @@ int read_index(struct index_state *istate)
static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce)
{
+ size_t len;
+
ce->ce_ctime = ntohl(ondisk->ctime.sec);
ce->ce_mtime = ntohl(ondisk->mtime.sec);
ce->ce_dev = ntohl(ondisk->dev);
@@ -938,7 +940,15 @@ static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_en
/* On-disk flags are just 16 bits */
ce->ce_flags = ntohs(ondisk->flags);
hashcpy(ce->sha1, ondisk->sha1);
- memcpy(ce->name, ondisk->name, ce_namelen(ce)+1);
+
+ len = ce->ce_flags & CE_NAMEMASK;
+ if (len == CE_NAMEMASK)
+ len = strlen(ondisk->name);
+ /*
+ * NEEDSWORK: If the original index is crafted, this copy could
+ * go unchecked.
+ */
+ memcpy(ce->name, ondisk->name, len + 1);
}
/* remember to discard_cache() before reading a different cache! */
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 4e49d59..9f84b8d 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -297,4 +297,24 @@ test_expect_success 'absolute path works as expected' '
test "$sym" = "$(test-absolute-path $dir2/syml)"
'
+test_expect_success 'very long name in the index handled sanely' '
+
+ a=a && # 1
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
+ a=${a}q &&
+
+ >path4 &&
+ git update-index --add path4 &&
+ (
+ git ls-files -s path4 |
+ sed -e "s/ .*/ /" |
+ tr -d "\012"
+ echo "$a"
+ ) | git update-index --index-info &&
+ len=$(git ls-files "a*" | wc -c) &&
+ test $len = 4098
+'
+
test_done
--
1.5.4.rc3.37.gfdcf3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* performance problem: "git commit filename"
@ 2008-01-12 22:46 Linus Torvalds
2008-01-13 1:46 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-01-12 22:46 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List; +Cc: Kristian Høgsberg
I thought we had fixed this long long ago, but if we did, it has
re-surfaced.
Using an explicit filename with "git commit" is _extremely_ slow. Lookie
here:
[torvalds@woody linux]$ time git commit fs/exec.c
no changes added to commit (use "git add" and/or "git commit -a")
real 0m1.671s
user 0m1.200s
sys 0m0.328s
that's closer to two seconds on a fast machine, with the whole tree
cached!
And for the uncached case, it's just unbearably slow: two and a half
*minutes*.
In contrast, without the filename, it's much faster:
[torvalds@woody linux]$ time git commit
no changes added to commit (use "git add" and/or "git commit -a")
real 0m0.387s
user 0m0.220s
sys 0m0.168s
with the cold-cache case now being "just" 18s (which is still long, but
we're talking eight times faster, and certainly not unbearable!)
Doing an "strace -c" on the thing shows why. In the filename case, we
have:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.69 0.000868 0 92299 37 lstat
17.40 0.000462 0 29958 3993 open
15.78 0.000419 0 5522 getdents
15.56 0.000413 0 23165 mmap
11.37 0.000302 0 23118 munmap
5.76 0.000153 0 25966 2 close
1.43 0.000038 0 2845 fstat
...
and in the non-filename case we have
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
53.67 0.000600 0 69227 31 lstat
23.35 0.000261 0 5522 getdents
11.09 0.000124 2 55 munmap
4.20 0.000047 0 285 write
3.31 0.000037 0 5537 2638 open
2.33 0.000026 0 2899 1 close
2.06 0.000023 0 2844 fstat
...
notice how the expensive case has a lot of successful open/mmap/munmap
calls: it is *literally* ignoring the valid entries in the old index
entirely, and re-hashing every single file in the tree! No wonder it is
slow!
Just counting "lstat()" calls, it's worth noticing that the non-filename
case seems to do three lstat's for each index entry (and yes, that's two
too many), but the named file case has upped that to *four* lstats per
entry, and then added the one open/mmap/munmap/close on top of that!
I'm pretty sure we didn't use to do things this badly. And if this is a
regression like I think it is, it should be fixed before a real 1.5.4
release.
I'll try to see if I can see what's up, but I thought I'd better let
others know too, in case I don't have time. I *suspect* (but have nothing
what-so-ever to back that up) that this happened as part of making commit
a builtin.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
@ 2008-01-13 1:46 ` Linus Torvalds
2008-01-13 4:04 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-01-13 1:46 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List; +Cc: Kristian H?gsberg
On Sat, 12 Jan 2008, Linus Torvalds wrote:
>
> I thought we had fixed this long long ago, but if we did, it has
> re-surfaced.
It's new, and yes, it seems to be due to the new builtin-commit.c.
I think I know what is going on.
In the old git-commit.sh, this case used to be handled with
TMP_INDEX="$GIT_DIR/tmp-index$$"
GIT_INDEX_FILE="$THIS_INDEX" \
git read-tree --index-output="$TMP_INDEX" -i -m HEAD
which is a one-way merge of the *old* index and HEAD, taking the index
information from the old index, but the actual file information from HEAD
(to then later be updated by the named files).
This logic is implemented by builtin-read-tree.c with
struct unpack_trees_options opts;
..
opts.fn = oneway_merge;
..
unpack_trees(nr_trees, t, &opts);
where all the magic is done by that "oneway_merge()" function being called
for each entry by unpack_trees(). This does everything right, and the
result is that any index entry that was up-to-date in the old index and
unchanged in the base tree will be up-to-date in the new index too
HOWEVER. When that logic was converted from that shell-script into a
builtin-commit.c, that conversion was not done correctly. The old "git
read-tree -i -m" was not translated as a "unpack_trees()" call, but as
this in prepare_index():
discard_cache()
..
tree = parse_tree_indirect(head_sha1);
..
read_tree(tree, 0, NULL)
which is very wrong, because it replaces the old index entirely, and
doesn't do that stat information merging.
As a result, the index that is created by read-tree is totally bogus in
the stat cache, and yes, everything will have to be re-computed.
Kristian?
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-13 1:46 ` Linus Torvalds
@ 2008-01-13 4:04 ` Linus Torvalds
2008-01-13 8:12 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-01-13 4:04 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List; +Cc: Kristian H?gsberg
On Sat, 12 Jan 2008, Linus Torvalds wrote:
>
> HOWEVER. When that logic was converted from that shell-script into a
> builtin-commit.c, that conversion was not done correctly. The old "git
> read-tree -i -m" was not translated as a "unpack_trees()" call, but as
> this in prepare_index():
>
> discard_cache()
> ..
> tree = parse_tree_indirect(head_sha1);
> ..
> read_tree(tree, 0, NULL)
>
> which is very wrong, because it replaces the old index entirely, and
> doesn't do that stat information merging.
This patch may or may not fix it.
It makes builtin-commit.c use the same logic that "git read-tree -i -m"
does (which is what the old shell script did), and it seems to pass the
test-suite, and it looks pretty obvious.
It also brings down the number of open/mmap/munmap/close calls to where it
should be, although it still does *way* too many "lstat()" operations (ie
it does 4*lstat for each file in the index - one more than the
non-filename one does).
With that fixed, performance is also roughly where it should be (ie the
17-18s for the cold-cache case), because it no longer needs to rehash all
the files!
HOWEVER. This was just a quick hack, and while it all looks sane, this is
some damn core code. Somebody else should double- and triple-check this.
[ That 4x lstat thing bothers me. I think we should add a flag to the
index saying "we checked this once already, it's clean", so that if we
do multiple passes over the index, we can still do just a single lstat()
on just the first pass. But that's a separate issue.
On Linux, a cached lstat() is almost free. Well, at least compared to
all the crap operating systems out there. And obviously, if you do
multiple lstat's per file, all but the first one *will* be cached.
However, "almost free" still isn't zero, and with the kernel having 23k
files in it, doing almost a hundred thousand lstat's is still something
that only takes about half a second or so for me. We _really_ should do
only ~23k or so of them, and the cached cache should take on the order
of 0.15s, rather than half a second!
So this is worth optimizing. With bigger repositories, it's going to be
more noticeable, and with other operating systems, all those lstat()'s
will cost much _much_ more. Of course, any IO overhead will be much
bigger, so this is mostly a cached-case issue, but cached-case is still
important.. ]
Anyway, consider this being conditionally signed-off-by: me, assuming
a few other people spend a bit of time double-checking all my logic.
Please?
Linus
---
builtin-commit.c | 37 ++++++++++++++++++++++++++++---------
1 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/builtin-commit.c b/builtin-commit.c
index 73f1e35..cc5134e 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -21,6 +21,7 @@
#include "utf8.h"
#include "parse-options.h"
#include "path-list.h"
+#include "unpack-trees.h"
static const char * const builtin_commit_usage[] = {
"git-commit [options] [--] <filepattern>...",
@@ -177,10 +178,34 @@ static void add_remove_files(struct path_list *list)
}
}
+static void create_base_index(void)
+{
+ struct tree *tree;
+ struct unpack_trees_options opts;
+ struct tree_desc t;
+
+ if (initial_commit) {
+ discard_cache();
+ return;
+ }
+
+ memset(&opts, 0, sizeof(opts));
+ opts.head_idx = 1;
+ opts.index_only = 1;
+ opts.merge = 1;
+
+ opts.fn = oneway_merge;
+ tree = parse_tree_indirect(head_sha1);
+ if (!tree)
+ die("failed to unpack HEAD tree object");
+ parse_tree(tree);
+ init_tree_desc(&t, tree->buffer, tree->size);
+ unpack_trees(1, &t, &opts);
+}
+
static char *prepare_index(int argc, const char **argv, const char *prefix)
{
int fd;
- struct tree *tree;
struct path_list partial;
const char **pathspec = NULL;
@@ -278,14 +303,8 @@ static char *prepare_index(int argc, const char **argv, const char *prefix)
fd = hold_lock_file_for_update(&false_lock,
git_path("next-index-%d", getpid()), 1);
- discard_cache();
- if (!initial_commit) {
- tree = parse_tree_indirect(head_sha1);
- if (!tree)
- die("failed to unpack HEAD tree object");
- if (read_tree(tree, 0, NULL))
- die("failed to read HEAD tree object");
- }
+
+ create_base_index();
add_remove_files(&partial);
refresh_cache(REFRESH_QUIET);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-13 4:04 ` Linus Torvalds
@ 2008-01-13 8:12 ` Junio C Hamano
2008-01-13 10:33 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 8:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Kristian H?gsberg
Linus Torvalds <torvalds@linux-foundation.org> writes:
> HOWEVER. This was just a quick hack, and while it all looks sane, this is
> some damn core code. Somebody else should double- and triple-check this.
Double-checked. The patch looks sane.
> [ That 4x lstat thing bothers me. I think we should add a flag to the
> index saying "we checked this once already, it's clean", so that if we
> do multiple passes over the index, we can still do just a single lstat()
> on just the first pass. But that's a separate issue.
I've thought about this in a different context before, but it
seemed quite tricky, as some codepaths in more complex commands
(commit being one of them) tend to use the cache and discard to
use it for different purpose (like creating temporary index and
then reading the real index). Besides, I had an impression that
we ran out of the bits of ce_flags in the cache entry, although
we could shorten the maximum path we support from 4k down to 2k
bytes. I'll have to think about this a bit more.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-13 8:12 ` Junio C Hamano
@ 2008-01-13 10:33 ` Junio C Hamano
2008-01-13 17:24 ` Linus Torvalds
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 10:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Kristian H?gsberg
Junio C Hamano <gitster@pobox.com> writes:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> HOWEVER. This was just a quick hack, and while it all looks sane, this is
>> some damn core code. Somebody else should double- and triple-check this.
>
> Double-checked. The patch looks sane.
>
>> [ That 4x lstat thing bothers me. I think we should add a flag to the
>> index saying "we checked this once already, it's clean", so that if we
>> do multiple passes over the index, we can still do just a single lstat()
>> on just the first pass. But that's a separate issue.
>
> I've thought about this in a different context before, but it
> seemed quite tricky, as some codepaths in more complex commands
> (commit being one of them) tend to use the cache and discard to
> use it for different purpose (like creating temporary index and
> then reading the real index). Besides, I had an impression that
> we ran out of the bits of ce_flags in the cache entry, although
> we could shorten the maximum path we support from 4k down to 2k
> bytes. I'll have to think about this a bit more.
Aside from the lstat(2) done for work tree files, there are
quite many lstat(2) calls in refname dwimming codepath. I am
not currently looking into reducing them.
Some observations.
* In the partial commit codepath, we run one file_exists() and
then one add_file_to_index(), each of which has lstat(2) on
the same pathname, for the paths to be committed. This can
be reduced to one trivially by changing the calling
convention of add_file_to_index(). This is done in order to
build the temporary index file to be written out as a tree
for committing (and the index file needs to be written to be
shown to the pre-commit hook).
* Then refresh_cache_ent(), which has one lstat(2), is called
for everybody in the temporary index, in order to refresh the
temporary index. Again this cannot be avoided because we
promise to show the pre-commit hook a refreshed index.
* Then we run file_exists() and add_file_to_index() to update
the real index.
* And refresh_cache_ent() is run for everybody in the real
index.
* The final diffstat phase runs lstat(2) from
reuse_worktree_file() codepath to see if the work tree is up
to date, and read from the work tree files instead of having
to explode them from the object store.
The attached is a quick and dirty hack which may or may not
help. It all looks sane, this also is some core code, and meant
only for discussion and not application.
* It adds a new ce_flag, CE_UPTODATE, that is meant to mark the
cache entries that record a regular file blob that is up to
date in the work tree.
* I had to reduce the maximum length of allowed pathname from
4k down to 2k for the above. Incidentally I noticed that we
do not check the length of pathname does not exceed what we
can express with CE_NAMEMASK, which we may want to fix (and
make it barf if somebody tries to add too long a path)
independently from this issue. A low hanging fruit for
janitors -- hint, hint.
* fill_stat_cache_info() marks the cache entry it just added
with CE_UPTODATE. This has the effect of marking the paths
we write out of the index and lstat(2) immediately as "no
need to lstat -- we know it is up-to-date", from quite a lot
fo callers:
- git-apply --index
- git-update-index
- git-checkout-index
- git-add (uses add_file_to_index())
- git-commit (ditto)
- git-mv (ditto)
* write_index is changed not to write CE_UPTODATE out to the
index file, because CE_UPTODATE is meant to be transient only
in core. For the same reason, CE_UPDATE is not written to
prevent an accident from happening.
The fact that we write out a temporary index and then rebuild
the real index means CE_UPTODATE flag we populate in the
temporary index is lost and we still need to lstat(2) while
building the real index, which is a bit unfortunate. I suspect
that we can use the one-way merge to reset the index when
building the real index after we are done building the temporary
index, instead of discarding the in-core temporary index and
re-reading the real index.
---
cache.h | 3 ++-
diff.c | 4 ++++
read-cache.c | 10 ++++++++++
3 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/cache.h b/cache.h
index 39331c2..9b950fc 100644
--- a/cache.h
+++ b/cache.h
@@ -108,7 +108,8 @@ struct cache_entry {
char name[FLEX_ARRAY]; /* more */
};
-#define CE_NAMEMASK (0x0fff)
+#define CE_NAMEMASK (0x07ff)
+#define CE_UPTODATE (0x0800)
#define CE_STAGEMASK (0x3000)
#define CE_UPDATE (0x4000)
#define CE_VALID (0x8000)
diff --git a/diff.c b/diff.c
index b18c140..41847ae 100644
--- a/diff.c
+++ b/diff.c
@@ -1510,6 +1510,10 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int
if (pos < 0)
return 0;
ce = active_cache[pos];
+
+ if (ce->ce_flags & htons(CE_UPTODATE))
+ return 1;
+
if ((lstat(name, &st) < 0) ||
!S_ISREG(st.st_mode) || /* careful! */
ce_match_stat(ce, &st, 0) ||
diff --git a/read-cache.c b/read-cache.c
index 7db5588..3a90db1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -44,6 +44,9 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
if (assume_unchanged)
ce->ce_flags |= htons(CE_VALID);
+
+ if (S_ISREG(st->st_mode))
+ ce->ce_flags |= htons(CE_UPTODATE);
}
static int ce_compare_data(struct cache_entry *ce, struct stat *st)
@@ -794,6 +797,9 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
int changed, size;
int ignore_valid = options & CE_MATCH_IGNORE_VALID;
+ if (ce->ce_flags & htons(CE_UPTODATE))
+ return ce;
+
if (lstat(ce->name, &st) < 0) {
if (err)
*err = errno;
@@ -1170,13 +1176,17 @@ int write_index(struct index_state *istate, int newfd)
for (i = 0; i < entries; i++) {
struct cache_entry *ce = cache[i];
+ unsigned short ce_flags;
if (!ce->ce_mode)
continue;
if (istate->timestamp &&
istate->timestamp <= ntohl(ce->ce_mtime.sec))
ce_smudge_racily_clean_entry(ce);
+ ce_flags = ce->ce_flags;
+ ce->ce_flags &= htons(~(CE_UPDATE | CE_UPTODATE));
if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
return -1;
+ ce->ce_flags = ce_flags;
}
/* Write extension data here */
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-13 10:33 ` Junio C Hamano
@ 2008-01-13 17:24 ` Linus Torvalds
2008-01-13 19:39 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2008-01-13 17:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Kristian H?gsberg
On Sun, 13 Jan 2008, Junio C Hamano wrote:
>
> The attached is a quick and dirty hack which may or may not
> help. It all looks sane, this also is some core code, and meant
> only for discussion and not application.
I don't think this will help.
You never set CE_UPTODATE, except in the "fill_stat_cache_info()"
function, but that one will never be called for an old file that already
matched the stat.
So at a minimum, you should also make ie_match_stat() set CE_UPTODATE if
it matches. Or something.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: performance problem: "git commit filename"
2008-01-13 17:24 ` Linus Torvalds
@ 2008-01-13 19:39 ` Junio C Hamano
2008-01-13 22:36 ` [PATCH] index: be careful when handling long names Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 19:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Kristian H?gsberg
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sun, 13 Jan 2008, Junio C Hamano wrote:
>>
>> The attached is a quick and dirty hack which may or may not
>> help. It all looks sane, this also is some core code, and meant
>> only for discussion and not application.
>
> I don't think this will help.
>
> You never set CE_UPTODATE, except in the "fill_stat_cache_info()"
> function, but that one will never be called for an old file that already
> matched the stat.
>
> So at a minimum, you should also make ie_match_stat() set CE_UPTODATE if
> it matches. Or something.
Unfortunately ie_match_stat() is too late. The caller is
supposed to have already called lstat(2) and give the result to
that function.
When refresh_cache_ent() finds the entry actually matched, we
could mark the path with CE_UPTODATE. That would be a
relatively contained and safe optimization that might help
git-commit.
About the CE_NAMEMASK limitation (and currently we do not check
it, so I think we would be screwed when a pathname that is
longer than (CE_NAMEMASK+1) and still fits under PATH_MAX is
given), I think we do not have to limit the maximum pathname
length. Instead we can teach create_ce_flags() and ce_namelen()
that a name longer than 2k (or 4k) has the NAMEMASK bits that
are all 1 and ce->name[] must be counted if so (with an obvious
optimization to start counting at byte position 2k or 4k in
ce_namelen()).
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] index: be careful when handling long names
2008-01-13 19:39 ` Junio C Hamano
@ 2008-01-13 22:36 ` Junio C Hamano
2008-01-13 22:53 ` Alex Riesen
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 22:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
We currently use lower 12-bit (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags(). This can make us store incorrect length.
Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen. Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.
This redefines the meaning of the name length stored in the
cache_entry. A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Junio C Hamano <gitster@pobox.com> writes:
> About the CE_NAMEMASK limitation (and currently we do not check
> it, so I think we would be screwed when a pathname that is
> longer than (CE_NAMEMASK+1) and still fits under PATH_MAX is
> given), I think we do not have to limit the maximum pathname
> length. Instead we can teach create_ce_flags() and ce_namelen()
> that a name longer than 2k (or 4k) has the NAMEMASK bits that
> are all 1 and ce->name[] must be counted if so (with an obvious
> optimization to start counting at byte position 2k or 4k in
> ce_namelen()).
This should fix it. Passes tests including the new test that
the existing code fails.
cache.h | 17 +++++++++++++++--
t/t0000-basic.sh | 18 ++++++++++++++++++
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/cache.h b/cache.h
index 39331c2..ad53acc 100644
--- a/cache.h
+++ b/cache.h
@@ -114,8 +114,21 @@ struct cache_entry {
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12
-#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+static inline unsigned create_ce_flags(size_t len, unsigned stage)
+{
+ if (len >= CE_NAMEMASK)
+ len = CE_NAMEMASK;
+ return htons(len | (stage << CE_STAGESHIFT));
+}
+
+static inline size_t ce_namelen(const struct cache_entry *ce)
+{
+ size_t len = ntohs((ce)->ce_flags) & CE_NAMEMASK;
+ if (len < CE_NAMEMASK) /* likely */
+ return len;
+ return strlen(ce->name + CE_NAMEMASK) + CE_NAMEMASK;
+}
+
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 4e49d59..40551a3 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -297,4 +297,22 @@ test_expect_success 'absolute path works as expected' '
test "$sym" = "$(test-absolute-path $dir2/syml)"
'
+test_expect_success 'very long name in the index handled sanely' '
+
+ a=a && # 1
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
+ a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
+ a=${a}q
+
+ >path4 &&
+ git update-index --add path4 &&
+ (
+ git ls-files -s path4 |
+ sed -e "s/ .*/ /" |
+ tr -d "\012"
+ echo "$a"
+ ) | git update-index --index-info
+'
+
test_done
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] index: be careful when handling long names
2008-01-13 22:36 ` [PATCH] index: be careful when handling long names Junio C Hamano
@ 2008-01-13 22:53 ` Alex Riesen
2008-01-13 23:08 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2008-01-13 22:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List
Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
> +test_expect_success 'very long name in the index handled sanely' '
> +
> + a=a && # 1
> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
I'd expect it to fail on some systems (everywindowsthing up to w2k,
maybe some commercial unices).
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] index: be careful when handling long names
2008-01-13 22:53 ` Alex Riesen
@ 2008-01-13 23:08 ` Junio C Hamano
2008-01-13 23:33 ` Alex Riesen
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 23:08 UTC (permalink / raw)
To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List
Alex Riesen <raa.lkml@gmail.com> writes:
> Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
>> +test_expect_success 'very long name in the index handled sanely' '
>> +
>> + a=a && # 1
>> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
>> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
>> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
>
> I'd expect it to fail on some systems (everywindowsthing up to w2k,
> maybe some commercial unices).
My understanding is that Everywindowsthing do not come with any
(POSIX compliant) shell that we support by default, so if you
are talking about a limit of shell variable value, I do not
think it is an issue to begin with. It is just the matter of
picking a sensible shell (I understand both Cygwin and msys
ports use a shell that supports more than 4k bytes in value
given to a variable).
I would agree that it might overflow the argument limit when
this is given to "echo", though. We cannot do much about it,
but you may have cleverer ideas.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] index: be careful when handling long names
2008-01-13 23:08 ` Junio C Hamano
@ 2008-01-13 23:33 ` Alex Riesen
2008-01-14 21:03 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2008-01-13 23:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List
Junio C Hamano, Mon, Jan 14, 2008 00:08:07 +0100:
> Alex Riesen <raa.lkml@gmail.com> writes:
>
> > Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
> >> +test_expect_success 'very long name in the index handled sanely' '
> >> +
> >> + a=a && # 1
> >> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
> >> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
> >> + a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
> >
> > I'd expect it to fail on some systems (everywindowsthing up to w2k,
> > maybe some commercial unices).
>
> My understanding is that Everywindowsthing do not come with any
> (POSIX compliant) shell that we support by default, so if you
> are talking about a limit of shell variable value, I do not
> think it is an issue to begin with. It is just the matter of
Oh, right. The file system wont even see it, it is passed directly to
update-index.
> picking a sensible shell (I understand both Cygwin and msys
> ports use a shell that supports more than 4k bytes in value
> given to a variable).
can't check right now, but I believe it is so
> I would agree that it might overflow the argument limit when
> this is given to "echo", though. We cannot do much about it,
> but you may have cleverer ideas.
I thought about conditionally disabling the test, like it was done
when the tabs in filenames had to be tested. Wont be needed for this
particular case.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] index: be careful when handling long names
2008-01-13 23:33 ` Alex Riesen
@ 2008-01-14 21:03 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-14 21:03 UTC (permalink / raw)
To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List
Alex Riesen <raa.lkml@gmail.com> writes:
> Junio C Hamano, Mon, Jan 14, 2008 00:08:07 +0100:
> ...
>> I would agree that it might overflow the argument limit when
>> this is given to "echo", though. We cannot do much about it,
>> but you may have cleverer ideas.
>
> I thought about conditionally disabling the test, like it was done
> when the tabs in filenames had to be tested.
Yes, we can and should do that when somebody reports this (or
any other tests) a real issue, as we have done for other tests.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-07-20 0:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-20 0:51 [PATCH] index: be careful when handling long names Junio C Hamano
-- strict thread matches above, loose matches on Subject: below --
2008-01-19 3:25 Updated in-memory index cleanup Linus Torvalds
2008-01-19 7:42 ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
2008-01-13 1:46 ` Linus Torvalds
2008-01-13 4:04 ` Linus Torvalds
2008-01-13 8:12 ` Junio C Hamano
2008-01-13 10:33 ` Junio C Hamano
2008-01-13 17:24 ` Linus Torvalds
2008-01-13 19:39 ` Junio C Hamano
2008-01-13 22:36 ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53 ` Alex Riesen
2008-01-13 23:08 ` Junio C Hamano
2008-01-13 23:33 ` Alex Riesen
2008-01-14 21:03 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).