* [PATCH 0/3] stop storing trailing slash in dir-hash @ 2013-09-13 11:15 Eric Sunshine 2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw) To: git Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt, Jonathan Nieder This series alters name-hash so that it no longer stores the (redundant) trailing slash with index_state.dir_hash entries. As an intentional side-effect, the series fixes [1] in a cleaner way (suggested by Junio [2]) than either [3] (680be044 in master) or [4]. As noted by Peff [5], this change is at a fairly fundamental level, so care has been taken to ensure that all tests still pass (thus it at least does not break anything covered by the tests). [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 [2]: http://thread.gmane.org/gmane.comp.version-control.git/232727/focus=232813 [3]: http://thread.gmane.org/gmane.comp.version-control.git/232796 [4]: http://thread.gmane.org/gmane.comp.version-control.git/232833 [5]: http://thread.gmane.org/gmane.comp.version-control.git/232727/focus=232822 Eric Sunshine (3): name-hash: refactor polymorphic index_name_exists() name-hash: stop storing trailing '/' on paths in index_state.dir_hash dir: revert work-around for retired dangerous behavior cache.h | 2 ++ dir.c | 25 +++++++------------------ name-hash.c | 54 ++++++++++++++++++++++++++++-------------------------- read-cache.c | 2 +- 4 files changed, 38 insertions(+), 45 deletions(-) -- 1.8.4.457.g424cb08 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() 2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine @ 2013-09-13 11:15 ` Eric Sunshine 2013-09-13 18:40 ` Junio C Hamano 2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine 2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine 2 siblings, 1 reply; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw) To: git Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt, Jonathan Nieder Depending upon the absence or presence of a trailing '/' on the incoming pathname, index_name_exists() checks either if a file is present in the index or if a directory is represented within the index. Each caller explicitly chooses the mode of operation by adding or removing a trailing '/' before invoking index_name_exists(). Since these two modes of operations are disjoint and have no code in common (one searches index_state.name_hash; the other dir_hash), they can be represented more naturally as distinct functions: one to search for a file, and one for a directory. Splitting index searching into two functions relieves callers of the artificial burden of having to add or remove a slash to select the mode of operation; instead they just call the desired function. A subsequent patch will take advantage of this benefit in order to eliminate the requirement that the incoming pathname for a directory search must have a trailing slash. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- cache.h | 2 ++ dir.c | 7 ++++--- name-hash.c | 47 ++++++++++++++++++++++++----------------------- read-cache.c | 2 +- 4 files changed, 31 insertions(+), 27 deletions(-) diff --git a/cache.h b/cache.h index 9ef778a..03ec24c 100644 --- a/cache.h +++ b/cache.h @@ -314,6 +314,7 @@ extern void free_name_hash(struct index_state *istate); #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL) #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options)) #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options)) +#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen)) #define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase)) #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen)) #define resolve_undo_clear() resolve_undo_clear_index(&the_index) @@ -463,6 +464,7 @@ extern int write_index(struct index_state *, int newfd); extern int discard_index(struct index_state *); extern int unmerged_index(const struct index_state *); extern int verify_path(const char *path); +extern struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen); extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase); extern int index_name_pos(const struct index_state *, const char *name, int namelen); #define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */ diff --git a/dir.c b/dir.c index b439ff0..0080673 100644 --- a/dir.c +++ b/dir.c @@ -860,7 +860,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len) static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len) { - if (cache_name_exists(pathname, len, ignore_case)) + if ((len == 0 || pathname[len - 1] != '/') && + cache_name_exists(pathname, len, ignore_case)) return NULL; ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc); @@ -885,11 +886,11 @@ enum exist_status { /* * Do not use the alphabetically sorted index to look up * the directory name; instead, use the case insensitive - * name hash. + * directory hash. */ static enum exist_status directory_exists_in_index_icase(const char *dirname, int len) { - const struct cache_entry *ce = cache_name_exists(dirname, len + 1, ignore_case); + const struct cache_entry *ce = cache_dir_exists(dirname, len + 1); unsigned char endchar; if (!ce) diff --git a/name-hash.c b/name-hash.c index 617c86c..5b01554 100644 --- a/name-hash.c +++ b/name-hash.c @@ -222,11 +222,35 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen return slow_same_name(name, namelen, ce->name, len); } +struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen) +{ + struct cache_entry *ce; + struct dir_entry *dir; + + lazy_init_name_hash(istate); + dir = find_dir_entry(istate, name, namelen); + if (dir && dir->nr) + return dir->ce; + + /* + * It might be a submodule. Unlike plain directories, which are stored + * in the dir-hash, submodules are stored in the name-hash, so check + * there, as well. + */ + ce = index_name_exists(istate, name, namelen - 1, 1); + if (ce && S_ISGITLINK(ce->ce_mode)) + return ce; + + return NULL; +} + struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase) { unsigned int hash = hash_name(name, namelen); struct cache_entry *ce; + assert(namelen == 0 || name[namelen - 1] != '/'); + lazy_init_name_hash(istate); ce = lookup_hash(hash, &istate->name_hash); @@ -237,29 +261,6 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na } ce = ce->next; } - - /* - * When looking for a directory (trailing '/'), it might be a - * submodule or a directory. Despite submodules being directories, - * they are stored in the name hash without a closing slash. - * When ignore_case is 1, directories are stored in a separate hash - * table *with* their closing slash. - * - * The side effect of this storage technique is we have need to - * lookup the directory in a separate hash table, and if not found - * remove the slash from name and perform the lookup again without - * the slash. If a match is made, S_ISGITLINK(ce->mode) will be - * true. - */ - if (icase && name[namelen - 1] == '/') { - struct dir_entry *dir = find_dir_entry(istate, name, namelen); - if (dir && dir->nr) - return dir->ce; - - ce = index_name_exists(istate, name, namelen - 1, icase); - if (ce && S_ISGITLINK(ce->ce_mode)) - return ce; - } return NULL; } diff --git a/read-cache.c b/read-cache.c index 885943a..a59644a 100644 --- a/read-cache.c +++ b/read-cache.c @@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, if (*ptr == '/') { struct cache_entry *foundce; ++ptr; - foundce = index_name_exists(istate, ce->name, ptr - ce->name, ignore_case); + foundce = index_dir_exists(istate, ce->name, ptr - ce->name); if (foundce) { memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr); startPtr = ptr; -- 1.8.4.457.g424cb08 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() 2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine @ 2013-09-13 18:40 ` Junio C Hamano 2013-09-13 19:29 ` Eric Sunshine 0 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2013-09-13 18:40 UTC (permalink / raw) To: Eric Sunshine; +Cc: git, Jeff King, Brian Gernhardt, Jonathan Nieder Eric Sunshine <sunshine@sunshineco.com> writes: > Depending upon the absence or presence of a trailing '/' on the incoming > pathname, index_name_exists() checks either if a file is present in the > index or if a directory is represented within the index. Each caller > explicitly chooses the mode of operation by adding or removing a > trailing '/' before invoking index_name_exists(). > > Since these two modes of operations are disjoint and have no code in > common (one searches index_state.name_hash; the other dir_hash), they > can be represented more naturally as distinct functions: one to search > for a file, and one for a directory. > > Splitting index searching into two functions relieves callers of the > artificial burden of having to add or remove a slash to select the mode > of operation; instead they just call the desired function. A subsequent > patch will take advantage of this benefit in order to eliminate the > requirement that the incoming pathname for a directory search must have > a trailing slash. Good thinking. Thanks for working on this; I agree with the general direction this series is going. I however wonder if it would be a good idea to rename the other one to {cache|index}_file_exists(), and even keep the *_name_exists() variant that switches between the two based on the trailing slash, to avoid breaking other topics in flight that may have added new callsites to *_name_exists(). Otherwise the new callsites will feed a path with a trailing slash to *_name_exists() and get a false result, no? > Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> > --- > cache.h | 2 ++ > dir.c | 7 ++++--- > name-hash.c | 47 ++++++++++++++++++++++++----------------------- > read-cache.c | 2 +- > 4 files changed, 31 insertions(+), 27 deletions(-) > > diff --git a/cache.h b/cache.h > index 9ef778a..03ec24c 100644 > --- a/cache.h > +++ b/cache.h > @@ -314,6 +314,7 @@ extern void free_name_hash(struct index_state *istate); > #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL) > #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options)) > #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options)) > +#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen)) > #define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase)) > #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen)) > #define resolve_undo_clear() resolve_undo_clear_index(&the_index) > @@ -463,6 +464,7 @@ extern int write_index(struct index_state *, int newfd); > extern int discard_index(struct index_state *); > extern int unmerged_index(const struct index_state *); > extern int verify_path(const char *path); > +extern struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen); > extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase); > extern int index_name_pos(const struct index_state *, const char *name, int namelen); > #define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */ > diff --git a/dir.c b/dir.c > index b439ff0..0080673 100644 > --- a/dir.c > +++ b/dir.c > @@ -860,7 +860,8 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len) > > static struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len) > { > - if (cache_name_exists(pathname, len, ignore_case)) > + if ((len == 0 || pathname[len - 1] != '/') && > + cache_name_exists(pathname, len, ignore_case)) > return NULL; > > ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc); > @@ -885,11 +886,11 @@ enum exist_status { > /* > * Do not use the alphabetically sorted index to look up > * the directory name; instead, use the case insensitive > - * name hash. > + * directory hash. > */ > static enum exist_status directory_exists_in_index_icase(const char *dirname, int len) > { > - const struct cache_entry *ce = cache_name_exists(dirname, len + 1, ignore_case); > + const struct cache_entry *ce = cache_dir_exists(dirname, len + 1); > unsigned char endchar; > > if (!ce) > diff --git a/name-hash.c b/name-hash.c > index 617c86c..5b01554 100644 > --- a/name-hash.c > +++ b/name-hash.c > @@ -222,11 +222,35 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen > return slow_same_name(name, namelen, ce->name, len); > } > > +struct cache_entry *index_dir_exists(struct index_state *istate, const char *name, int namelen) > +{ > + struct cache_entry *ce; > + struct dir_entry *dir; > + > + lazy_init_name_hash(istate); > + dir = find_dir_entry(istate, name, namelen); > + if (dir && dir->nr) > + return dir->ce; > + > + /* > + * It might be a submodule. Unlike plain directories, which are stored > + * in the dir-hash, submodules are stored in the name-hash, so check > + * there, as well. > + */ > + ce = index_name_exists(istate, name, namelen - 1, 1); > + if (ce && S_ISGITLINK(ce->ce_mode)) > + return ce; > + > + return NULL; > +} > + > struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase) > { > unsigned int hash = hash_name(name, namelen); > struct cache_entry *ce; > > + assert(namelen == 0 || name[namelen - 1] != '/'); > + > lazy_init_name_hash(istate); > ce = lookup_hash(hash, &istate->name_hash); > > @@ -237,29 +261,6 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na > } > ce = ce->next; > } > - > - /* > - * When looking for a directory (trailing '/'), it might be a > - * submodule or a directory. Despite submodules being directories, > - * they are stored in the name hash without a closing slash. > - * When ignore_case is 1, directories are stored in a separate hash > - * table *with* their closing slash. > - * > - * The side effect of this storage technique is we have need to > - * lookup the directory in a separate hash table, and if not found > - * remove the slash from name and perform the lookup again without > - * the slash. If a match is made, S_ISGITLINK(ce->mode) will be > - * true. > - */ > - if (icase && name[namelen - 1] == '/') { > - struct dir_entry *dir = find_dir_entry(istate, name, namelen); > - if (dir && dir->nr) > - return dir->ce; > - > - ce = index_name_exists(istate, name, namelen - 1, icase); > - if (ce && S_ISGITLINK(ce->ce_mode)) > - return ce; > - } > return NULL; > } > > diff --git a/read-cache.c b/read-cache.c > index 885943a..a59644a 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, > if (*ptr == '/') { > struct cache_entry *foundce; > ++ptr; > - foundce = index_name_exists(istate, ce->name, ptr - ce->name, ignore_case); > + foundce = index_dir_exists(istate, ce->name, ptr - ce->name); > if (foundce) { > memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr); > startPtr = ptr; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() 2013-09-13 18:40 ` Junio C Hamano @ 2013-09-13 19:29 ` Eric Sunshine [not found] ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com> 0 siblings, 1 reply; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 19:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder On Fri, Sep 13, 2013 at 2:40 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> Depending upon the absence or presence of a trailing '/' on the incoming >> pathname, index_name_exists() checks either if a file is present in the >> index or if a directory is represented within the index. Each caller >> explicitly chooses the mode of operation by adding or removing a >> trailing '/' before invoking index_name_exists(). >> >> Since these two modes of operations are disjoint and have no code in >> common (one searches index_state.name_hash; the other dir_hash), they >> can be represented more naturally as distinct functions: one to search >> for a file, and one for a directory. >> >> Splitting index searching into two functions relieves callers of the >> artificial burden of having to add or remove a slash to select the mode >> of operation; instead they just call the desired function. A subsequent >> patch will take advantage of this benefit in order to eliminate the >> requirement that the incoming pathname for a directory search must have >> a trailing slash. > > Good thinking. Thanks for working on this; I agree with the general > direction this series is going. > > I however wonder if it would be a good idea to rename the other one > to {cache|index}_file_exists(), and even keep the *_name_exists() > variant that switches between the two based on the trailing slash, > to avoid breaking other topics in flight that may have added new > callsites to *_name_exists(). Otherwise the new callsites will feed > a path with a trailing slash to *_name_exists() and get a false > result, no? An assert("no trailing /") was added to index_name_exists() precisely for the purpose of catching cases of code incorrectly calling index_name_exists() to search for a directory. No existing code in 'master' trips the assertion (at least according to the tests), however, the assertion may be annoying for topics in flight which do. Other than plopping these patches atop 'pu' and seeing if anything breaks, what would be the "git way" of detecting in-flight topics which add callers of index_name_exists()? (Excuse my git ignorance.) ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com>]
* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() [not found] ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com> @ 2013-09-13 21:44 ` Eric Sunshine 2013-09-13 22:16 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 21:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder On Fri, Sep 13, 2013 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote: > On Fri, Sep 13, 2013 at 12:29 PM, Eric Sunshine <sunshine@sunshineco.com> wrote: >> On Fri, Sep 13, 2013 at 2:40 PM, Junio C Hamano <gitster@pobox.com> wrote: >>> Eric Sunshine <sunshine@sunshineco.com> writes: >>> >>>> Since these two modes of operations are disjoint and have no code in >>>> common (one searches index_state.name_hash; the other dir_hash), they >>>> can be represented more naturally as distinct functions: one to search >>>> for a file, and one for a directory. >>> >>> Good thinking. Thanks for working on this; I agree with the general >>> direction this series is going. >>> >>> I however wonder if it would be a good idea to rename the other one >>> to {cache|index}_file_exists(), and even keep the *_name_exists() >>> variant that switches between the two based on the trailing slash, >>> to avoid breaking other topics in flight that may have added new >>> callsites to *_name_exists(). Otherwise the new callsites will feed >>> a path with a trailing slash to *_name_exists() and get a false >>> result, no? >> >> An assert("no trailing /") was added to index_name_exists() precisely >> for the purpose of catching cases of code incorrectly calling >> index_name_exists() to search for a directory. No existing code in >> 'master' trips the assertion (at least according to the tests), >> however, the assertion may be annoying for topics in flight which do. >> >> Other than plopping these patches atop 'pu' and seeing if anything >> breaks, what would be the "git way" of detecting in-flight topics >> which add callers of index_name_exists()? (Excuse my git ignorance.) > > I would do a quick > > $ git diff master..pu | grep '^+.*_name_exists' > > which would be more conservative (it would also show an invocation > moved from one place to another). There appear to be no topics in flight which add new index_name_exists() callers (which is not to say that no new callers will be introduced before this topic lands in 'master'). I also plopped the patches atop 'pu' and there are no new tests failures on Linux or Mac OS X, so the series does not seem to break anything in flight. (There are a few test failures on 'pu' on Mac OS X, but they are not related to this series.) Given the above. How should I proceed? Do you still feel that it is advisable to keep an index_name_exists() around for compatibility reasons in case any new callers are introduced? Regardless of that answer, do you want index_name_exists() renamed to index_file_exists()? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() 2013-09-13 21:44 ` Eric Sunshine @ 2013-09-13 22:16 ` Junio C Hamano 2013-09-13 22:20 ` Eric Sunshine 0 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2013-09-13 22:16 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder Eric Sunshine <sunshine@sunshineco.com> writes: > Given the above. How should I proceed? Do you still feel that it is > advisable to keep an index_name_exists() around for compatibility > reasons in case any new callers are introduced? Regardless of that > answer, do you want index_name_exists() renamed to > index_file_exists()? Renaming *_name_exists() to *_file_exists() without keeping a compatibility one will force new topics to be rebased on this series. Alternatively we could merge them to 'pu' (and later 'next' and 'master') with evil merges to adjust the change in the semantics of the called function. That increases the risk of accidental breakages, I think. It is safer to keep index_name_exists() around with the older semantics, if we can, and rename your "file only" one to a different name. That way, even if a new topic still uses index_name_exists() expecting the traditional behaviour, it will not break immediately and we do not need to risk evil merges making mistakes. Later, we can "git grep _name_exists" to spot them and convert such old-style calls to either "directory only" or "file only" variants after this series and these follow-on topics hit 'master' (and we do not know at this point in what order that happens). Hmm? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() 2013-09-13 22:16 ` Junio C Hamano @ 2013-09-13 22:20 ` Eric Sunshine 0 siblings, 0 replies; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 22:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, Jeff King, Brian Gernhardt, Jonathan Nieder On Fri, Sep 13, 2013 at 6:16 PM, Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> Given the above. How should I proceed? Do you still feel that it is >> advisable to keep an index_name_exists() around for compatibility >> reasons in case any new callers are introduced? Regardless of that >> answer, do you want index_name_exists() renamed to >> index_file_exists()? > > Renaming *_name_exists() to *_file_exists() without keeping a > compatibility one will force new topics to be rebased on this > series. Alternatively we could merge them to 'pu' (and later 'next' > and 'master') with evil merges to adjust the change in the semantics > of the called function. That increases the risk of accidental > breakages, I think. > > It is safer to keep index_name_exists() around with the older > semantics, if we can, and rename your "file only" one to a different > name. That way, even if a new topic still uses index_name_exists() > expecting the traditional behaviour, it will not break immediately > and we do not need to risk evil merges making mistakes. > > Later, we can "git grep _name_exists" to spot them and convert such > old-style calls to either "directory only" or "file only" variants > after this series and these follow-on topics hit 'master' (and we do > not know at this point in what order that happens). Thanks. That's what I needed to know. I'll re-roll with the suggested changes. (And, I'm looking into the Mac-only test breakages not related to this patch series.) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash 2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine 2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine @ 2013-09-13 11:15 ` Eric Sunshine 2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine 2 siblings, 0 replies; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw) To: git Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt, Jonathan Nieder When 5102c617 (Add case insensitivity support for directories when using git status, 2010-10-03) added directories to the name-hash there was only a single hash table in which both real cache entries and leading directory prefixes were registered. To distinguish between the two types of entries, directories were stored with a trailing '/'. 2092678c (name-hash.c: fix endless loop with core.ignorecase=true, 2013-02-28), however, moved directories to a separate hash table (index_state.dir_hash) but retained the now-redundant trailing '/', thus callers still bear the burden of ensuring its presence before searching via index_dir_exists(). Eliminate this redundancy by storing paths in the dir-hash without the trailing '/'. An important benefit of this change is that it eliminates undocumented and dangerous behavior of dir.c:directory_exists_in_index_icase() in which it assumes not only that it can validly access one character beyond the end of its incoming directory argument, but also that that character will unconditionally be a '/'. This perilous behavior was "tolerated" because the string passed in by its lone caller always had a '/' in that position, however, things broke [1] when 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a new caller which failed to respect the undocumented assumption. [1]: http://thread.gmane.org/gmane.comp.version-control.git/232727 Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- dir.c | 2 +- name-hash.c | 9 +++++---- read-cache.c | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 0080673..69d04a0 100644 --- a/dir.c +++ b/dir.c @@ -890,7 +890,7 @@ enum exist_status { */ static enum exist_status directory_exists_in_index_icase(const char *dirname, int len) { - const struct cache_entry *ce = cache_dir_exists(dirname, len + 1); + const struct cache_entry *ce = cache_dir_exists(dirname, len); unsigned char endchar; if (!ce) diff --git a/name-hash.c b/name-hash.c index 5b01554..2bae75d 100644 --- a/name-hash.c +++ b/name-hash.c @@ -58,9 +58,9 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate, { /* * Throw each directory component in the hash for quick lookup - * during a git status. Directory components are stored with their + * during a git status. Directory components are stored without their * closing slash. Despite submodules being a directory, they never - * reach this point, because they are stored without a closing slash + * reach this point, because they are stored * in index_state.name_hash (as ordinary cache_entries). * * Note that the cache_entry stored with the dir_entry merely @@ -78,6 +78,7 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate, namelen--; if (namelen <= 0) return NULL; + namelen--; /* lookup existing entry for that directory */ dir = find_dir_entry(istate, ce->name, namelen); @@ -97,7 +98,7 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate, } /* recursively add missing parent directories */ - dir->parent = hash_dir_entry(istate, ce, namelen - 1); + dir->parent = hash_dir_entry(istate, ce, namelen); } return dir; } @@ -237,7 +238,7 @@ struct cache_entry *index_dir_exists(struct index_state *istate, const char *nam * in the dir-hash, submodules are stored in the name-hash, so check * there, as well. */ - ce = index_name_exists(istate, name, namelen - 1, 1); + ce = index_name_exists(istate, name, namelen, 1); if (ce && S_ISGITLINK(ce->ce_mode)) return ce; diff --git a/read-cache.c b/read-cache.c index a59644a..8990b61 100644 --- a/read-cache.c +++ b/read-cache.c @@ -643,7 +643,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st, if (*ptr == '/') { struct cache_entry *foundce; ++ptr; - foundce = index_dir_exists(istate, ce->name, ptr - ce->name); + foundce = index_dir_exists(istate, ce->name, ptr - ce->name - 1); if (foundce) { memcpy((void *)startPtr, foundce->name + (startPtr - ce->name), ptr - startPtr); startPtr = ptr; -- 1.8.4.457.g424cb08 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] dir: revert work-around for retired dangerous behavior 2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine 2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine 2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine @ 2013-09-13 11:15 ` Eric Sunshine 2 siblings, 0 replies; 9+ messages in thread From: Eric Sunshine @ 2013-09-13 11:15 UTC (permalink / raw) To: git Cc: Eric Sunshine, Junio C Hamano, Jeff King, Brian Gernhardt, Jonathan Nieder directory_exists_in_index_icase() dangerously assumed that it could access one character beyond the end of its directory argument, and that that character would unconditionally be '/'. 2eac2a4c (ls-files -k: a directory only can be killed if the index has a non-directory, 2013-08-15) added a caller which did not respect this undocumented assumption, and 680be044 (dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage, 2013-08-23) added a work-around which temporarily appends a '/' before invoking directory_exists_in_index_icase(). Since the dangerous behavior of directory_exists_in_index_icase() has been eliminated, the work-around is now redundant, so retire it (but not the tests added by the same commit). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> --- dir.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/dir.c b/dir.c index 69d04a0..ef95160 100644 --- a/dir.c +++ b/dir.c @@ -1161,21 +1161,9 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, */ if ((dir->flags & DIR_COLLECT_KILLED_ONLY) && (dtype == DT_DIR) && - !has_path_in_index) { - /* - * NEEDSWORK: directory_exists_in_index_icase() - * assumes that one byte past the given path is - * readable and has '/', which needs to be fixed, but - * until then, work it around in the caller. - */ - strbuf_addch(path, '/'); - if (directory_exists_in_index(path->buf, path->len - 1) == - index_nonexistent) { - strbuf_setlen(path, path->len - 1); - return path_none; - } - strbuf_setlen(path, path->len - 1); - } + !has_path_in_index && + (directory_exists_in_index(path->buf, path->len) == index_nonexistent)) + return path_none; exclude = is_excluded(dir, path->buf, &dtype); -- 1.8.4.457.g424cb08 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-09-13 22:20 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-09-13 11:15 [PATCH 0/3] stop storing trailing slash in dir-hash Eric Sunshine 2013-09-13 11:15 ` [PATCH 1/3] name-hash: refactor polymorphic index_name_exists() Eric Sunshine 2013-09-13 18:40 ` Junio C Hamano 2013-09-13 19:29 ` Eric Sunshine [not found] ` <CAPc5daVtDByrA6yakk_1fq9g5Hv3naNDzEho5G4Ghxc6jzpawg@mail.gmail.com> 2013-09-13 21:44 ` Eric Sunshine 2013-09-13 22:16 ` Junio C Hamano 2013-09-13 22:20 ` Eric Sunshine 2013-09-13 11:15 ` [PATCH 2/3] name-hash: stop storing trailing '/' on paths in index_state.dir_hash Eric Sunshine 2013-09-13 11:15 ` [PATCH 3/3] dir: revert work-around for retired dangerous behavior Eric Sunshine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).