* [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area
From: Johannes Schindelin @ 2017-02-14 11:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff Hostetler
On Windows, calls to memihash() and maintaining the istate.name_hash and
istate.dir_hash HashMaps take significant time on very large
repositories. This series of changes reduces the overall time taken for
various operations by reducing the number calls to memihash(), moving
some of them into multi-threaded code, and etc.
Note: one commenter in https://github.com/git-for-windows/git/pull/964
pointed out that memihash() only handles ASCII correctly. That is true.
And fixing this is outside the purview of this patch series.
[jes: renamed and reformatted a few places, and replaced global
constants by 1-bit fields, in the hopes to make the contribution a
smoother ride.]
Jeff Hostetler (5):
name-hash: eliminate duplicate memihash call
hashmap: allow memihash computation to be continued
name-hash: precompute hash values during preload-index
name-hash: specify initial size for istate.dir_hash table
name-hash: remember previous dir_entry during lazy_init_name_hash
cache.h | 6 +++
hashmap.c | 14 +++++++
hashmap.h | 2 +
name-hash.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++--------
preload-index.c | 2 +
read-cache.c | 3 ++
6 files changed, 127 insertions(+), 16 deletions(-)
base-commit: 5588dbffbd61e4906e453808c6ad32f792fea521
Published-As: https://github.com/dscho/git/releases/tag/memihash-perf-v1
Fetch-It-Via: git fetch https://github.com/dscho/git memihash-perf-v1
--
2.11.1.windows.1
^ permalink raw reply
* [PATCH 3/5] name-hash: precompute hash values during preload-index
From: Johannes Schindelin @ 2017-02-14 11:32 UTC (permalink / raw)
To: git; +Cc: Jeff Hostetler, Junio C Hamano
In-Reply-To: <cover.1487071883.git.johannes.schindelin@gmx.de>
From: Jeff Hostetler <jeffhost@microsoft.com>
Precompute the istate.name_hash and istate.dir_hash values
for each cache-entry during the preload-index phase.
Move the expensive memihash() calculations from lazy_init_name_hash()
to the multi-threaded preload-index phase.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
cache.h | 6 ++++++
name-hash.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
preload-index.c | 2 ++
read-cache.c | 3 +++
4 files changed, 66 insertions(+), 2 deletions(-)
diff --git a/cache.h b/cache.h
index 61fc86e6d71..56f7c97cdbe 100644
--- a/cache.h
+++ b/cache.h
@@ -173,6 +173,10 @@ struct cache_entry {
unsigned int ce_flags;
unsigned int ce_namelen;
unsigned int index; /* for link extension */
+ struct {
+ unsigned initialized:1, root_entry:1;
+ unsigned int name, dir;
+ } precomputed_hash;
struct object_id oid;
char name[FLEX_ARRAY]; /* more */
};
@@ -229,6 +233,8 @@ struct cache_entry {
#error "CE_EXTENDED_FLAGS out of range"
#endif
+void precompute_istate_hashes(struct cache_entry *ce);
+
/* Forward structure decls */
struct pathspec;
struct child_process;
diff --git a/name-hash.c b/name-hash.c
index ad0bc0cef73..49eb84846df 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -50,6 +50,10 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
*/
struct dir_entry *dir;
unsigned int hash;
+ int orig_namelen = namelen;
+
+ if (ce->precomputed_hash.initialized && ce->precomputed_hash.root_entry)
+ return NULL; /* item does not have a parent directory */
/* get length of parent directory */
while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
@@ -59,7 +63,10 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
namelen--;
/* lookup existing entry for that directory */
- hash = memihash(ce->name, namelen);
+ if (ce->precomputed_hash.initialized && orig_namelen == ce_namelen(ce))
+ hash = ce->precomputed_hash.dir;
+ else
+ hash = memihash(ce->name, namelen);
dir = find_dir_entry_1(istate, ce->name, namelen, hash);
if (!dir) {
/* not found, create it and add to hash table */
@@ -99,10 +106,18 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
{
+ unsigned int h;
+
if (ce->ce_flags & CE_HASHED)
return;
ce->ce_flags |= CE_HASHED;
- hashmap_entry_init(ce, memihash(ce->name, ce_namelen(ce)));
+
+ if (ce->precomputed_hash.initialized)
+ h = ce->precomputed_hash.name;
+ else
+ h = memihash(ce->name, ce_namelen(ce));
+
+ hashmap_entry_init(ce, h);
hashmap_add(&istate->name_hash, ce);
if (ignore_case)
@@ -244,3 +259,41 @@ void free_name_hash(struct index_state *istate)
hashmap_free(&istate->name_hash, 0);
hashmap_free(&istate->dir_hash, 1);
}
+
+/*
+ * Precompute the hash values for this cache_entry
+ * for use in the istate.name_hash and istate.dir_hash.
+ *
+ * If the item is in the root directory, just compute the hash value (for
+ * istate.name_hash) on the full path.
+ *
+ * If the item is in a subdirectory, first compute the hash value for the
+ * immediate parent directory (for istate.dir_hash) and then the hash value for
+ * the full path by continuing the computation.
+ *
+ * Note that these hashes will be used by wt_status_collect_untracked() as it
+ * scans the worktree and maps observed paths back to the index (optionally
+ * ignoring case). Technically, we only *NEED* to precompute this for
+ * non-skip-worktree items (since status should not observe skipped items), but
+ * because lazy_init_name_hash() hashes everything, we force it here.
+ */
+void precompute_istate_hashes(struct cache_entry *ce)
+{
+ int namelen = ce_namelen(ce);
+
+ while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
+ namelen--;
+
+ if (namelen <= 0) {
+ ce->precomputed_hash.name = memihash(ce->name, ce_namelen(ce));
+ ce->precomputed_hash.root_entry = 1;
+ } else {
+ namelen--;
+ ce->precomputed_hash.dir = memihash(ce->name, namelen);
+ ce->precomputed_hash.name = memihash_continue(
+ ce->precomputed_hash.dir, ce->name + namelen,
+ ce_namelen(ce) - namelen);
+ ce->precomputed_hash.root_entry = 0;
+ }
+ ce->precomputed_hash.initialized = 1;
+}
diff --git a/preload-index.c b/preload-index.c
index c1fe3a3ef9c..602737f9d0f 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -47,6 +47,8 @@ static void *preload_thread(void *_data)
struct cache_entry *ce = *cep++;
struct stat st;
+ precompute_istate_hashes(ce);
+
if (ce_stage(ce))
continue;
if (S_ISGITLINK(ce->ce_mode))
diff --git a/read-cache.c b/read-cache.c
index 9054369dd0c..1a25c345441 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -73,6 +73,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
copy_cache_entry(new, old);
new->ce_flags &= ~CE_HASHED;
new->ce_namelen = namelen;
+ new->precomputed_hash.initialized = 0;
new->index = 0;
memcpy(new->name, new_name, namelen + 1);
@@ -614,6 +615,7 @@ static struct cache_entry *create_alias_ce(struct index_state *istate,
new = xcalloc(1, cache_entry_size(len));
memcpy(new->name, alias->name, len);
copy_cache_entry(new, ce);
+ new->precomputed_hash.initialized = 0;
save_or_free_index_entry(istate, ce);
return new;
}
@@ -1446,6 +1448,7 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
ce->ce_stat_data.sd_size = get_be32(&ondisk->size);
ce->ce_flags = flags & ~CE_NAMEMASK;
ce->ce_namelen = len;
+ ce->precomputed_hash.initialized = 0;
ce->index = 0;
hashcpy(ce->oid.hash, ondisk->sha1);
memcpy(ce->name, name, len);
--
2.11.1.windows.1
^ permalink raw reply related
* [PATCH 5/5] name-hash: remember previous dir_entry during lazy_init_name_hash
From: Johannes Schindelin @ 2017-02-14 11:32 UTC (permalink / raw)
To: git; +Cc: Jeff Hostetler, Junio C Hamano
In-Reply-To: <cover.1487071883.git.johannes.schindelin@gmx.de>
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach hash_dir_entry() to remember the previously found dir_entry during
lazy_init_name_hash() iteration. This is a performance optimization.
Since items in the index array are sorted by full pathname, adjacent
items are likely to be in the same directory. This can save memihash()
computations and hash map lookups.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
name-hash.c | 50 ++++++++++++++++++++++++++++++++++++--------------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/name-hash.c b/name-hash.c
index 8f8336cc868..f95054f44cb 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -39,7 +39,8 @@ static struct dir_entry *find_dir_entry(struct index_state *istate,
}
static struct dir_entry *hash_dir_entry(struct index_state *istate,
- struct cache_entry *ce, int namelen)
+ struct cache_entry *ce, int namelen,
+ struct dir_entry **previous_dir)
{
/*
* Throw each directory component in the hash for quick lookup
@@ -63,11 +64,24 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
namelen--;
/* lookup existing entry for that directory */
- if (ce->precomputed_hash.initialized && orig_namelen == ce_namelen(ce))
- hash = ce->precomputed_hash.dir;
- else
- hash = memihash(ce->name, namelen);
- dir = find_dir_entry_1(istate, ce->name, namelen, hash);
+ if (previous_dir && *previous_dir
+ && namelen == (*previous_dir)->namelen
+ && memcmp(ce->name, (*previous_dir)->name, namelen) == 0) {
+ /*
+ * When our caller is sequentially iterating through the index,
+ * items in the same directory will be sequential, and therefore
+ * refer to the same dir_entry.
+ */
+ dir = *previous_dir;
+ } else {
+ if (ce->precomputed_hash.initialized &&
+ orig_namelen == ce_namelen(ce))
+ hash = ce->precomputed_hash.dir;
+ else
+ hash = memihash(ce->name, namelen);
+ dir = find_dir_entry_1(istate, ce->name, namelen, hash);
+ }
+
if (!dir) {
/* not found, create it and add to hash table */
FLEX_ALLOC_MEM(dir, name, ce->name, namelen);
@@ -76,15 +90,21 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
hashmap_add(&istate->dir_hash, dir);
/* recursively add missing parent directories */
- dir->parent = hash_dir_entry(istate, ce, namelen);
+ dir->parent = hash_dir_entry(istate, ce, namelen, NULL);
}
+
+ if (previous_dir)
+ *previous_dir = dir;
+
return dir;
}
-static void add_dir_entry(struct index_state *istate, struct cache_entry *ce)
+static void add_dir_entry(struct index_state *istate, struct cache_entry *ce,
+ struct dir_entry **previous_dir)
{
/* Add reference to the directory entry (and parents if 0). */
- struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
+ struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce),
+ previous_dir);
while (dir && !(dir->nr++))
dir = dir->parent;
}
@@ -95,7 +115,7 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
* Release reference to the directory entry. If 0, remove and continue
* with parent directory.
*/
- struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
+ struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce), NULL);
while (dir && !(--dir->nr)) {
struct dir_entry *parent = dir->parent;
hashmap_remove(&istate->dir_hash, dir, NULL);
@@ -104,7 +124,8 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
}
}
-static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
+static void hash_index_entry(struct index_state *istate, struct cache_entry *ce,
+ struct dir_entry **previous_dir)
{
unsigned int h;
@@ -121,7 +142,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
hashmap_add(&istate->name_hash, ce);
if (ignore_case)
- add_dir_entry(istate, ce);
+ add_dir_entry(istate, ce, previous_dir);
}
static int cache_entry_cmp(const struct cache_entry *ce1,
@@ -137,6 +158,7 @@ static int cache_entry_cmp(const struct cache_entry *ce1,
static void lazy_init_name_hash(struct index_state *istate)
{
+ struct dir_entry *previous_dir = NULL;
int nr;
if (istate->name_hash_initialized)
@@ -146,14 +168,14 @@ static void lazy_init_name_hash(struct index_state *istate)
hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
istate->cache_nr);
for (nr = 0; nr < istate->cache_nr; nr++)
- hash_index_entry(istate, istate->cache[nr]);
+ hash_index_entry(istate, istate->cache[nr], &previous_dir);
istate->name_hash_initialized = 1;
}
void add_name_hash(struct index_state *istate, struct cache_entry *ce)
{
if (istate->name_hash_initialized)
- hash_index_entry(istate, ce);
+ hash_index_entry(istate, ce, NULL);
}
void remove_name_hash(struct index_state *istate, struct cache_entry *ce)
--
2.11.1.windows.1
^ permalink raw reply related
* Re: [PATCH 11/11] refs: split and make get_*_ref_store() public API
From: Duy Nguyen @ 2017-02-14 10:04 UTC (permalink / raw)
To: Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, David Turner
In-Reply-To: <CAGZ79kZC6TntQrW7MF6-h5z5En-u6rwNX=zuaHRNDpbO80ALHA@mail.gmail.com>
On Tue, Feb 14, 2017 at 6:55 AM, Stefan Beller <sbeller@google.com> wrote:
>>
>> +/*
>> + * Return the ref_store instance for the specified submodule. For the
>> + * main repository, use submodule==NULL; such a call cannot fail.
>
> So now we have both a get_main as well as a get_submodule function,
> but the submodule function can return the main as well?
>
> I'd rather see this as a BUG; or asking another way:
> What is the difference between get_submodule_ref_store(NULL)
> and get_main_ref_store() ?
Technical debts :) In order to do that, we need to make sure
_submodule() never takes NULL as main repo. But current code does. On
example is revision.c where submodule==NULL is the main repo. In the
end, I agree that get_submodule_ref_store(NULL) should be a bug.
> As you went through all call sites (by renaming the function), we'd
> be able to tell that there is no caller with NULL, or is it?
Direct call sites are just middle men though, e.g.
for_each_ref_submodule(). I'll attempt to clean this up at some point
in future (probably at the same time I attempt to kill *_submodule ref
api). But I think for now I'll just put a TODO or FIXME comment here.
--
Duy
^ permalink raw reply
* [PATCH v2] clean: use warning_errno() when appropriate
From: Nguyễn Thái Ngọc Duy @ 2017-02-14 9:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy
In-Reply-To: <20170213092702.10462-1-pclouds@gmail.com>
All these warning() calls are preceded by a system call. Report the
actual error to help the user understand why we fail to remove
something.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
v2 dances with errno
builtin/clean.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/builtin/clean.c b/builtin/clean.c
index d6bc3aaae..3569736f6 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -154,6 +154,7 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
struct strbuf quoted = STRBUF_INIT;
struct dirent *e;
int res = 0, ret = 0, gone = 1, original_len = path->len, len;
+ int saved_errno;
struct string_list dels = STRING_LIST_INIT_DUP;
*dir_gone = 1;
@@ -173,9 +174,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
if (!dir) {
/* an empty dir could be removed even if it is unreadble */
res = dry_run ? 0 : rmdir(path->buf);
+ saved_errno = errno;
if (res) {
quote_path_relative(path->buf, prefix, "ed);
- warning(_(msg_warn_remove_failed), quoted.buf);
+ errno = saved_errno;
+ warning_errno(_(msg_warn_remove_failed), quoted.buf);
*dir_gone = 0;
}
return res;
@@ -204,12 +207,14 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
continue;
} else {
res = dry_run ? 0 : unlink(path->buf);
+ saved_errno = errno;
if (!res) {
quote_path_relative(path->buf, prefix, "ed);
string_list_append(&dels, quoted.buf);
} else {
quote_path_relative(path->buf, prefix, "ed);
- warning(_(msg_warn_remove_failed), quoted.buf);
+ errno = saved_errno;
+ warning_errno(_(msg_warn_remove_failed), quoted.buf);
*dir_gone = 0;
ret = 1;
}
@@ -227,11 +232,13 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
if (*dir_gone) {
res = dry_run ? 0 : rmdir(path->buf);
+ saved_errno = errno;
if (!res)
*dir_gone = 1;
else {
quote_path_relative(path->buf, prefix, "ed);
- warning(_(msg_warn_remove_failed), quoted.buf);
+ errno = saved_errno;
+ warning_errno(_(msg_warn_remove_failed), quoted.buf);
*dir_gone = 0;
ret = 1;
}
@@ -853,7 +860,7 @@ static void interactive_main_loop(void)
int cmd_clean(int argc, const char **argv, const char *prefix)
{
- int i, res;
+ int i, res, saved_errno;
int dry_run = 0, remove_directories = 0, quiet = 0, ignored = 0;
int ignored_only = 0, config_set = 0, errors = 0, gone = 1;
int rm_flags = REMOVE_DIR_KEEP_NESTED_GIT;
@@ -980,9 +987,11 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
}
} else {
res = dry_run ? 0 : unlink(abs_path.buf);
+ saved_errno = errno;
if (res) {
qname = quote_path_relative(item->string, NULL, &buf);
- warning(_(msg_warn_remove_failed), qname);
+ errno = saved_errno;
+ warning_errno(_(msg_warn_remove_failed), qname);
errors++;
} else if (!quiet) {
qname = quote_path_relative(item->string, NULL, &buf);
--
2.11.0.157.gd943d85
^ permalink raw reply related
* Re: [PATCH 04/11] files-backend: replace *git_path*() with files_path()
From: Duy Nguyen @ 2017-02-14 9:43 UTC (permalink / raw)
To: Ramsay Jones
Cc: Git Mailing List, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, Stefan Beller, David Turner
In-Reply-To: <8f6e09aa-f578-6faf-6acc-e35be71ee990@ramsayjones.plus.com>
On Tue, Feb 14, 2017 at 3:58 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>> -void files_path(struct files_ref_store *refs, struct strbuf *sb,
>> - const char *fmt, ...) __attribute__((format (printf, 3, 4)));
>
> You only added this in the last commit, so maybe mark it static in
> the previous patch! Also, just in case you were wondering, the 'Why?'
> of the previous email was, "Why do you need this forward declaration?"
> (hint: you don't ;-)
Yeah, my compiler considers unused static functions an error. But I
guess I should adjust those million compiler flags instead of doing it
like this. I'll need to check if it's an error with DEVELOPER_CFLAGS
because people may bisect with that.
--
Duy
^ permalink raw reply
* Re: [PATCH 06/11] refs-internal.h: correct is_per_worktree_ref()
From: Duy Nguyen @ 2017-02-14 9:40 UTC (permalink / raw)
To: Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, David Turner
In-Reply-To: <CAGZ79kYkc-_=RiK1uJ+ndhQu=B8u=UDVusXZu-dYe7KnGNye3Q@mail.gmail.com>
On Tue, Feb 14, 2017 at 5:37 AM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, Feb 13, 2017 at 7:20 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> All refs outside refs/ directory is per-worktree, not just HEAD.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>> refs/refs-internal.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
>> index f4aed49f5..69d02b6ba 100644
>> --- a/refs/refs-internal.h
>> +++ b/refs/refs-internal.h
>> @@ -653,7 +653,7 @@ const char *resolve_ref_recursively(struct ref_store *refs,
>>
>> static inline int is_per_worktree_ref(const char *refname)
>> {
>> - return !strcmp(refname, "HEAD") ||
>> + return !starts_with(refname, "refs/") ||
>> starts_with(refname, "refs/bisect/");
>
> you're loosing HEAD here? (assuming we get HEAD in
> short form here, as well as long form refs/HEAD)
I don't understand. if refname is HEAD then both !strcmp(...) and
!starts_with(refname, "refs/") return 1. If it's refs/HEAD, both
return 0. In other words, there's no functional changes?
--
Duy
^ permalink raw reply
* Re: [PATCH 07/11] files-backend: remove the use of git_path()
From: Duy Nguyen @ 2017-02-14 9:38 UTC (permalink / raw)
To: Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, David Turner
In-Reply-To: <CAGZ79kbJXUY=UGvHtkeLLj-qMaoOyTwa2dr3-FqEdYi8eFs4LA@mail.gmail.com>
On Tue, Feb 14, 2017 at 6:09 AM, Stefan Beller <sbeller@google.com> wrote:
>> +
>> + if (submodule) {
>> + refs->submodule = xstrdup_or_null(submodule);
>
> drop the _or_null here?
>
> So in this patch we have either
> * submodule set
> * or gitdir/gitcommondir set
>
> which means that we exercise the commondir for regular repos.
> In the future when we want to be able to have a combination of worktrees
> and submodules this ought to be possible by setting submodule != NULL
> and still populating the gitdir/commondir buffers.
You probably have seen it by now. In the near future, submodule is
completely gone from here. We convert to a .git dir before we pass in
here. In a farther future, gitcommondir will be gone too with all the
per-worktree logic in this file. A linked worktree consists of two
backends actually, one per-worktree (which remains files-based), the
other for shared refs, which could be files, lmdb or whatever.
Stacking up submodule on top of a linked worktree should not be a
problem.
--
Duy
^ permalink raw reply
* Re: [PATCH 09/11] refs: move submodule code out of files-backend.c
From: Duy Nguyen @ 2017-02-14 9:32 UTC (permalink / raw)
To: Stefan Beller
Cc: git@vger.kernel.org, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, David Turner
In-Reply-To: <CAGZ79kannQDWWYN5rHr6q-6=o1_ajeEjLn8Wzsc4RjqkfYOYwg@mail.gmail.com>
On Tue, Feb 14, 2017 at 6:35 AM, Stefan Beller <sbeller@google.com> wrote:
>> + git_dir = read_gitfile(buf.buf);
>
> if buf.buf is a (git) directory as opposed to a git file,
> we error out in read_gitfile. Did you mean to use
> read_gitfile_gently here or rather even resolve_gitdir_gently ?
This is what strbuf_git_path_submodule() does. I don't know the
backstory so I'm going to keep it as it is to keep the behavior
exactly (or very close) as before. We can replace it with a better
version (with explanation and all).
>> + if (git_dir) {
>
> when not using the _gently version git_dir is always
> non NULL here (or we're dead)?
>
>> + strbuf_reset(&buf);
>> + strbuf_addstr(&buf, git_dir);
>> + }
>> + if (!is_git_directory(buf.buf)) {
>> + gitmodules_config();
>> + sub = submodule_from_path(null_sha1, path);
>> + if (!sub)
>> + goto done;
>> + strbuf_reset(&buf);
>> + strbuf_git_path(&buf, "%s/%s", "modules", sub->name);
>
> You can inline "modules" into the format string?
Hm.. because this is strbuf_git_path_submodule() code. Perhaps it's
better to split it to a separate function and call it from here? Then
you can make more improvements on top that benefit everybody.
>> } else {
>> strbuf_addstr(&refs->gitdir, get_git_dir());
>> strbuf_addstr(&refs->gitcommondir, get_git_common_dir());
>> @@ -1034,8 +1025,6 @@ static struct ref_store *files_ref_store_create(const char *submodule)
>> static void files_assert_main_repository(struct files_ref_store *refs,
>> const char *caller)
>> {
>> - if (refs->submodule)
>> - die("BUG: %s called for a submodule", caller);
>> }
>
> In a followup we'd get rid of files_assert_main_repository
> presumably?
Yes. Can't delete it now because I would need to touch all callers.
--
Duy
^ permalink raw reply
* Re: [PATCH 01/11] refs-internal.c: make files_log_ref_write() static
From: Duy Nguyen @ 2017-02-14 9:23 UTC (permalink / raw)
To: Ramsay Jones
Cc: Git Mailing List, Junio C Hamano, Michael Haggerty,
Johannes Schindelin, Stefan Beller, David Turner
In-Reply-To: <6510f4b2-51bb-eabd-9cd7-30bc4164b25d@ramsayjones.plus.com>
On Tue, Feb 14, 2017 at 3:14 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>
>
> On 13/02/17 15:20, Nguyễn Thái Ngọc Duy wrote:
>> Created in 5f3c3a4e6f (files_log_ref_write: new function - 2015-11-10)
>> but probably never used outside refs-internal.c
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>> refs/files-backend.c | 3 +++
>> refs/refs-internal.h | 4 ----
>> 2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index cdb6b8ff5..75565c3aa 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -165,6 +165,9 @@ static struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
>> const char *dirname, size_t len,
>> int incomplete);
>> static void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
>> +static int files_log_ref_write(const char *refname, const unsigned char *old_sha1,
>> + const unsigned char *new_sha1, const char *msg,
>> + int flags, struct strbuf *err);
>
> Why? I don't like adding forward declarations unless it
> is absolutely necessary (ie mutually recursive functions),
> and even in the current 'pu' branch (@c04899d50), the
> definition of this function appears before all uses in
> this file. (ie, just add static to the definition).
>
> What am I missing?
It may have been needed at one point. With all the code changes and
movements, I guess I forgot to remove it.
--
Duy
^ permalink raw reply
* Re: Bug in 'git describe' if I have two tags on the same commit.
From: Istvan Pato @ 2017-02-14 8:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kevin Daudt, git
In-Reply-To: <xmqqo9y5de1k.fsf@gitster.mtv.corp.google.com>
This is my fault: this is a lightweight tag.
Thank you!
2017-02-13 21:35 GMT+01:00 Junio C Hamano <gitster@pobox.com>:
> Kevin Daudt <me@ikke.info> writes:
>
>> On Sun, Feb 12, 2017 at 01:15:22PM +0100, Istvan Pato wrote:
>>
>>> (master) [1] % git show-ref --tag
>>> 76c634390... refs/tags/1.0.0
>>> b77c7cd17... refs/tags/1.1.0
>>> b77c7cd17... refs/tags/1.2.0
>>>
>>> (master) % git describe --tags --always
>>> 1.1.0-1-ge9e9ced
>>>
>>> ### Expected: 1.2.0
>>> ...
>>
>> Are these lightweight tags? Only annotated tags have a date associated
>> to them, which is where the rel-notes refers to.
>
> Good eyes. The fact that the two points at the same object means
> that even if they were both annotated tags, they have the same
> tagger dates.
>
> If the code that compares the candidates and picks better tag to
> describe the object with knows the refnames of these "tags", I'd
> imagine that we could use the versioncmp() logic as the final tie
> breaker, but I do not offhand remember if the original refname we
> took the tag (or commit) from is propagated that deep down the
> callchain. It probably does not, so some code refactoring may be
> needed if somebody wants to go in that direction.
>
>
>
>
>
^ permalink raw reply
* Re: Non-zero exit code without error
From: Serdar Sahin @ 2017-02-14 7:56 UTC (permalink / raw)
To: Christian Couder; +Cc: git
In-Reply-To: <BE964323A3E644BBB01F8672263419EA@peakgames.net>
Thanks. I know it is hard to get an idea without being able to
reproduce it. I think there is no alternative without debugging it
with GDB.
I’ve tried your suggestions and nothing has changed.
I’ve also update the GIT as you suggested, but it is still the same.
Just to see, if GIT server causes some issues, I’ve pushed to repo to
github public as a private repo, and can reproduce the issue there as
well.
Thanks for your support.
On Tue, Feb 14, 2017 at 10:53 AM, Serdar Sahin <serdar@peakgames.net> wrote:
> Hi Christian,
>
> Thanks. I know it is hard to get an idea without being able to reproduce it.
> I think there is no alternative without debugging it with GDB.
>
> I’ve tried your suggestions and nothing has changed.
>
> I’ve also update the GIT as you suggested, but it is still the same.
>
> Just to see, if GIT server causes some issues, I’ve pushed to repo to github
> public as a private repo, and can reproduce the issue there as well.
>
> Thanks for your support.
>
> --
> Serdar Sahin
> Peak Games
>
> On Saturday, 11 February 2017 at 15:28, Christian Couder wrote:
>
> On Wed, Feb 8, 2017 at 11:26 AM, Serdar Sahin <serdar@peakgames.net> wrote:
>
> Hi Christian,
>
>
> We are using a private repo (Github Enterprise).
>
>
> Maybe you could try 'git fast-export --anonymize ...' on it.
>
> Let me give you the
> details you requested.
>
>
> On Git Server: git version 2.6.5.1574.g5e6a493
>
> On my client: git version 2.10.1 (Apple Git-78)
>
>
> I’ve tried to reproduce it with public repos, but couldn’t do so.
>
>
> You might try using the latest released version (2.11.1).
>
> For example you could install the last version on the client and then
> clone from the server with --bare and use this bare repo as if it was
> the server.
>
> You could also try `git fsck` to see if there are problems on your repo.
>
> Are there submodules or something a bit special?
>
> In the end it's difficult for us to help if we cannot reproduce, so
> your best bet might be to debug yourself using gdb for example.
>
> If I
> could get an error/log output, that would be sufficient.
>
>
> I am also including the full output below. (also git gc)
>
>
> MacOSX:test serdar$ git clone --mirror --depth 50 --no-single-branch
> git@git.privateserver.com:Casual/code_repository.git
>
>
> You could also try with different options above...
>
> Cloning into bare repository 'code_repository.git'...
>
> remote: Counting objects: 3362, done.
>
> remote: Compressing objects: 100% (1214/1214), done.
>
> remote: Total 3362 (delta 2335), reused 2968 (delta 2094), pack-reused 0
>
> Receiving objects: 100% (3362/3362), 56.77 MiB | 1.83 MiB/s, done.
>
> Resolving deltas: 100% (2335/2335), done.
>
> MacOSX:test serdar$ cd code_repository.git/
>
> MacOSX:code_repository.git serdar$ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git
> fetch --depth 50 origin cc086c96cdffe5c1ac78e6139a7a4b79e7c821ee
>
>
> ... and above.
>
> Also it looks like you use ssh so something like GIT_SSH_COMMAND="ssh
> -vv" might help more than GIT_CURL_VERBOSE=1
>
> 13:23:15.648337 git.c:350 trace: built-in: git 'fetch'
> '--depth' '50' 'origin' 'cc086c96cdffe5c1ac78e6139a7a4b79e7c821ee'
>
> 13:23:15.651127 run-command.c:336 trace: run_command: 'ssh'
> 'git@git.privateserver.com' 'git-upload-pack
> '\''Casual/code_repository.git'\'''
>
> 13:23:17.750015 run-command.c:336 trace: run_command: 'gc' '--auto'
>
> 13:23:17.750829 exec_cmd.c:189 trace: exec: 'git' 'gc' '--auto'
>
> 13:23:17.753983 git.c:350 trace: built-in: git 'gc' '--auto'
>
> MacOSX:code_repository.git serdar$ echo $?
>
> 1
>
> MacOSX:code_repository.git serdar$ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git gc
> --auto
>
> 13:23:45.899038 git.c:350 trace: built-in: git 'gc' '--auto'
>
> MacOSX:code_repository.git serdar$ echo $?
>
> 0
>
>
^ permalink raw reply
* Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]
From: Eric Wong @ 2017-02-14 7:13 UTC (permalink / raw)
To: Arif Khokar
Cc: Johannes Schindelin, Jakub Narębski, Jeff King,
Stefan Beller, meta, git
In-Reply-To: <7dbe0866-4a9b-7afe-8f51-ca1d5524d4a4@hotmail.com>
Arif Khokar <arif.i.khokar@gmail.com> wrote:
> On 02/13/2017 09:37 AM, Johannes Schindelin wrote:
> >I actually had expected *you* to put in a little bit of an effort, too. In
> >fact, I was very disappointed that you did not even look into porting that
> >script to use public-inbox instead of GMane.
>
> I wasn't aware of that expectation. My idea was to use NNTP as a way to
> facilitate the development of a new git utility that would serve as the
> inverse of git-send-email (sort of like the relationship between git
> format-patch and git am), rather than using a
Speaking for myself, I usually don't expect much, especially
from newcomers. So I am disappointed to see Dscho's disappointment
aimed at you, Arif. Especially since you're not a regular and
we have no idea how much free time, attention span, or familiarity
with Bourne shell you have.
> IIRC, I had posted some proof-of-concept Perl code to do so back in August
> in <DM5PR17MB1353B99EBD5F4FD23360DD41D3ED0@DM5PR17MB1353.namprd17.prod.outlook.com>
>
> Looking at public-inbox now at the archives of this group, it appears that
> several of the messages I sent weren't archived for some reason (and I
> didn't see any more responses to what I posted at the time). The messages
> are accessible via NNTP when connecting to gmane though.
It looks like it went to gmane via the meta@public-inbox.org to
gmane.mail.public-inbox.general mirror, not via the git@vger mirror.
I can't find it on git@vger's mail-archive.com mirror, either:
https://mail-archive.com/search?q=Arif+Khokar&l=git%40vger.kernel.org
> Also, looking at the source of the message I referenced, it appears that my
> MUA decided to base64 encode the message for some reason (which may have
> resulted in it getting filtered by those who I sent the message to).
It probably wasn't base64, but maybe it was one of these:
http://vger.kernel.org/majordomo-taboos.txt
Or it was the SPF softfail which you can see in the headers on both
gmane and public-inbox.
It might even be the '_' (underscore) in your other address.
But even Junio gets dropped by vger sometimes:
https://public-inbox.org/git/20170127035753.GA2604@dcvr/
But if I had to guess, vger gets hit by truckloads of spam and
the the backscatter volume could become unimaginable, so perhaps
it has good reason to discard silently.
Anyways, the eventual goal of public-inbox is to flip the
mailing list model backwards into "archives first" mode,
so a message needs to make it into public archives before
it goes out to subscribers. That might prevent or avoid
such problems... *shrug*
^ permalink raw reply
* Re: Developing git with IDE
From: Jeff King @ 2017-02-14 6:22 UTC (permalink / raw)
To: Oleg Taranenko; +Cc: git
In-Reply-To: <CABEd3j-kxA+ap7vqr85X-4HpQCvShPJUsS2Qq0BrMEK09BYS7A@mail.gmail.com>
On Tue, Feb 14, 2017 at 12:15:00AM +0100, Oleg Taranenko wrote:
> being last decade working with java & javascript I completely lost
> relation to c/c++ world. Trying to get into git internals I'm facing
> with issue what IDE is more suitable for developing git @ macos ?
>
> Have googled, but any my search queries following to non-relevant
> themes, like supporting git in IDEs etc.
>
> my first attempt - CLion (as far as I'm Jetbrains fan) - got failed,
> as far as doesn't support makefile-based projects, only CMake.
>
> There are a number of free C/C++ dev tools: Xcode, CodeBlocks,
> CodeLite. Gnat, Qt creator, Dev C++, C++ Builder (Borland? :),
> Eclipse, NetBeans... what else?
>
> Because of lack my modern C experience, could somebody share his own
> attempts/thoughts/use cases how more convenient dive into the git
> development process on the Mac?
I think most people just use a good editor (emacs or vim), and no IDE. I
do recommend using ctags or similar (and there is a "make tags" target
to build the tags) to help with jumping between functions.
> Tried to find in the git distribution Documentation more information
> about this, nothing as well... Would be nice to have a kind of
> 'Getting Started Manual'
There is Documentation/CodingGuidelines, but that's mostly about how to
_write_ code, not read it. Some protocols and subsystems are covered in
Documentation/technical. If you want a "big picture", I think you'd do
best to read something like:
https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
That talks about the system as a whole, not the code, but the layout of
the code follows the overall system design (e.g., the entry point for
the "log" command is cmd_log(), and you can see which subsystems it uses
from there).
-Peff
^ permalink raw reply
* Re: [PATCH v2] remote helpers: avoid blind fall-back to ".git" when setting GIT_DIR
From: Jeff King @ 2017-02-14 6:16 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, Duy Nguyen, Junio C Hamano, Stefan Beller
In-Reply-To: <20161230004845.rknafqsyosmyr6z2@sigill.intra.peff.net>
On Thu, Dec 29, 2016 at 07:48:45PM -0500, Jeff King wrote:
> On Thu, Dec 29, 2016 at 04:11:14PM -0800, Jonathan Nieder wrote:
>
> > Thanks. Here's the patch again, now with commit messages and a test.
> > Thanks for the analysis and sorry for the slow turnaround.
>
> Thanks for following up. While working on a similar one recently, I had
> the nagging feeling that there was a case we had found that was still to
> be dealt with, and I think this was it. :)
>
> The patch to the C code looks good. I have a few comments on the tests:
I happened to run into this problem myself today, so I thought I'd prod
you. I think your patch looks good. Hopefully I didn't scare you off
with my comments. :)
-Peff
^ permalink raw reply
* Re: [PATCH] mingw: use OpenSSL's SHA-1 routines
From: Johannes Sixt @ 2017-02-14 6:14 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Jeff King, git, Jeff Hostetler
In-Reply-To: <alpine.DEB.2.20.1702132337470.3496@virtualbox>
Am 13.02.2017 um 23:38 schrieb Johannes Schindelin:
> In addition, you build from a custom MINGW/MSys1 setup, correct?
Correct. Specifically, I use the build tools from "msysgit" times, but
build outside the premanufactured build environement; i.e., the
"THIS_IS_MSYSGIT" section in config.mak.uname is not used.
-- Hannes
^ permalink raw reply
* Re: [PATCH 0/7] grep rev/path parsing fixes
From: Jeff King @ 2017-02-14 6:10 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
On Tue, Feb 14, 2017 at 01:00:21AM -0500, Jeff King wrote:
> On Mon, Feb 13, 2017 at 08:20:37PM -0500, Jeff King wrote:
>
> > > If there is a repo and "foo" is a rev, the "--no-index or --untracked
> > > cannot be used with revs." error would occur. If there is a repo and
> > > "foo" is not a rev, this command would proceed as usual. If there is no
> > > repo, the "setup_git_env called without repository" error would occur.
> > > (This is my understanding from reading the code - I haven't tested it
> > > out.)
> >
> > Yes, it's easy to see that "git grep --no-index foo bar" outside of a
> > repo generates the same BUG. I suspect that "--no-index" should just
> > disable looking up revs entirely, even if we are actually in a
> > repository directory.
>
> I've fixed that, along with a few other bugs and cleanups. The complete
> series is below. Patch 2 is your (untouched) patch. My suggestions for
> your test are in patch 3, which can either remain on its own or be
> squashed in.
>
> [1/7]: grep: move thread initialization a little lower
> [2/7]: grep: do not unnecessarily query repo for "--"
> [3/7]: t7810: make "--no-index --" test more robust
> [4/7]: grep: re-order rev-parsing loop
> [5/7]: grep: fix "--" rev/pathspec disambiguation
> [6/7]: grep: avoid resolving revision names in --no-index case
> [7/7]: grep: do not diagnose misspelt revs with --no-index
>
> builtin/grep.c | 78 +++++++++++++++++++++++++++++++++++++++------------------
> t/t7810-grep.sh | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 119 insertions(+), 25 deletions(-)
Just to clarify: these are all existing bugs, and I think these are
probably maint-worthy patches (even the --no-index ones; though we don't
BUG on the out-of-repo without the patch from 'next', the code is still
doing the wrong thing in subtle ways).
But AFAIK they are all much older bugs than the upcoming v2.12, so there
is no pressing need to fit them into the upcoming release.
-Peff
^ permalink raw reply
* [PATCH 7/7] grep: do not diagnose misspelt revs with --no-index
From: Jeff King @ 2017-02-14 6:08 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
If we are using --no-index, then our arguments cannot be
revs in the first place. Not only is it pointless to
diagnose them, but if we are not in a repository, we should
not be trying to resolve any names.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/grep.c | 2 +-
t/t7810-grep.sh | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index c4c632594..1454bef49 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1201,7 +1201,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
if (!seen_dashdash) {
int j;
for (j = i; j < argc; j++)
- verify_filename(prefix, argv[j], j == i);
+ verify_filename(prefix, argv[j], j == i && use_index);
}
parse_pathspec(&pathspec, 0,
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index c051c7ee8..0ff9f6cae 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1043,6 +1043,11 @@ test_expect_success 'grep --no-index prefers paths to revs' '
test_cmp expect actual
'
+test_expect_success 'grep --no-index does not "diagnose" revs' '
+ test_must_fail git grep --no-index o :1:hello.c 2>err &&
+ test_i18ngrep ! -i "did you mean" err
+'
+
cat >expected <<EOF
hello.c:int main(int argc, const char **argv)
hello.c: printf("Hello world.\n");
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 6/7] grep: avoid resolving revision names in --no-index case
From: Jeff King @ 2017-02-14 6:07 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
We disallow the use of revisions with --no-index, but we
don't actually check and complain until well after we've
parsed the revisions.
This is the cause of a few problems:
1. We shouldn't be calling get_sha1() at all when we aren't
in a repository, as it might access the ref or object
databases. For now, this should generally just return
failure, but eventually it will become a BUG().
2. When there's a "--" disambiguator and you're outside a
repository, we'll complain early with "unable to resolve
revision". But we can give a much more specific error.
3. When there isn't a "--" disambiguator, we still do the
normal rev/path checks. This is silly, as we know we
cannot have any revs with --no-index. Everything we see
must be a path.
Outside of a repository this doesn't matter (since we
know it won't resolve), but inside one, we may complain
unnecessarily if a filename happens to also match a
refname.
This patch skips the get_sha1() call entirely in the
no-index case, and behaves as if it failed (with the
exception of giving a better error message).
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/grep.c | 6 ++++++
t/t7810-grep.sh | 13 +++++++++++++
2 files changed, 19 insertions(+)
diff --git a/builtin/grep.c b/builtin/grep.c
index e83b33bda..c4c632594 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1176,6 +1176,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
break;
}
+ if (!use_index) {
+ if (seen_dashdash)
+ die(_("--no-index cannot be used with revs"));
+ break;
+ }
+
if (get_sha1_with_context(arg, 0, sha1, &oc)) {
if (seen_dashdash)
die(_("unable to resolve revision: %s"), arg);
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index a6011f9b1..c051c7ee8 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1030,6 +1030,19 @@ test_expect_success 'grep --no-index pattern -- path' '
)
'
+test_expect_success 'grep --no-index complains of revs' '
+ test_must_fail git grep --no-index o master -- 2>err &&
+ test_i18ngrep "no-index cannot be used with revs" err
+'
+
+test_expect_success 'grep --no-index prefers paths to revs' '
+ test_when_finished "rm -f master" &&
+ echo content >master &&
+ echo master:content >expect &&
+ git grep --no-index o master >actual &&
+ test_cmp expect actual
+'
+
cat >expected <<EOF
hello.c:int main(int argc, const char **argv)
hello.c: printf("Hello world.\n");
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 5/7] grep: fix "--" rev/pathspec disambiguation
From: Jeff King @ 2017-02-14 6:05 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
If we see "git grep pattern rev -- file" then we apply the
usual rev/pathspec disambiguation rules: any "rev" before
the "--" must be a revision, and we do not need to apply the
verify_non_filename() check.
But there are two bugs here:
1. We keep a seen_dashdash flag to handle this case, but
we set it in the same left-to-right pass over the
arguments in which we parse "rev".
So when we see "rev", we do not yet know that there is
a "--", and we mistakenly complain if there is a
matching file.
We can fix this by making a preliminary pass over the
arguments to find the "--", and only then checking the rev
arguments.
2. If we can't resolve "rev" but there isn't a dashdash,
that's OK. We treat it like a path, and complain later
if it doesn't exist.
But if there _is_ a dashdash, then we know it must be a
rev, and should treat it as such, complaining if it
does not resolve. The current code instead ignores it
and tries to treat it like a path.
This patch fixes both bugs, and tries to comment the parsing
flow a bit better.
It adds tests that cover the two bugs, but also some related
situations (which already worked, but this confirms that our
fixes did not break anything).
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/grep.c | 29 ++++++++++++++++++++++++-----
t/t7810-grep.sh | 33 +++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 461347adb..e83b33bda 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1149,7 +1149,22 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
compile_grep_patterns(&opt);
- /* Check revs and then paths */
+ /*
+ * We have to find "--" in a separate pass, because its presence
+ * influences how we will parse arguments that come before it.
+ */
+ for (i = 0; i < argc; i++) {
+ if (!strcmp(argv[i], "--")) {
+ seen_dashdash = 1;
+ break;
+ }
+ }
+
+ /*
+ * Resolve any rev arguments. If we have a dashdash, then everything up
+ * to it must resolve as a rev. If not, then we stop at the first
+ * non-rev and assume everything else is a path.
+ */
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
unsigned char sha1[20];
@@ -1158,13 +1173,14 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
if (!strcmp(arg, "--")) {
i++;
- seen_dashdash = 1;
break;
}
- /* Stop at the first non-rev */
- if (get_sha1_with_context(arg, 0, sha1, &oc))
+ if (get_sha1_with_context(arg, 0, sha1, &oc)) {
+ if (seen_dashdash)
+ die(_("unable to resolve revision: %s"), arg);
break;
+ }
object = parse_object_or_die(sha1, arg);
if (!seen_dashdash)
@@ -1172,7 +1188,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
add_object_array_with_path(object, arg, &list, oc.mode, oc.path);
}
- /* The rest are paths */
+ /*
+ * Anything left over is presumed to be a path. But in the non-dashdash
+ * "do what I mean" case, we verify and complain when that isn't true.
+ */
if (!seen_dashdash) {
int j;
for (j = i; j < argc; j++)
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 2c1f7373e..a6011f9b1 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -982,6 +982,39 @@ test_expect_success 'grep -e -- -- path' '
test_cmp expected actual
'
+test_expect_success 'dashdash disambiguates rev as rev' '
+ test_when_finished "rm -f master" &&
+ echo content >master &&
+ echo master:hello.c >expect &&
+ git grep -l o master -- hello.c >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'dashdash disambiguates pathspec as pathspec' '
+ test_when_finished "git rm -f master" &&
+ echo content >master &&
+ git add master &&
+ echo master:content >expect &&
+ git grep o -- master >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'report bogus arg without dashdash' '
+ test_must_fail git grep o does-not-exist
+'
+
+test_expect_success 'report bogus rev with dashdash' '
+ test_must_fail git grep o hello.c --
+'
+
+test_expect_success 'allow non-existent path with dashdash' '
+ # We need a real match so grep exits with success.
+ tree=$(git ls-tree HEAD |
+ sed s/hello.c/not-in-working-tree/ |
+ git mktree) &&
+ git grep o "$tree" -- not-in-working-tree
+'
+
test_expect_success 'grep --no-index pattern -- path' '
rm -fr non &&
mkdir -p non/git &&
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 4/7] grep: re-order rev-parsing loop
From: Jeff King @ 2017-02-14 6:04 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
We loop over the arguments, but every branch of the loop
hits either a "continue" or a "break". Surely we can make
this simpler.
The final conditional is:
if (arg is a rev) {
... handle rev ...
continue;
}
break;
We can rewrite this as:
if (arg is not a rev)
break;
... handle rev ...
That makes the flow a little bit simpler, and will make
things much easier to follow when we add more logic in
future patches.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/grep.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 081e1b57a..461347adb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1154,20 +1154,22 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
const char *arg = argv[i];
unsigned char sha1[20];
struct object_context oc;
+ struct object *object;
+
if (!strcmp(arg, "--")) {
i++;
seen_dashdash = 1;
break;
}
- /* Is it a rev? */
- if (!get_sha1_with_context(arg, 0, sha1, &oc)) {
- struct object *object = parse_object_or_die(sha1, arg);
- if (!seen_dashdash)
- verify_non_filename(prefix, arg);
- add_object_array_with_path(object, arg, &list, oc.mode, oc.path);
- continue;
- }
- break;
+
+ /* Stop at the first non-rev */
+ if (get_sha1_with_context(arg, 0, sha1, &oc))
+ break;
+
+ object = parse_object_or_die(sha1, arg);
+ if (!seen_dashdash)
+ verify_non_filename(prefix, arg);
+ add_object_array_with_path(object, arg, &list, oc.mode, oc.path);
}
/* The rest are paths */
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 3/7] t7810: make "--no-index --" test more robust
From: Jeff King @ 2017-02-14 6:03 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
This makes sure that we actually use the pathspecs after
"--" correctly (as opposed to just seeing if grep errored
out).
Signed-off-by: Jeff King <peff@peff.net>
---
This could be squashed into the previous patch.
I didn't end up using the "nongit" helper, because we actually have to
do some other setup inside the subshell (this is the reason the earlier
tests in this script don't use the helper, either).
t/t7810-grep.sh | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 29202f0e7..2c1f7373e 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -990,7 +990,10 @@ test_expect_success 'grep --no-index pattern -- path' '
export GIT_CEILING_DIRECTORIES &&
cd non/git &&
echo hello >hello &&
- git grep --no-index o -- .
+ echo goodbye >goodbye &&
+ echo hello:hello >expect &&
+ git grep --no-index o -- hello >actual &&
+ test_cmp expect actual
)
'
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 2/7] grep: do not unnecessarily query repo for "--"
From: Jeff King @ 2017-02-14 6:03 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
From: Jonathan Tan <jonathantanmy@google.com>
When running a command of the form
git grep --no-index pattern -- path
in the absence of a Git repository, an error message will be printed:
fatal: BUG: setup_git_env called without repository
This is because "git grep" tries to interpret "--" as a rev. "git grep"
has always tried to first interpret "--" as a rev for at least a few
years, but this issue was upgraded from a pessimization to a bug in
commit 59332d1 ("Resurrect "git grep --no-index"", 2010-02-06), which
calls get_sha1 regardless of whether --no-index was specified. This bug
appeared to be benign until commit b1ef400 ("setup_git_env: avoid blind
fall-back to ".git"", 2016-10-20) when Git was taught to die in this
situation. (This "git grep" bug appears to be one of the bugs that
commit b1ef400 is meant to flush out.)
Therefore, always interpret "--" as signaling the end of options,
instead of trying to interpret it as a rev first.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Unchanged from your original.
builtin/grep.c | 9 +++++----
t/t7810-grep.sh | 12 ++++++++++++
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 5a282c4d0..081e1b57a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1154,6 +1154,11 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
const char *arg = argv[i];
unsigned char sha1[20];
struct object_context oc;
+ if (!strcmp(arg, "--")) {
+ i++;
+ seen_dashdash = 1;
+ break;
+ }
/* Is it a rev? */
if (!get_sha1_with_context(arg, 0, sha1, &oc)) {
struct object *object = parse_object_or_die(sha1, arg);
@@ -1162,10 +1167,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
add_object_array_with_path(object, arg, &list, oc.mode, oc.path);
continue;
}
- if (!strcmp(arg, "--")) {
- i++;
- seen_dashdash = 1;
- }
break;
}
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 19f0108f8..29202f0e7 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -982,6 +982,18 @@ test_expect_success 'grep -e -- -- path' '
test_cmp expected actual
'
+test_expect_success 'grep --no-index pattern -- path' '
+ rm -fr non &&
+ mkdir -p non/git &&
+ (
+ GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+ export GIT_CEILING_DIRECTORIES &&
+ cd non/git &&
+ echo hello >hello &&
+ git grep --no-index o -- .
+ )
+'
+
cat >expected <<EOF
hello.c:int main(int argc, const char **argv)
hello.c: printf("Hello world.\n");
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 1/7] grep: move thread initialization a little lower
From: Jeff King @ 2017-02-14 6:02 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214060021.einv7372exbxa23z@sigill.intra.peff.net>
Originally, we set up the threads for grep before parsing
the non-option arguments. In 53b8d931b (grep: disable
threading in non-worktree case, 2011-12-12), the thread code
got bumped lower in the function because it now needed to
know whether we got any revision arguments.
That put a big block of code in between the parsing of revs
and the parsing of pathspecs, both of which share some loop
variables. That makes it harder to read the code than the
original, where the shared loops were right next to each
other.
Let's bump the thread initialization until after all of the
parsing is done.
Signed-off-by: Jeff King <peff@peff.net>
---
I double-checked to make sure no other code was relying on
the thread setup having happened. I think we could actually
bump it quite a bit lower (to right before we actually start
grepping), but I doubt it matters much in practice.
builtin/grep.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..5a282c4d0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1169,6 +1169,20 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
break;
}
+ /* The rest are paths */
+ if (!seen_dashdash) {
+ int j;
+ for (j = i; j < argc; j++)
+ verify_filename(prefix, argv[j], j == i);
+ }
+
+ parse_pathspec(&pathspec, 0,
+ PATHSPEC_PREFER_CWD |
+ (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
+ prefix, argv + i);
+ pathspec.max_depth = opt.max_depth;
+ pathspec.recursive = 1;
+
#ifndef NO_PTHREADS
if (list.nr || cached || show_in_pager)
num_threads = 0;
@@ -1190,20 +1204,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
}
#endif
- /* The rest are paths */
- if (!seen_dashdash) {
- int j;
- for (j = i; j < argc; j++)
- verify_filename(prefix, argv[j], j == i);
- }
-
- parse_pathspec(&pathspec, 0,
- PATHSPEC_PREFER_CWD |
- (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
- prefix, argv + i);
- pathspec.max_depth = opt.max_depth;
- pathspec.recursive = 1;
-
if (recurse_submodules) {
gitmodules_config();
compile_submodule_options(&opt, &pathspec, cached, untracked,
--
2.12.0.rc1.471.ga79ec8999
^ permalink raw reply related
* [PATCH 0/7] grep rev/path parsing fixes
From: Jeff King @ 2017-02-14 6:00 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git, gitster
In-Reply-To: <20170214012037.u2eg2n7mvteullcx@sigill.intra.peff.net>
On Mon, Feb 13, 2017 at 08:20:37PM -0500, Jeff King wrote:
> > If there is a repo and "foo" is a rev, the "--no-index or --untracked
> > cannot be used with revs." error would occur. If there is a repo and
> > "foo" is not a rev, this command would proceed as usual. If there is no
> > repo, the "setup_git_env called without repository" error would occur.
> > (This is my understanding from reading the code - I haven't tested it
> > out.)
>
> Yes, it's easy to see that "git grep --no-index foo bar" outside of a
> repo generates the same BUG. I suspect that "--no-index" should just
> disable looking up revs entirely, even if we are actually in a
> repository directory.
I've fixed that, along with a few other bugs and cleanups. The complete
series is below. Patch 2 is your (untouched) patch. My suggestions for
your test are in patch 3, which can either remain on its own or be
squashed in.
[1/7]: grep: move thread initialization a little lower
[2/7]: grep: do not unnecessarily query repo for "--"
[3/7]: t7810: make "--no-index --" test more robust
[4/7]: grep: re-order rev-parsing loop
[5/7]: grep: fix "--" rev/pathspec disambiguation
[6/7]: grep: avoid resolving revision names in --no-index case
[7/7]: grep: do not diagnose misspelt revs with --no-index
builtin/grep.c | 78 +++++++++++++++++++++++++++++++++++++++------------------
t/t7810-grep.sh | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+), 25 deletions(-)
-Peff
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox