* git status reads too many files
@ 2011-03-21 12:40 Lasse Makholm
2011-03-21 16:41 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Lasse Makholm @ 2011-03-21 12:40 UTC (permalink / raw)
To: git
After a git checkout, git status has a tendency to read all the files
that were updated during the checkout. In git.git for example:
$ git checkout -b here v1.7.4
Switched to a new branch 'here'
$ git checkout -b there v1.7.4.1
Switched to a new branch 'there'
$ strace -o /tmp/trace1 git status
# On branch there
nothing to commit (working directory clean)
$ grep ^open /tmp/trace1 | wc -l
414
$ git diff --name-only here..there | tail -1
wrapper.c
$ grep -A2 wrapper.c /tmp/trace1
lstat("wrapper.c", {st_mode=S_IFREG|0644, st_size=7617, ...}) = 0
open("wrapper.c", O_RDONLY) = 3
read(3, "/*\n * Various trivial helper wra"..., 7617) = 7617
close(3) = 0
$
This persistent across multiple runs of git status:
$ strace -o /tmp/trace2 git status
# On branch there
nothing to commit (working directory clean)
$ grep ^open /tmp/trace2 | wc -l
414
$
...until the index is touched:
$ touch .git/index
$ strace -o /tmp/trace3 git status
# On branch there
nothing to commit (working directory clean)
$ grep ^open /tmp/trace3 | wc -l
362
$
This happening at least with 1.7.0.4 and 1.7.4.1.343.ga91df (master as
of now)...
Further scrutiny reveals that when this happens, the index and the
newly updated files in the working tree have identical modification
times, so I guess git status is reading all files which are not older
than the index...
Discussing this with Mr. Schindelin last week, my first thought was
that checkout should ensure that the index is newer than any of the
files in the working tree but Johannes seems to think that instead,
the first git status run should touch the index, thus preventing the
next run from reading the files again.
I'm not familiar enough with the semantics of the index to say which
way is correct or intended, but surely the current behaviour is not
desirable.
Bug?
--
/Lasse
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git status reads too many files
2011-03-21 12:40 git status reads too many files Lasse Makholm
@ 2011-03-21 16:41 ` Junio C Hamano
2011-03-21 17:16 ` [PATCH 1/2] diff/status: refactor opportunistic index update Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-03-21 16:41 UTC (permalink / raw)
To: Lasse Makholm; +Cc: git
Lasse Makholm <lasse.makholm@gmail.com> writes:
> This persistent across multiple runs of git status:
>
> $ strace -o /tmp/trace2 git status
> # On branch there
> nothing to commit (working directory clean)
> $ grep ^open /tmp/trace2 | wc -l
> 414
> $
>
> ...until the index is touched:
>
> $ touch .git/index
Don't do this; you are breaking the racy-git protection.
I think we opportunistically update the .git/index file in "git status" to
refresh the stat bits (but we don't error out when we cannot write a new
index, as you may be only browsing somebody else's repository with only a
read access to it). It probably should be just the matter of adding a bit
of logic to notice that your index is racily clean.
Let me cook something real quick.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] diff/status: refactor opportunistic index update
2011-03-21 16:41 ` Junio C Hamano
@ 2011-03-21 17:16 ` Junio C Hamano
2011-03-21 18:46 ` Piotr Krukowiecki
2011-03-21 17:18 ` [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries Junio C Hamano
2011-03-21 20:39 ` git status reads too many files Lasse Makholm
2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2011-03-21 17:16 UTC (permalink / raw)
To: git; +Cc: Lasse Makholm
When we had to refresh the index internally before running diff or status,
we opportunistically updated the $GIT_INDEX_FILE so that later invocation
of git can use the lstat(2) we already did in this invocation.
Make them share a helper function to do so.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
> I think we opportunistically update the .git/index file in "git status" to
> refresh the stat bits (but we don't error out when we cannot write a new
> index, as you may be only browsing somebody else's repository with only a
> read access to it). It probably should be just the matter of adding a bit
> of logic to notice that your index is racily clean.
>
> Let me cook something real quick.
builtin/commit.c | 9 ++-------
builtin/diff.c | 7 +------
cache.h | 1 +
read-cache.c | 12 ++++++++++++
4 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index 66fdd22..0b6ce2f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1090,13 +1090,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, s.pathspec, NULL, NULL);
fd = hold_locked_index(&index_lock, 0);
- if (0 <= fd) {
- if (active_cache_changed &&
- !write_cache(fd, active_cache, active_nr))
- commit_locked_index(&index_lock);
- else
- rollback_lock_file(&index_lock);
- }
+ if (0 <= fd)
+ update_index_if_able(&the_index, &index_lock);
s.is_initial = get_sha1(s.reference, sha1) ? 1 : 0;
s.in_merge = in_merge;
diff --git a/builtin/diff.c b/builtin/diff.c
index a43d326..bab4bd9 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -197,12 +197,7 @@ static void refresh_index_quietly(void)
discard_cache();
read_cache();
refresh_cache(REFRESH_QUIET|REFRESH_UNMERGED);
-
- if (active_cache_changed &&
- !write_cache(fd, active_cache, active_nr))
- commit_locked_index(lock_file);
-
- rollback_lock_file(lock_file);
+ update_index_if_able(&the_index, lock_file);
}
static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv)
diff --git a/cache.h b/cache.h
index 2ef2fa3..9a3cc8e 100644
--- a/cache.h
+++ b/cache.h
@@ -520,6 +520,7 @@ extern NORETURN void unable_to_lock_index_die(const char *path, int err);
extern int hold_lock_file_for_update(struct lock_file *, const char *path, int);
extern int hold_lock_file_for_append(struct lock_file *, const char *path, int);
extern int commit_lock_file(struct lock_file *);
+extern void update_index_if_able(struct index_state *, struct lock_file *);
extern int hold_locked_index(struct lock_file *, int);
extern int commit_locked_index(struct lock_file *);
diff --git a/read-cache.c b/read-cache.c
index 1f42473..561dc66 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1545,6 +1545,18 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce)
return result;
}
+/*
+ * Opportunisticly update the index but do not complain if we can't
+ */
+void update_index_if_able(struct index_state *istate, struct lock_file *lockfile)
+{
+ if (istate->cache_changed) &&
+ !write_index(istate, lockfile->fd))
+ commit_locked_index(lockfile);
+ else
+ rollback_lock_file(lockfile);
+}
+
int write_index(struct index_state *istate, int newfd)
{
git_SHA_CTX c;
--
1.7.4.1.554.gfdad8
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries
2011-03-21 16:41 ` Junio C Hamano
2011-03-21 17:16 ` [PATCH 1/2] diff/status: refactor opportunistic index update Junio C Hamano
@ 2011-03-21 17:18 ` Junio C Hamano
2011-03-21 21:23 ` Lasse Makholm
2011-03-22 0:26 ` Eric Raible
2011-03-21 20:39 ` git status reads too many files Lasse Makholm
2 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-03-21 17:18 UTC (permalink / raw)
To: git; +Cc: Lasse Makholm
Traditional "opportunistic index update" done by read-only "diff" and
"status" was about updating cached lstat(2) information in the index for
the next round. We missed another obvious optimization opportunity to
when there are racily clean entries that will ceas to be racily clean
by updating $GIT_INDEX_FILE.
Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and
counting the number of open(2) calls.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
read-cache.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/read-cache.c b/read-cache.c
index 561dc66..971e277 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1545,12 +1545,25 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce)
return result;
}
+static int has_racy_timestamp(struct index_state *istate)
+{
+ int entries = istate->cache_nr;
+ int i;
+
+ for (i = 0; i < entries; i++) {
+ struct cache_entry *ce = istate->cache[i];
+ if (is_racy_timestamp(istate, ce))
+ return 1;
+ }
+ return 0;
+}
+
/*
* Opportunisticly update the index but do not complain if we can't
*/
void update_index_if_able(struct index_state *istate, struct lock_file *lockfile)
{
- if (istate->cache_changed) &&
+ if ((istate->cache_changed || has_racy_timestamp(istate)) &&
!write_index(istate, lockfile->fd))
commit_locked_index(lockfile);
else
--
1.7.4.1.554.gfdad8
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] diff/status: refactor opportunistic index update
2011-03-21 17:16 ` [PATCH 1/2] diff/status: refactor opportunistic index update Junio C Hamano
@ 2011-03-21 18:46 ` Piotr Krukowiecki
2011-03-21 19:39 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Piotr Krukowiecki @ 2011-03-21 18:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Lasse Makholm
On Mon, Mar 21, 2011 at 6:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
> +void update_index_if_able(struct index_state *istate, struct lock_file *lockfile)
> +{
> + if (istate->cache_changed) &&
> + !write_index(istate, lockfile->fd))
Mismatched parenthesis? Should be sth like
+ if (istate->cache_changed &&
+ !write_index(istate, lockfile->fd))
--
Piotr Krukowiecki
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] diff/status: refactor opportunistic index update
2011-03-21 18:46 ` Piotr Krukowiecki
@ 2011-03-21 19:39 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-03-21 19:39 UTC (permalink / raw)
To: Piotr Krukowiecki; +Cc: Junio C Hamano, git, Lasse Makholm
Piotr Krukowiecki <piotr.krukowiecki@gmail.com> writes:
> On Mon, Mar 21, 2011 at 6:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +void update_index_if_able(struct index_state *istate, struct lock_file *lockfile)
>> +{
>> + if (istate->cache_changed) &&
>> + !write_index(istate, lockfile->fd))
>
> Mismatched parenthesis? Should be sth like
>
> + if (istate->cache_changed &&
> + !write_index(istate, lockfile->fd))
Yeah, "rebase -i" gotcha. Applying [2/2] should get rid of it anyway.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: git status reads too many files
2011-03-21 16:41 ` Junio C Hamano
2011-03-21 17:16 ` [PATCH 1/2] diff/status: refactor opportunistic index update Junio C Hamano
2011-03-21 17:18 ` [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries Junio C Hamano
@ 2011-03-21 20:39 ` Lasse Makholm
2 siblings, 0 replies; 9+ messages in thread
From: Lasse Makholm @ 2011-03-21 20:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On 21 March 2011 17:41, Junio C Hamano <gitster@pobox.com> wrote:
> Lasse Makholm <lasse.makholm@gmail.com> writes:
>
>> This persistent across multiple runs of git status:
>>
>> $ strace -o /tmp/trace2 git status
>> # On branch there
>> nothing to commit (working directory clean)
>> $ grep ^open /tmp/trace2 | wc -l
>> 414
>> $
>>
>> ...until the index is touched:
>>
>> $ touch .git/index
>
> Don't do this; you are breaking the racy-git protection.
Yeah, I know, I was just proving a point... git reset (--hard?) HEAD
would achieve the same thing...
> I think we opportunistically update the .git/index file in "git status" to
> refresh the stat bits (but we don't error out when we cannot write a new
> index, as you may be only browsing somebody else's repository with only a
> read access to it). It probably should be just the matter of adding a bit
> of logic to notice that your index is racily clean.
I figured as much... My original thought of checkout ensuring an index
newer than any working file is stupid, of course, for a multitude of
reasons -- one of which is that the "next" timestamp may be a full 2
seconds away...
> Let me cook something real quick.
Sweet, thanks...
--
/Lasse
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries
2011-03-21 17:18 ` [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries Junio C Hamano
@ 2011-03-21 21:23 ` Lasse Makholm
2011-03-22 0:26 ` Eric Raible
1 sibling, 0 replies; 9+ messages in thread
From: Lasse Makholm @ 2011-03-21 21:23 UTC (permalink / raw)
Cc: git
Works for me.
Thanks.
--
/Lasse
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries
2011-03-21 17:18 ` [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries Junio C Hamano
2011-03-21 21:23 ` Lasse Makholm
@ 2011-03-22 0:26 ` Eric Raible
1 sibling, 0 replies; 9+ messages in thread
From: Eric Raible @ 2011-03-22 0:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Lasse Makholm
On 11:59 AM, Junio C Hamano wrote:
> Traditional "opportunistic index update" done by read-only "diff" and
> "status" was about updating cached lstat(2) information in the index for
> the next round. We missed another obvious optimization opportunity to
> when there are racily clean entries that will ceas to be racily clean
> by updating $GIT_INDEX_FILE.
s/ceas/cease/
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-03-22 0:27 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-21 12:40 git status reads too many files Lasse Makholm
2011-03-21 16:41 ` Junio C Hamano
2011-03-21 17:16 ` [PATCH 1/2] diff/status: refactor opportunistic index update Junio C Hamano
2011-03-21 18:46 ` Piotr Krukowiecki
2011-03-21 19:39 ` Junio C Hamano
2011-03-21 17:18 ` [PATCH 2/2] update $GIT_INDEX_FILE when there are racily clean entries Junio C Hamano
2011-03-21 21:23 ` Lasse Makholm
2011-03-22 0:26 ` Eric Raible
2011-03-21 20:39 ` git status reads too many files Lasse Makholm
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).