* [PATCH] checkout: most of the time we have good leading directories
From: Junio C Hamano @ 2013-11-08 0:30 UTC (permalink / raw)
To: git
When "git checkout" wants to create a path, e.g. a/b/c/d/e, after
seeing if the entire thing already exists (in which case we check if
that is up-to-date and do not bother to check it out, or we unlink
and recreate it), we validate that the leading directory path is
without funny symlinks by seeing a/, a/b/, a/b/c/ and then a/b/c/d/
are all without funny symlinks, by calling has_dirs_only_path() in
this order.

When we are checking out many files (imagine: initial checkout),
however, it is likely that an earlier checkout would have already
made sure that the leading directory a/b/c/d/ is in good order; by
checking the whole path a/b/c/d/ first, we can often bypass
calls to has_dirs_only_path() for the leading part.

This cuts down the number of calls to has_dirs_only_path() for
checking out Linux kernel sources afresh from 190k down to 98k.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* Just a random experimental change I was playing with today,
looking for low-hanging fruit before having to thread the entire
checkout codepath.
entry.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/entry.c b/entry.c
index 7b7aa81..e2c0ac6 100644
--- a/entry.c
+++ b/entry.c
@@ -6,9 +6,17 @@
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
 {
-	char *buf = xmalloc(path_len + 1);
-	int len = 0;
+	char *buf;
+	int len;
+
+	for (len = path_len - 1; 0 <= len; len--)
+		if (path[len] == '/')
+			break;
+	if (has_dirs_only_path(path, len, state->base_dir_len))
+		return; /* ok, we have the whole leading directory */
+	buf = xmalloc(path_len + 1);
+	len = 0;
 	while (len < path_len) {
 		do {
 			buf[len] = path[len];
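
The new loop in the hunk above walks backwards from the end of the path to find the last '/', so that the whole leading directory can be checked in one call. A minimal standalone sketch of just that step follows. The helper name leading_dir_len() is hypothetical (it is not part of entry.c), and unlike the patch it returns 0 rather than -1 when the path contains no '/' at all.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: return the length of
 * the leading directory part of path, not counting the final '/'.
 * Returns 0 when path has no '/', i.e. there is no leading
 * directory to validate. Assumes a relative path that does not
 * start with '/', as paths in the index do not.
 */
static int leading_dir_len(const char *path, int path_len)
{
	int len;

	assert(path != NULL && path_len >= 0);
	for (len = path_len - 1; len >= 0; len--)
		if (path[len] == '/')
			return len;
	return 0;
}
```

For "a/b/c/d/e" (path_len 9) this yields 7, the length of "a/b/c/d". In the patch as posted, a path with no '/' leaves len at -1 when has_dirs_only_path() is called; the thread does not discuss that case.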
* Re: [PATCH] checkout: most of the time we have good leading directories
From: Thomas Rast @ 2013-11-09 14:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> When "git checkout" wants to create a path, e.g. a/b/c/d/e, after
> seeing if the entire thing already exists (in which case we check if
> that is up-to-date and do not bother to check it out, or we unlink
> and recreate it), we validate that the leading directory path is
> without funny symlinks by seeing a/, a/b/, a/b/c/ and then a/b/c/d/
> are all without funny symlinks, by calling has_dirs_only_path() in
> this order.
>
> When we are checking out many files (imagine: initial checkout),
> however, it is likely that an earlier checkout would have already
> made sure that the leading directory a/b/c/d/ is in good order; by
> checking the whole path a/b/c/d/ first, we can often bypass
> calls to has_dirs_only_path() for the leading part.

Naively one would think that this is just as much work -- to correctly
verify that the path consists only of actual directories (not symlinks)
we have to lstat() every component regardless. It seems the reason this
is an optimization is that has_dirs_only_path() caches its results, so
that we can get 'a/b/c/d/ is okay in every component' from the cache.

Is this analysis correct? If so, can you spell that out in the commit
message?

--
Thomas Rast
tr@thomasrast.ch
* Re: [PATCH] checkout: most of the time we have good leading directories
From: Junio C Hamano @ 2013-11-09 20:09 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Thomas Rast <tr@thomasrast.ch> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> When "git checkout" wants to create a path, e.g. a/b/c/d/e, after
>> seeing if the entire thing already exists (in which case we check if
>> that is up-to-date and do not bother to check it out, or we unlink
>> and recreate it), we validate that the leading directory path is
>> without funny symlinks by seeing a/, a/b/, a/b/c/ and then a/b/c/d/
>> are all without funny symlinks, by calling has_dirs_only_path() in
>> this order.
>>
>> When we are checking out many files (imagine: initial checkout),
>> however, it is likely that an earlier checkout would have already
>> made sure that the leading directory a/b/c/d/ is in good order; by
>> checking the whole path a/b/c/d/ first, we can often bypass
>> calls to has_dirs_only_path() for the leading part.
>
> Naively one would think that this is just as much work -- to correctly
> verify that the path consists only of actual directories (not symlinks)
> we have to lstat() every component regardless. It seems the reason this
> is an optimization is that has_dirs_only_path() caches its results, so
> that we can get 'a/b/c/d/ is okay in every component' from the cache.
>
> Is this analysis correct? If so, can you spell that out in the commit
> message?

It was done without analysis ;-) but I think you are correct.

If you are checking out a/b/c/d/{m,a,n,y}, after you checked out
a/b/c/d/m, the has_dirs_only_path cache knows a/b/c/d/ is in good
order so when you check out a/b/c/d/{a,n,y}, we can just ask for
a/b/c/d/ and get an OK immediately. There is no point in asking about
a/, a/b/, a/b/c/ and then a/b/c/d/, in the original pessimistic
order. A change done _right_ to properly optimize this might even
want to change the main loop that the patch bypassed.

I do not think the patch (or the "change done right" for that
matter) will make much difference on a platform with good filesystem
metadata caching. It may be very interesting to see if that simple
patch makes any difference on Windows, though. If it does, then we
may want to look into cleaning up the code further.

Thanks for the comment.
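
The a/b/c/d/{m,a,n,y} example above can be turned into a small call-counting experiment. The sketch below is a toy model, not Git code: struct toy_tree, toy_has_dirs(), toy_mkdir(), create_dirs_old(), create_dirs_new() and count_calls() are all hypothetical names, the "working tree" is just an in-memory list of directory names, and toy_has_dirs() only counts how often it is asked. It does not model the lstat() cache inside the real has_dirs_only_path(), so it illustrates the reduction in calls, not the time saved per call.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 16
#define MAX_PATH_LEN 64

/* Toy stand-in for the working tree: directories created so far. */
struct toy_tree {
	char dirs[MAX_DIRS][MAX_PATH_LEN];
	int nr;
	int calls;	/* how many times toy_has_dirs() was asked */
};

/* Hypothetical stand-in for has_dirs_only_path(); counts each call. */
static int toy_has_dirs(struct toy_tree *t, const char *path, int len)
{
	int i;

	t->calls++;
	for (i = 0; i < t->nr; i++)
		if ((int)strlen(t->dirs[i]) == len &&
		    !strncmp(t->dirs[i], path, len))
			return 1;
	return 0;
}

/* Record path[0..len) as an existing directory in the toy tree. */
static void toy_mkdir(struct toy_tree *t, const char *path, int len)
{
	assert(t->nr < MAX_DIRS && len < MAX_PATH_LEN);
	memcpy(t->dirs[t->nr], path, len);
	t->dirs[t->nr][len] = '\0';
	t->nr++;
}

/* Original order: ask about a, a/b, a/b/c, ... one prefix at a time. */
static void create_dirs_old(struct toy_tree *t, const char *path)
{
	int len, path_len = (int)strlen(path);

	for (len = 0; len < path_len; len++)
		if (path[len] == '/' && !toy_has_dirs(t, path, len))
			toy_mkdir(t, path, len);
}

/* Patched order: ask about the whole leading directory first. */
static void create_dirs_new(struct toy_tree *t, const char *path)
{
	int len;

	for (len = (int)strlen(path) - 1; len >= 0; len--)
		if (path[len] == '/')
			break;
	/* The len > 0 guard is an addition of this sketch, not the patch. */
	if (len > 0 && toy_has_dirs(t, path, len))
		return;
	create_dirs_old(t, path);
}

/* Count calls made while checking out a/b/c/d/{m,a,n,y} from scratch. */
static int count_calls(void (*create)(struct toy_tree *, const char *))
{
	static const char *paths[] = {
		"a/b/c/d/m", "a/b/c/d/a", "a/b/c/d/n", "a/b/c/d/y"
	};
	struct toy_tree t;
	size_t i;

	memset(&t, 0, sizeof(t));
	for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
		create(&t, paths[i]);
	return t.calls;
}
```

In this toy setup count_calls(create_dirs_old) comes to 16 (four prefixes for each of four files) and count_calls(create_dirs_new) comes to 8 (one failed whole-path check plus four prefix checks for the first file, then one check for each of the other three), roughly the same halving as the 190k to 98k figure quoted for the kernel tree.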