From: Junio C Hamano <gitster@pobox.com>
To: Thomas Rast <tr@thomasrast.ch>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] checkout: most of the time we have good leading directories
Date: Sat, 09 Nov 2013 12:09:05 -0800
Message-ID: <xmqqsiv5mj6m.fsf@gitster.dls.corp.google.com>
In-Reply-To: <87iow1ps9t.fsf@linux-k42r.v.cablecom.net> (Thomas Rast's message of "Sat, 09 Nov 2013 15:24:30 +0100")
Thomas Rast <tr@thomasrast.ch> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> When "git checkout" wants to create a path, e.g. a/b/c/d/e, after
>> seeing if the entire thing already exists (in which case we check if
>> that is up-to-date and do not bother to check it out, or we unlink
>> and recreate it), we validate that the leading directory path is
>> without funny symlinks by seeing a/, a/b/, a/b/c/ and then a/b/c/d/
>> are all without funny symlinks, by calling has_dirs_only_path() in
>> this order.
>>
>> When we are checking out many files (imagine: initial checkout),
>> however, it is likely that an earlier checkout would have already
>> made sure that the leading directory a/b/c/d/ is in good order; by
>> checking the whole path a/b/c/d/ first, we can often bypass
>> calls to has_dirs_only_path() for the leading part.
>
> Naively one would think that this is just as much work -- to correctly
> verify that the path consists only of actual directories (not symlinks)
> we have to lstat() every component regardless. It seems the reason this
> is an optimization is that has_dirs_only_path() caches its results, so
> that we can get 'a/b/c/d/ is okay in every component' from the cache.
>
> Is this analysis correct? If so, can you spell that out in the commit
> message?
It was done without analysis ;-) but I think you are correct.
If you are checking out a/b/c/d/{m,a,n,y}, after you checked out
a/b/c/d/m, the has_dirs_only_path cache knows a/b/c/d/ is in good
order so when you check out a/b/c/d/{a,n,y}, we can just ask for
a/b/c/d/ and get an OK immediately. There is no point asking from
a/, a/b/, a/b/c/ and then a/b/c/d/, in the original pessimistic
order. A change done _right_ to properly optimize this might even
want to change the main loop that the patch bypassed.
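To make the argument above concrete, here is a toy model of the two
query orders. It is only a sketch under stated assumptions: the class
DirsOnlyCache, its "remember the longest verified directory" policy,
the helpers check_pessimistic() and check_whole_first(), and the
counters are all hypothetical names made up for illustration, and the
"filesystem" is just a set of strings. It is not the code in git, and
the real cache may behave differently in detail.

```python
class DirsOnlyCache:
    """Toy stand-in for a has_dirs_only_path()-style cache (hypothetical).

    Assumption: the cache remembers only the longest directory path
    that has been verified so far, e.g. "a/b/c/d".
    """

    def __init__(self, real_dirs):
        self.real_dirs = set(real_dirs)  # fake filesystem: known real directories
        self.known_good = ""             # longest verified path, "" when empty
        self.queries = 0                 # number of has_dirs_only_path() calls
        self.lstat_calls = 0             # number of simulated lstat() calls

    def _lstat_is_dir(self, path):
        self.lstat_calls += 1
        return path in self.real_dirs

    def has_dirs_only_path(self, path):
        self.queries += 1
        good = self.known_good
        # The query is the cached path or one of its leading directories.
        if good == path or good.startswith(path + "/"):
            return True
        parts = path.split("/")
        good_parts = good.split("/") if good else []
        # Components shared with the cached path are already verified.
        common = 0
        for a, b in zip(parts, good_parts):
            if a != b:
                break
            common += 1
        for i in range(common, len(parts)):
            if not self._lstat_is_dir("/".join(parts[: i + 1])):
                return False
        self.known_good = path
        return True


def leading_dirs(path):
    """'a/b/c/d/m' -> ['a', 'a/b', 'a/b/c', 'a/b/c/d']"""
    parts = path.split("/")[:-1]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def check_pessimistic(cache, path):
    # Original order: a/, a/b/, a/b/c/, then a/b/c/d/.
    return all(cache.has_dirs_only_path(d) for d in leading_dirs(path))


def check_whole_first(cache, path):
    # Ask for the whole leading path first; fall back to the loop on failure.
    dirs = leading_dirs(path)
    if dirs and cache.has_dirs_only_path(dirs[-1]):
        return True
    return all(cache.has_dirs_only_path(d) for d in dirs)


def run(strategy, files, real_dirs):
    cache = DirsOnlyCache(real_dirs)
    ok = all(strategy(cache, f) for f in files)
    return ok, cache.queries, cache.lstat_calls


DIRS = ["a", "a/b", "a/b/c", "a/b/c/d"]
FILES = ["a/b/c/d/" + name for name in "many"]  # a/b/c/d/{m,a,n,y}

if __name__ == "__main__":
    print("pessimistic:", run(check_pessimistic, FILES, DIRS))
    print("whole first:", run(check_whole_first, FILES, DIRS))
```

In this model both orders end up doing the same number of simulated
lstat() calls, because the cache absorbs the repeats either way; the
whole-path-first order only saves cache queries (one per file instead
of one per leading directory). How much that is worth in practice is
exactly what would need measuring.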
I do not think the patch (or the "change done right" for that
matter) will make much difference on a platform with good filesystem
metadata caching. It may be very interesting to see if that simple
patch makes any difference on Windows, though. If it does, then we
may want to look into cleaning up the code further.
Thanks for the comment.
Thread overview: 3+ messages
2013-11-08 0:30 [PATCH] checkout: most of the time we have good leading directories Junio C Hamano
2013-11-09 14:24 ` Thomas Rast
2013-11-09 20:09 ` Junio C Hamano [this message]