From: Junio C Hamano <gitster@pobox.com>
To: Kjetil Barvik <barvik@broadpark.no>
Cc: git@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection
Date: Tue, 06 Jan 2009 00:19:26 -0800 [thread overview]
Message-ID: <7vprj0j181.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: 1231161001-32599-2-git-send-email-barvik@broadpark.no
Kjetil Barvik <barvik@broadpark.no> writes:
> +static inline void
> +update_path_cache(unsigned int ret_flags, unsigned int track_flags,
> + int last_slash)
> +{
> + /* Max 3 different path types can be cached for the moment! */
> + unsigned int save_flags =
> + ret_flags & track_flags & (LSTAT_DIR|
> + LSTAT_NOTDIR|
> + LSTAT_SYMLINK);
> + if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
> + cache_flags = save_flags;
> + cache_len = last_slash;
> + } else {
> + cache_flags = 0;
> + cache_len = 0;
> + }
> +}
I personally found this inline function with a single call site
distracting in following the logic. It does not make the indentation
level shallower, either. Also, the else part should probably call
clear_lstat_cache() to protect it from possible future enhancements to add
more state variables.
> +
> +/*
> + * Check if name 'name' of length 'len' has a symlink leading
> + * component, or if the directory exists and is real, or not.
> + *
> + * To speed up the check, some information is allowed to be cached.
> + * This is indicated by the 'track_flags' argument.
> + */
> +unsigned int
> +check_lstat_cache(int len, const char *name, unsigned int track_flags)
> +{
> + int match_len, last_slash, max_len;
> + unsigned int match_flags, ret_flags;
> + struct stat st;
> +
> + /* Check if match from the cache for 2 "excluding" path types.
> + */
> + match_len = last_slash =
> + greatest_common_path_cache_prefix(len, name);
> + match_flags =
> + cache_flags & track_flags & (LSTAT_NOTDIR|
> + LSTAT_SYMLINK);
> + if (match_flags && match_len == cache_len)
> + return match_flags;
Let me see if I understand the logic behind this caching. When you have
checked A/B/C earlier and you already know B is a symlink, you remember
that A/B was a symlink.. You can fill a request to check A/B/$whatever
(assuming A/B does not change --- otherwise the caller should clear the
cache) from the cached data, because no matter what $whatever is, it will
result in the same "has-leading-symlink".
Similarly, if you know A/B is not a directory from an earlier test, you
know that a request to check A/B/$whatever will result in the same ENOTDIR
no matter what $whatever is, so you can return early.
The above "return match_flags" will not trigger if the cached path does
not have any leading symlink. So we know the matched part are all good
directories when we start lstat() loop.
Am I following you so far?
> + /* Okay, no match from the cache so far, so now we have to
> + * check the rest of the path components and update the cache.
> + */
> + ret_flags = LSTAT_DIR;
> + max_len = len < PATH_MAX ? len : PATH_MAX;
> + while (match_len < max_len) {
> + do {
> + cache_path[match_len] = name[match_len];
> + match_len++;
> + } while (match_len < max_len && name[match_len] != '/');
You take one component from the input, and append it to the part that is
already known to be true directory (i.e. cached part and the part earlier
iteration of the loop checked so far), to be tested by lstat()...
> + if (match_len >= max_len)
> + break;
... but you are not interested in the full input. We are only checking
the leading path (e.g. check for "A/B/C" may lstat() "A", "A/B" but not
"A/B/C").
> + last_slash = match_len;
> + cache_path[last_slash] = '\0';
> +
> + if (lstat(cache_path, &st)) {
> + ret_flags = LSTAT_LSTATERR;
> + if (errno == ENOENT || errno == ENOTDIR)
> + ret_flags |= LSTAT_NOTDIR;
If you tested "A/B" here and got ENOENT back, you know "A/B" does not
exist; you cache this knowledge as "A/B is not a directory" (I also think
you could use it as a cached knowledge that "A exists and is a directory".
I am not sure if you are taking advantage of that).
What I do not understand about this code is the ENOTDIR case. If you
tested "A/B" and got ENOTDIR back, what you know is that "A" is not a
directory (if the path tested this round were deeper like "X/Y/A/B", you
know "X/Y/A" is not a directory, and you know "X" and "X/Y" are true
directories; otherwise the loop would have exited before this round when
you tested "X" or "X/Y" in the earlier rounds).
So as far as I can think of, ENOENT case and ENOTDIR case would give you
different information (ENOENT would say "A is a dir, A/B is not"; ENOTDIR
would say "A is not a dir"). I am confused how you can cache the same
path and same flag between these two cases here.
> + } else if (S_ISDIR(st.st_mode)) {
> + continue;
> + } else if (S_ISLNK(st.st_mode)) {
> + ret_flags = LSTAT_SYMLINK;
> + } else {
> + ret_flags = LSTAT_ERR;
> + }
> + break;
> + }
> + update_path_cache(ret_flags, track_flags, last_slash);
> + return ret_flags;
> +}
> +
> +/*
> + * Before usage of the check_lstat_cache() function one should call
> + * clear_lstat_cache() (at an appropriate place) to make sure that the
> + * cache is clean before first call to check_lstat_cache().
> + */
> +void clear_lstat_cache(void)
> +{
> + cache_flags = 0;
> + cache_len = 0;
> +}
> --
> 1.6.1.rc1.49.g7f705
next prev parent reply other threads:[~2009-01-06 8:21 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-05 13:09 [PATCH/RFC 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-05 13:09 ` [PATCH/RFC 1/4] Optimised, faster, more effective symlink/directory detection Kjetil Barvik
2009-01-06 8:19 ` Junio C Hamano [this message]
2009-01-06 8:56 ` Junio C Hamano
2009-01-06 9:56 ` Kjetil Barvik
2009-01-06 13:45 ` Kjetil Barvik
2009-01-06 9:36 ` Kjetil Barvik
2009-01-05 13:09 ` [PATCH/RFC 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()' Kjetil Barvik
2009-01-06 8:19 ` Junio C Hamano
2009-01-06 12:50 ` Kjetil Barvik
2009-01-05 13:10 ` [PATCH/RFC 3/4] create_directories() inside entry.c: only check each directory once! Kjetil Barvik
2009-01-05 13:10 ` [PATCH/RFC 4/4] remove the old 'has_symlink_leading_path()' function Kjetil Barvik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vprj0j181.fsf@gitster.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=barvik@broadpark.no \
--cc=git@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).