git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
To: Kjetil Barvik <barvik@broadpark.no>
Cc: git@vger.kernel.org, Pete Harlan <pgit@pcharlan.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
Date: Sat, 10 Jan 2009 11:11:12 +0100	[thread overview]
Message-ID: <49687440.5090506@lsrfire.ath.cx> (raw)
In-Reply-To: <1231527954-868-2-git-send-email-barvik@broadpark.no>

Kjetil Barvik schrieb:
> - Also introduce a 'void clear_lstat_cache(void)' function, which
>   should be used to clean the cache before usage.  If for instance,
>   you have changed the types of directories which should be cached,
>   the cache could contain a path which was not wanted.

Is it possible to make the cache detect these situations automatically
by saving track_flags along with the cache contents?  Not having to
clear the cache manually would be a major feature.

> --- a/cache.h
> +++ b/cache.h
> @@ -719,7 +719,29 @@ struct checkout {
>  };
>  
>  extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
> -extern int has_symlink_leading_path(int len, const char *name);
> +
> +#define LSTAT_REG      (1u << 0)
> +#define LSTAT_DIR      (1u << 1)
> +#define LSTAT_NOENT    (1u << 2)
> +#define LSTAT_SYMLINK  (1u << 3)
> +#define LSTAT_LSTATERR (1u << 4)
> +#define LSTAT_ERR      (1u << 5)
> +#define LSTAT_FULLPATH (1u << 6)
> +extern unsigned int lstat_cache(int len, const char *name,
> +				unsigned int track_flags, int prefix_len_stat_func);
> +extern void clear_lstat_cache(void);
> +static inline unsigned int has_symlink_leading_path(int len, const char *name)
> +{
> +	return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR, -1) &
> +		LSTAT_SYMLINK;
> +}
> +#define clear_symlink_cache() clear_lstat_cache()
> +static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
> +{
> +	return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR, -1) &
> +		(LSTAT_SYMLINK|LSTAT_NOENT);
> +}
> +#define clear_symlink_or_noent_cache() clear_lstat_cache()

What's the advantage of inlining the wrappers (expressed in units of
space and/or time)?  The interface would be much nicer if you exported
the wrappers, only, and not all those constants along with them.

And why define aliases for clear_lstat_cache()?

> diff --git a/entry.c b/entry.c
> index aa2ee46a84033585d8e07a585610c5a697af82c2..293400cf5be63fd66b797a68e17bf953c600fe99 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -8,35 +8,28 @@ static void create_directories(const char *path, const struct checkout *state)
>  	const char *slash = path;
>  
>  	while ((slash = strchr(slash+1, '/')) != NULL) {
> -		struct stat st;
> -		int stat_status;
> -
>  		len = slash - path;
>  		memcpy(buf, path, len);
>  		buf[len] = 0;
>  
> -		if (len <= state->base_dir_len)
> -			/*
> -			 * checkout-index --prefix=<dir>; <dir> is
> -			 * allowed to be a symlink to an existing
> -			 * directory.
> -			 */
> -			stat_status = stat(buf, &st);
> -		else
> -			/*
> -			 * if there currently is a symlink, we would
> -			 * want to replace it with a real directory.
> -			 */
> -			stat_status = lstat(buf, &st);
> -
> -		if (!stat_status && S_ISDIR(st.st_mode))
> +		/* For 'checkout-index --prefix=<dir>', <dir> is
> +		 * allowed to be a symlink to an existing directory,
> +		 * therefore we must give 'state->base_dir_len' to the
> +		 * cache, such that we test path components of the
> +		 * prefix with stat() instead of lstat()
> +		 *
> +		 * We must also tell the cache to test the complete
> +		 * length of the buffer (the '|LSTAT_FULLPATH' part).
> +		 */
> +		if (lstat_cache(len, buf, LSTAT_DIR|LSTAT_FULLPATH,
> +				state->base_dir_len) &
> +		    LSTAT_DIR)
>  			continue; /* ok, it is already a directory. */

I'd say this usage is worth another wrapper.

Also, it's probably worth to split this patch up again.  First switching
to your improved implementation of has_symlink_leading_path(), then
introducing has_symlink_or_noent_leading_path() and finally adding
LSTAT_FULLPATH and the fourth parameter of lstat_cache() etc. and using
this feature in entry.c seems like a nice incremental progression.

René

  reply	other threads:[~2009-01-10 10:12 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-09 19:05 [PATCH/RFC v5 0/1] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-09 19:05 ` [PATCH/RFC v5 1/1] more cache effective symlink/directory detection Kjetil Barvik
2009-01-10 10:11   ` René Scharfe [this message]
2009-01-11  0:48     ` Junio C Hamano
2009-01-11  8:26       ` Kjetil Barvik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49687440.5090506@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=barvik@broadpark.no \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pgit@pcharlan.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).