git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Cc: David Turner <dturner@twitter.com>
Subject: Re: [PATCH v4 1/2] refs.c: optimize check_refname_component()
Date: Sun, 01 Jun 2014 22:50:27 +0200	[thread overview]
Message-ID: <538B9213.2020205@alum.mit.edu> (raw)
In-Reply-To: <1401599865-14117-1-git-send-email-dturner@twitter.com>

Thanks for splitting this up into two patches.  See my comments below.

On 06/01/2014 07:17 AM, David Turner wrote:
> In a repository with many refs, check_refname_component can be a major
> contributor to the runtime of some git commands. One such command is
> 
> git rev-parse HEAD
> 
> Timings for one particular repo, with about 60k refs, almost all
> packed, are:
> 
> Old: 35 ms
> New: 29 ms
> 
> Many other commands which read refs are also sped up.
> 
> Signed-off-by: David Turner <dturner@twitter.com>
> ---
>  refs.c | 68 ++++++++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 41 insertions(+), 27 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index 28d5eca..62e2301 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -5,9 +5,32 @@
>  #include "dir.h"
>  #include "string-list.h"
>  
> +/* How to handle various characters in refnames:

We format multiline comments with no text on the opening line:

/*
 * How to handle...
 * ... is derived.
 */

> + * 0: An acceptable character for refs
> + * 1: End-of-component
> + * 2: ., look for a following . to reject .. in refs
> + * 3: @, look for a following { to reject @{ in refs
> + * 9: A bad character, reject ref

This explanation does not agree with the code (or I'm not reading it
correctly).  For example, the character with disposition 3 is '{', not
'@', so ISTM the comment should be

     * 3: {, look for a preceding @ to reject @{ in refnames

BTW, is there a special reason that the values in your table jump from 3
to 9?  I imagine that compilers would use a jump table to implement the
"switch" statement where these values are used, and they might have a
slightly easier time if there are no "holes" between the legal values.

> + *
> + * See below for the list of illegal characters, from which
> + * this table is derived.
> + */
> +static unsigned char refname_disposition[] = {

I think you need to define the length explicitly to 256 here to cause
the entries for 0x80..0xff to be created and initialized to zeros.

The fact that you didn't notice this bug suggests that there might be no
tests involving refnames with non-ASCII characters, which would be
another nice thing to remedy while you are working in this area.

> +	1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +	9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
> +	9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 2, 1,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 9,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 9, 0, 9, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> +	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 9, 9
> +};
> +
>  /*
> - * Make sure "ref" is something reasonable to have under ".git/refs/";
> - * We do not like it if:
> + * Try to read one refname component from the front of refname.
> + * Return the length of the component found, or -1 if the component is
> + * not legal.  It is legal if it is something reasonable to have under
> + * ".git/refs/"; We do not like it if:
>   *
>   * - any path component of it begins with ".", or
>   * - it has double dots "..", or
> @@ -15,24 +38,7 @@
>   * - it ends with a "/".
>   * - it ends with ".lock"
>   * - it contains a "\" (backslash)
> - */
>  

^^^ doesn't this leave a blank line inside the comment?

> -/* Return true iff ch is not allowed in reference names. */
> -static inline int bad_ref_char(int ch)
> -{
> -	if (((unsigned) ch) <= ' ' || ch == 0x7f ||
> -	    ch == '~' || ch == '^' || ch == ':' || ch == '\\')
> -		return 1;
> -	/* 2.13 Pattern Matching Notation */
> -	if (ch == '*' || ch == '?' || ch == '[') /* Unsupported */
> -		return 1;
> -	return 0;
> -}
> -
> -/*
> - * Try to read one refname component from the front of refname.  Return
> - * the length of the component found, or -1 if the component is not
> - * legal.
>   */
> [...]

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

      parent reply	other threads:[~2014-06-01 20:50 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-01  5:17 [PATCH v4 1/2] refs.c: optimize check_refname_component() David Turner
2014-06-01  5:17 ` [PATCH v4 2/2] refs.c: SSE4.2 optimizations for check_refname_component David Turner
2014-06-01  7:17 ` [PATCH v4 1/2] refs.c: optimize check_refname_component() Andreas Schwab
2014-06-01 19:43 ` Philip Oakley
2014-06-01 20:50 ` Michael Haggerty [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=538B9213.2020205@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=dturner@twitter.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).