git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] add strnncmp() function
@ 2014-06-17  7:34 Jeremiah Mahler
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17  7:34 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jeremiah Mahler

Add a strnncmp() function which behaves like strncmp() except it takes
the length of both strings instead of just one.

Then simplify tree-walk.c and unpack-trees.c using this new function.
Replace all occurrences of name_compare() with strnncmp().  Remove
name_compare(), which they both had identical copies of.

Version 2 includes suggestions from Jonathan Neider [1]:

  - Fix the logic which caused the new strnncmp() to behave differently
	from the old version.  Now it is identical to strncmp().

  - Improve description of strnncmp().

Also, strnncmp() was switched from using memcmp() to strncmp()
internally to make it clear that this is meant for strings, not
general buffers.

[1]: http://marc.info/?l=git&m=140294981320743&w=2

Jeremiah Mahler (3):
  add strnncmp() function
  tree-walk: simplify via strnncmp()
  unpack-trees: simplify via strnncmp()

 strbuf.c       |  9 +++++++++
 strbuf.h       |  2 ++
 tree-walk.c    | 16 +++-------------
 unpack-trees.c | 13 +------------
 4 files changed, 15 insertions(+), 25 deletions(-)

-- 
2.0.0.695.g38ee9a9

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] add strnncmp() function
  2014-06-17  7:34 [PATCH v2 0/3] add strnncmp() function Jeremiah Mahler
@ 2014-06-17  7:34 ` Jeremiah Mahler
  2014-06-17  8:23   ` Torsten Bögershausen
                     ` (2 more replies)
  2014-06-17  7:34 ` [PATCH v2 2/3] tree-walk: simplify via strnncmp() Jeremiah Mahler
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17  7:34 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jeremiah Mahler

Add a strnncmp() function which behaves like strncmp() except it takes
the length of both strings instead of just one.  It behaves the same as
strncmp() up to the minimum common length between the strings.  When the
strings are identical up to this minimum common length, the length
difference is returned.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
---
 strbuf.c | 9 +++++++++
 strbuf.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index ac62982..4eb7954 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
 	result[i] = '\0';
 	return result;
 }
+
+int strnncmp(const char *a, int len_a, const char *b, int len_b)
+{
+	int min_len = (len_a < len_b) ? len_a : len_b;
+	int cmp = strncmp(a, b, min_len);
+	if (cmp)
+		return cmp;
+	return (len_a - len_b);
+}
diff --git a/strbuf.h b/strbuf.h
index e9ad03e..88af9bf 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -187,4 +187,6 @@ extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
 
+extern int strnncmp(const char *a, int len_a, const char *b, int len_b);
+
 #endif /* STRBUF_H */
-- 
2.0.0.695.g38ee9a9

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] tree-walk: simplify via strnncmp()
  2014-06-17  7:34 [PATCH v2 0/3] add strnncmp() function Jeremiah Mahler
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
@ 2014-06-17  7:34 ` Jeremiah Mahler
  2014-06-17  7:34 ` [PATCH v2 3/3] unpack-trees: " Jeremiah Mahler
  2014-06-17 11:08 ` [PATCH v2 0/3] add strnncmp() function Torsten Bögershausen
  3 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17  7:34 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jeremiah Mahler

Simplify tree-walk.c using the strnncmp() function and remove the
name_compare() function.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
---
 tree-walk.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 4dc86c7..efbd3b7 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -144,16 +144,6 @@ struct tree_desc_x {
 	struct tree_desc_skip *skip;
 };
 
-static int name_compare(const char *a, int a_len,
-			const char *b, int b_len)
-{
-	int len = (a_len < b_len) ? a_len : b_len;
-	int cmp = memcmp(a, b, len);
-	if (cmp)
-		return cmp;
-	return (a_len - b_len);
-}
-
 static int check_entry_match(const char *a, int a_len, const char *b, int b_len)
 {
 	/*
@@ -174,7 +164,7 @@ static int check_entry_match(const char *a, int a_len, const char *b, int b_len)
 	 * scanning further.
 	 */
 
-	int cmp = name_compare(a, a_len, b, b_len);
+	int cmp = strnncmp(a, a_len, b, b_len);
 
 	/* Most common case first -- reading sync'd trees */
 	if (!cmp)
@@ -369,7 +359,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 				first_len = len;
 				continue;
 			}
-			if (name_compare(e->path, len, first, first_len) < 0) {
+			if (strnncmp(e->path, len, first, first_len) < 0) {
 				first = e->path;
 				first_len = len;
 			}
@@ -383,7 +373,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 				if (!e->path)
 					continue;
 				len = tree_entry_len(e);
-				if (name_compare(e->path, len, first, first_len))
+				if (strnncmp(e->path, len, first, first_len))
 					entry_clear(e);
 			}
 		}
-- 
2.0.0.695.g38ee9a9

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] unpack-trees: simplify via strnncmp()
  2014-06-17  7:34 [PATCH v2 0/3] add strnncmp() function Jeremiah Mahler
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
  2014-06-17  7:34 ` [PATCH v2 2/3] tree-walk: simplify via strnncmp() Jeremiah Mahler
@ 2014-06-17  7:34 ` Jeremiah Mahler
  2014-06-17 11:08 ` [PATCH v2 0/3] add strnncmp() function Torsten Bögershausen
  3 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17  7:34 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jeremiah Mahler

Simplify unpack-trees.c using the strnncmp() function and remove the
name_compare() function.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
---
 unpack-trees.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 4a9cdf2..9a71b5a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -629,17 +629,6 @@ static int unpack_failed(struct unpack_trees_options *o, const char *message)
 	return -1;
 }
 
-/* NEEDSWORK: give this a better name and share with tree-walk.c */
-static int name_compare(const char *a, int a_len,
-			const char *b, int b_len)
-{
-	int len = (a_len < b_len) ? a_len : b_len;
-	int cmp = memcmp(a, b, len);
-	if (cmp)
-		return cmp;
-	return (a_len - b_len);
-}
-
 /*
  * The tree traversal is looking at name p.  If we have a matching entry,
  * return it.  If name p is a directory in the index, do not return
@@ -678,7 +667,7 @@ static int find_cache_pos(struct traverse_info *info,
 			ce_len = ce_slash - ce_name;
 		else
 			ce_len = ce_namelen(ce) - pfxlen;
-		cmp = name_compare(p->path, p_len, ce_name, ce_len);
+		cmp = strnncmp(p->path, p_len, ce_name, ce_len);
 		/*
 		 * Exact match; if we have a directory we need to
 		 * delay returning it.
-- 
2.0.0.695.g38ee9a9

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
@ 2014-06-17  8:23   ` Torsten Bögershausen
  2014-06-17 15:48     ` Jeremiah Mahler
  2014-06-17  9:09   ` Erik Faye-Lund
  2014-06-17 17:55   ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Torsten Bögershausen @ 2014-06-17  8:23 UTC (permalink / raw)
  To: Jeremiah Mahler, Jonathan Nieder; +Cc: git

On 2014-06-17 09.34, Jeremiah Mahler wrote:
> Add a strnncmp() function which behaves like strncmp() except it takes
> the length of both strings instead of just one.  It behaves the same as
> strncmp() up to the minimum common length between the strings.  When the
minimum common length? Isn'n t that 0?
Using the word "common", I think we could call it "common length".
(And more places below)

> strings are identical up to this minimum common length, the length
> difference is returned.
> 
> Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> ---
>  strbuf.c | 9 +++++++++
>  strbuf.h | 2 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index ac62982..4eb7954 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
>  	result[i] = '\0';
>  	return result;
>  }
> +
strncmp uses size_t, not int:
int strncmp(const char *s1, const char *s2, size_t n);

Is there a special reason to allow negative string length?
Some call sites use int when calling strncmp() or others,
that is one thing.
But when writing a generic strnncmp() function, I think
it should use size_t, unless negative values have a meaning and
are handled in the code.


> +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> +{
> +	int min_len = (len_a < len_b) ? len_a : len_b;
> +	int cmp = strncmp(a, b, min_len);

> +	if (cmp)
> +		return cmp;
> +	return (len_a - len_b);
> +}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
  2014-06-17  8:23   ` Torsten Bögershausen
@ 2014-06-17  9:09   ` Erik Faye-Lund
  2014-06-17 15:49     ` Jeremiah Mahler
  2014-06-17 17:55   ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Erik Faye-Lund @ 2014-06-17  9:09 UTC (permalink / raw)
  To: Jeremiah Mahler; +Cc: Jonathan Nieder, GIT Mailing-list

On Tue, Jun 17, 2014 at 9:34 AM, Jeremiah Mahler <jmmahler@gmail.com> wrote:
> Add a strnncmp() function which behaves like strncmp() except it takes
> the length of both strings instead of just one.  It behaves the same as
> strncmp() up to the minimum common length between the strings.  When the
> strings are identical up to this minimum common length, the length
> difference is returned.
>
> Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> ---
>  strbuf.c | 9 +++++++++
>  strbuf.h | 2 ++
>  2 files changed, 11 insertions(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index ac62982..4eb7954 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
>         result[i] = '\0';
>         return result;
>  }
> +
> +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> +{
> +       int min_len = (len_a < len_b) ? len_a : len_b;
> +       int cmp = strncmp(a, b, min_len);
> +       if (cmp)
> +               return cmp;
> +       return (len_a - len_b);
> +}

Using a name that sounds like it's from the stdlib makes me cringe a
little bit. Names that start with "str" reserved for stdlib[1][2], but
we already ignore this for strbuf (and perhaps some other functions).
However, in this case it doesn't seem *that* unlikely that we might
collide with some stdlib-extensions.

[1]: http://pubs.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_02.html#tag_02_02_02
[2]: http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] add strnncmp() function
  2014-06-17  7:34 [PATCH v2 0/3] add strnncmp() function Jeremiah Mahler
                   ` (2 preceding siblings ...)
  2014-06-17  7:34 ` [PATCH v2 3/3] unpack-trees: " Jeremiah Mahler
@ 2014-06-17 11:08 ` Torsten Bögershausen
  2014-06-17 15:49   ` Jeremiah Mahler
  2014-06-18 10:33   ` Ondřej Bílka
  3 siblings, 2 replies; 15+ messages in thread
From: Torsten Bögershausen @ 2014-06-17 11:08 UTC (permalink / raw)
  To: Jeremiah Mahler, Jonathan Nieder; +Cc: git

On 2014-06-17 09.34, Jeremiah Mahler wrote:
> Add a strnncmp() function which behaves like strncmp() except it takes
> the length of both strings instead of just one.
> 
> Then simplify tree-walk.c and unpack-trees.c using this new function.
> Replace all occurrences of name_compare() with strnncmp().  Remove
> name_compare(), which they both had identical copies of.
> 
> Version 2 includes suggestions from Jonathan Neider [1]:
> 
>   - Fix the logic which caused the new strnncmp() to behave differently
> 	from the old version.  Now it is identical to strncmp().
> 
>   - Improve description of strnncmp().
> 
> Also, strnncmp() was switched from using memcmp() to strncmp()
> internally to make it clear that this is meant for strings, not
> general buffers.
I don't think this is a good change, for 2 reasons:
- It changes the semantics of existing code, which should be carefully
  reviewed, documented and may be put into a seperate commit.
- Looking into the code for memcmp() and strncmp() in libc,
  I can see that memcmp() is written in 13 lines of assembler,
  (on a 386 system) with a fast
    repz cmpsb %es:(%edi),%ds:(%esi)
  working as the core engine.
  
  strncmp() uses 83 lines of assembler, because after each comparison
  the code needs to check of the '\0' in both strings.
- I can't see a reason to replace efficient code with less efficient code,
  so moving the old function "as is" into a include file, and declare
  it "static inline" could be the first step.

  Having code inline may open the door for the compiler to decide,
  "Oh, I know exactly what memcmp() does, so I through in a handfull
  of lines assembly code, instead of calling memcmp() from libc".


And another thing:
 What does cache_name_compare(name, namelen, ce->name, len))
 in name-hash.c do?
 Isn't that the same function ?

I like strnncmp() better than 
cache_name_compare() or name_compare(),
but I agree with Erik here that strnncmp() has the potential to
become a name clash some day, so that git_strnncmp() may be better.

Thanks for the effort, cleaning up is needed.


  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17  8:23   ` Torsten Bögershausen
@ 2014-06-17 15:48     ` Jeremiah Mahler
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17 15:48 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Torsten,

On Tue, Jun 17, 2014 at 10:23:14AM +0200, Torsten Bögershausen wrote:
> On 2014-06-17 09.34, Jeremiah Mahler wrote:
> > Add a strnncmp() function which behaves like strncmp() except it takes
> > the length of both strings instead of just one.  It behaves the same as
> > strncmp() up to the minimum common length between the strings.  When the
> minimum common length? Isn'n t that 0?
> Using the word "common", I think we could call it "common length".
> (And more places below)
> 
Yes, "minimum" doesn't make sense.  "common length" sounds better.

> > strings are identical up to this minimum common length, the length
> > difference is returned.
> > 
> > Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> > ---
> >  strbuf.c | 9 +++++++++
> >  strbuf.h | 2 ++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/strbuf.c b/strbuf.c
> > index ac62982..4eb7954 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
> >  	result[i] = '\0';
> >  	return result;
> >  }
> > +
> strncmp uses size_t, not int:
> int strncmp(const char *s1, const char *s2, size_t n);
> 
> Is there a special reason to allow negative string length?
> Some call sites use int when calling strncmp() or others,
> that is one thing.
> But when writing a generic strnncmp() function, I think
> it should use size_t, unless negative values have a meaning and
> are handled in the code.
> 
Don't need negatives, size_t is more appropriate.  Fixed.

> 
> > +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> > +{
> > +	int min_len = (len_a < len_b) ? len_a : len_b;
> > +	int cmp = strncmp(a, b, min_len);
> 
> > +	if (cmp)
> > +		return cmp;
> > +	return (len_a - len_b);
> > +}

Thanks,
-- 
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17  9:09   ` Erik Faye-Lund
@ 2014-06-17 15:49     ` Jeremiah Mahler
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17 15:49 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git

Erik,

On Tue, Jun 17, 2014 at 11:09:59AM +0200, Erik Faye-Lund wrote:
> On Tue, Jun 17, 2014 at 9:34 AM, Jeremiah Mahler <jmmahler@gmail.com> wrote:
> > Add a strnncmp() function which behaves like strncmp() except it takes
> > the length of both strings instead of just one.  It behaves the same as
> > strncmp() up to the minimum common length between the strings.  When the
> > strings are identical up to this minimum common length, the length
> > difference is returned.
> >
> > Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> > ---
> >  strbuf.c | 9 +++++++++
> >  strbuf.h | 2 ++
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/strbuf.c b/strbuf.c
> > index ac62982..4eb7954 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
> >         result[i] = '\0';
> >         return result;
> >  }
> > +
> > +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> > +{
> > +       int min_len = (len_a < len_b) ? len_a : len_b;
> > +       int cmp = strncmp(a, b, min_len);
> > +       if (cmp)
> > +               return cmp;
> > +       return (len_a - len_b);
> > +}
> 
> Using a name that sounds like it's from the stdlib makes me cringe a
> little bit. Names that start with "str" reserved for stdlib[1][2], but
> we already ignore this for strbuf (and perhaps some other functions).
> However, in this case it doesn't seem *that* unlikely that we might
> collide with some stdlib-extensions.
> 
> [1]: http://pubs.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_02.html#tag_02_02_02
> [2]: http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

I chose strnncmp() to try and emphasize its similarity to strncmp().
But you have a good point about potential name conflicts.  That could be
a problem.  I will change the name.

-- 
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] add strnncmp() function
  2014-06-17 11:08 ` [PATCH v2 0/3] add strnncmp() function Torsten Bögershausen
@ 2014-06-17 15:49   ` Jeremiah Mahler
  2014-06-17 17:48     ` Jonathan Nieder
  2014-06-18 10:33   ` Ondřej Bílka
  1 sibling, 1 reply; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17 15:49 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Torsten,

On Tue, Jun 17, 2014 at 01:08:05PM +0200, Torsten Bögershausen wrote:
> On 2014-06-17 09.34, Jeremiah Mahler wrote:
> > Add a strnncmp() function which behaves like strncmp() except it takes
> > the length of both strings instead of just one.
> > 
> > Then simplify tree-walk.c and unpack-trees.c using this new function.
> > Replace all occurrences of name_compare() with strnncmp().  Remove
> > name_compare(), which they both had identical copies of.
> > 
> > Version 2 includes suggestions from Jonathan Neider [1]:
> > 
> >   - Fix the logic which caused the new strnncmp() to behave differently
> > 	from the old version.  Now it is identical to strncmp().
> > 
> >   - Improve description of strnncmp().
> > 
> > Also, strnncmp() was switched from using memcmp() to strncmp()
> > internally to make it clear that this is meant for strings, not
> > general buffers.
> I don't think this is a good change, for 2 reasons:
> - It changes the semantics of existing code, which should be carefully
>   reviewed, documented and may be put into a seperate commit.
> - Looking into the code for memcmp() and strncmp() in libc,
>   I can see that memcmp() is written in 13 lines of assembler,
>   (on a 386 system) with a fast
>     repz cmpsb %es:(%edi),%ds:(%esi)
>   working as the core engine.
>   
>   strncmp() uses 83 lines of assembler, because after each comparison
>   the code needs to check of the '\0' in both strings.
> - I can't see a reason to replace efficient code with less efficient code,
>   so moving the old function "as is" into a include file, and declare
>   it "static inline" could be the first step.
> 
>   Having code inline may open the door for the compiler to decide,
>   "Oh, I know exactly what memcmp() does, so I through in a handfull
>   of lines assembly code, instead of calling memcmp() from libc".
> 
Thanks for explaining the benefits of memcmp() over strcmp(), I will
switch it back.

The only case I can imagine where it would make a difference is when
there is a '\0' in the middle of the string.  But that would be an
unlikely case since it probably meant the lengths were mis-calculated.

> 
> And another thing:
>  What does cache_name_compare(name, namelen, ce->name, len))
>  in name-hash.c do?
>  Isn't that the same function ?
> 
cache_name_compare() is the same except it returns -1, +1 instead of -N,
+N.  However, none of the cases where name_compare() is used need the
magnitude so this function could be used.

> I like strnncmp() better than 
> cache_name_compare() or name_compare(),
> but I agree with Erik here that strnncmp() has the potential to
> become a name clash some day, so that git_strnncmp() may be better.
> 
Agreed.

> Thanks for the effort, cleaning up is needed.
> 

Thanks for the feedback :-)

-- 
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] add strnncmp() function
  2014-06-17 15:49   ` Jeremiah Mahler
@ 2014-06-17 17:48     ` Jonathan Nieder
  2014-06-17 19:09       ` Jeremiah Mahler
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Nieder @ 2014-06-17 17:48 UTC (permalink / raw)
  To: Jeremiah Mahler; +Cc: git, Torsten Bögershausen

>> On 2014-06-17 09.34, Jeremiah Mahler wrote:

>>> Also, strnncmp() was switched from using memcmp() to strncmp()
>>> internally to make it clear that this is meant for strings, not
>>> general buffers.

Why shouldn't I want to use this helper on arbitrary data?  One of the
advantages of other helpers in git that take a pointer and a length
(e.g., the strbuf library) are that they are 8-bit clean and can work
on binary data when it's useful.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
  2014-06-17  8:23   ` Torsten Bögershausen
  2014-06-17  9:09   ` Erik Faye-Lund
@ 2014-06-17 17:55   ` Junio C Hamano
  2014-06-17 19:27     ` Jeremiah Mahler
  2 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-06-17 17:55 UTC (permalink / raw)
  To: Jeremiah Mahler; +Cc: Jonathan Nieder, git

Jeremiah Mahler <jmmahler@gmail.com> writes:

> Add a strnncmp() function which behaves like strncmp() except it takes
> the length of both strings instead of just one.  It behaves the same as
> strncmp() up to the minimum common length between the strings.  When the
> strings are identical up to this minimum common length, the length
> difference is returned.
>
> Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> ---
>  strbuf.c | 9 +++++++++
>  strbuf.h | 2 ++
>  2 files changed, 11 insertions(+)
>
> diff --git a/strbuf.c b/strbuf.c
> index ac62982..4eb7954 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
>  	result[i] = '\0';
>  	return result;
>  }
> +
> +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> +{
> +	int min_len = (len_a < len_b) ? len_a : len_b;
> +	int cmp = strncmp(a, b, min_len);
> +	if (cmp)
> +		return cmp;
> +	return (len_a - len_b);
> +}

I am not sure if the interface into this function conceptually makes
much sense.  strncmp(entry, string, 14) was invented as the way to
see if a NUL terminated "string" matches with the contents in an
array of char "entry" that is up to 14 bytes long, and because the
"entry" was allowed to fill full 14-byte space without terminated
with a NUL, the maximum possible length is specified separately, but
a NUL termination in "entry", if exists, is still honored.  Is there
any case where such a pair of "maximum N bytes but could be shorter"
strings are compared, especially with different N's defined per
string, in our codebase (or in other people's project for that
matter)?

Further, I do think that the interface into this function and its
implementation are inappropriate for implementing the name_compare()
function in tree-walk.c and unpack-trees.c.  These functions are
designed to take counted strings; in a tuple <a, a_len> they take,
"a_len" is the only thing that determines the length of string "a".
There is no room for a NUL termination inside "a" come into play to
make "a" shorter than "a_len".

In other words, "Two NUL-terminated strings can be compared with
strcmp(a, b), but we use counted strings in many places in our
codebase, and compare_counted_strings(a, a_len, b, b_len) function
would help us, so let's add one and use it in name_compare()" may
make good sense, but if we were to do so, I do not think strncmp()
would be involved in its implementation.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] add strnncmp() function
  2014-06-17 17:48     ` Jonathan Nieder
@ 2014-06-17 19:09       ` Jeremiah Mahler
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17 19:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

Jonathan,

On Tue, Jun 17, 2014 at 10:48:17AM -0700, Jonathan Nieder wrote:
> >> On 2014-06-17 09.34, Jeremiah Mahler wrote:
> 
> >>> Also, strnncmp() was switched from using memcmp() to strncmp()
> >>> internally to make it clear that this is meant for strings, not
> >>> general buffers.
> 
> Why shouldn't I want to use this helper on arbitrary data?  One of the
> advantages of other helpers in git that take a pointer and a length
> (e.g., the strbuf library) are that they are 8-bit clean and can work
> on binary data when it's useful.
> 
> Thanks,
> Jonathan

Yes, along with the performance of strncmp() being worse than memcmp(),
and Junios explanation of "counted strings", I think this was a bad idea.
I will switch back to the memcmp() version.

-- 
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] add strnncmp() function
  2014-06-17 17:55   ` Junio C Hamano
@ 2014-06-17 19:27     ` Jeremiah Mahler
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremiah Mahler @ 2014-06-17 19:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio,

On Tue, Jun 17, 2014 at 10:55:18AM -0700, Junio C Hamano wrote:
> Jeremiah Mahler <jmmahler@gmail.com> writes:
> 
> > Add a strnncmp() function which behaves like strncmp() except it takes
> > the length of both strings instead of just one.  It behaves the same as
> > strncmp() up to the minimum common length between the strings.  When the
> > strings are identical up to this minimum common length, the length
> > difference is returned.
> >
> > Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
> > ---
> >  strbuf.c | 9 +++++++++
> >  strbuf.h | 2 ++
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/strbuf.c b/strbuf.c
> > index ac62982..4eb7954 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -600,3 +600,12 @@ char *xstrdup_tolower(const char *string)
> >  	result[i] = '\0';
> >  	return result;
> >  }
> > +
> > +int strnncmp(const char *a, int len_a, const char *b, int len_b)
> > +{
> > +	int min_len = (len_a < len_b) ? len_a : len_b;
> > +	int cmp = strncmp(a, b, min_len);
> > +	if (cmp)
> > +		return cmp;
> > +	return (len_a - len_b);
> > +}
> 
> I am not sure if the interface into this function conceptually makes
> much sense.  strncmp(entry, string, 14) was invented as the way to
> see if a NUL terminated "string" matches with the contents in an
> array of char "entry" that is up to 14 bytes long, and because the
> "entry" was allowed to fill full 14-byte space without terminated
> with a NUL, the maximum possible length is specified separately, but
> a NUL termination in "entry", if exists, is still honored.  Is there
> any case where such a pair of "maximum N bytes but could be shorter"
> strings are compared, especially with different N's defined per
> string, in our codebase (or in other people's project for that
> matter)?
> 
> Further, I do think that the interface into this function and its
> implementation are inappropriate for implementing the name_compare()
> function in tree-walk.c and unpack-trees.c.  These functions are
> designed to take counted strings; in a tuple <a, a_len> they take,
> "a_len" is the only thing that determines the length of string "a".
> There is no room for a NUL termination inside "a" come into play to
> make "a" shorter than "a_len".
> 
> In other words, "Two NUL-terminated strings can be compared with
> strcmp(a, b), but we use counted strings in many places in our
> codebase, and compare_counted_strings(a, a_len, b, b_len) function
> would help us, so let's add one and use it in name_compare()" may
> make good sense, but if we were to do so, I do not think strncmp()
> would be involved in its implementation.
> 
> 
The concept of a counted string clears up some of the confusion I was
having.  Thanks for that explanation.

-- 
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] add strnncmp() function
  2014-06-17 11:08 ` [PATCH v2 0/3] add strnncmp() function Torsten Bögershausen
  2014-06-17 15:49   ` Jeremiah Mahler
@ 2014-06-18 10:33   ` Ondřej Bílka
  1 sibling, 0 replies; 15+ messages in thread
From: Ondřej Bílka @ 2014-06-18 10:33 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Jeremiah Mahler, Jonathan Nieder, git

On Tue, Jun 17, 2014 at 01:08:05PM +0200, Torsten Bögershausen wrote:
> On 2014-06-17 09.34, Jeremiah Mahler wrote:
> > Add a strnncmp() function which behaves like strncmp() except it takes
> > the length of both strings instead of just one.
> > 
> > Then simplify tree-walk.c and unpack-trees.c using this new function.
> > Replace all occurrences of name_compare() with strnncmp().  Remove
> > name_compare(), which they both had identical copies of.
> > 
> > Version 2 includes suggestions from Jonathan Neider [1]:
> > 
> >   - Fix the logic which caused the new strnncmp() to behave differently
> > 	from the old version.  Now it is identical to strncmp().
> > 
> >   - Improve description of strnncmp().
> > 
> > Also, strnncmp() was switched from using memcmp() to strncmp()
> > internally to make it clear that this is meant for strings, not
> > general buffers.
> I don't think this is a good change, for 2 reasons:
> - It changes the semantics of existing code, which should be carefully
>   reviewed, documented and may be put into a seperate commit.
> - Looking into the code for memcmp() and strncmp() in libc,
>   I can see that memcmp() is written in 13 lines of assembler,
>   (on a 386 system) with a fast
>     repz cmpsb %es:(%edi),%ds:(%esi)
>   working as the core engine.
>   
>   strncmp() uses 83 lines of assembler, because after each comparison
>   the code needs to check of the '\0' in both strings.
> - I can't see a reason to replace efficient code with less efficient code,
>   so moving the old function "as is" into a include file, and declare
>   it "static inline" could be the first step.
> 
That is not true, a rep cmpsb was fast for 486 but is relatively slow
for newer processors. For performance a correct answer is to measure it than do 
blind guess. Are these strings null terminated or is giving a size just
a hint? If it is a hint then a plain strcmp could be faster (this
depends on implementation). A reason is that for implementations that
check more bytes at once it is easier to combine a terminating null mask with 
difference than trying to first find which of first 16 bytes are different and 
then compare if it is within size.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-06-18 10:33 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-17  7:34 [PATCH v2 0/3] add strnncmp() function Jeremiah Mahler
2014-06-17  7:34 ` [PATCH v2 1/3] " Jeremiah Mahler
2014-06-17  8:23   ` Torsten Bögershausen
2014-06-17 15:48     ` Jeremiah Mahler
2014-06-17  9:09   ` Erik Faye-Lund
2014-06-17 15:49     ` Jeremiah Mahler
2014-06-17 17:55   ` Junio C Hamano
2014-06-17 19:27     ` Jeremiah Mahler
2014-06-17  7:34 ` [PATCH v2 2/3] tree-walk: simplify via strnncmp() Jeremiah Mahler
2014-06-17  7:34 ` [PATCH v2 3/3] unpack-trees: " Jeremiah Mahler
2014-06-17 11:08 ` [PATCH v2 0/3] add strnncmp() function Torsten Bögershausen
2014-06-17 15:49   ` Jeremiah Mahler
2014-06-17 17:48     ` Jonathan Nieder
2014-06-17 19:09       ` Jeremiah Mahler
2014-06-18 10:33   ` Ondřej Bílka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).