git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] Reduce cost of deletion in levenstein distance (4 -> 3)
@ 2012-04-27  8:58 Matthieu Moy
  2012-04-27  9:43 ` Zbigniew Jędrzejewski-Szmek
  0 siblings, 1 reply; 3+ messages in thread
From: Matthieu Moy @ 2012-04-27  8:58 UTC (permalink / raw)
  To: git; +Cc: Matthieu Moy

Before this patch, a character deletion has the same cost as 2 swaps, or
4 additions, so Git prefers suggesting a completely scrambled command
name to removing a character. For example, "git tags" suggests "stage",
but not "tag".

By setting the deletion cost to 3, we keep it higher than swaps or
additions, but prefer 1 deletion to 2 swaps. "git tags" now suggests
"tag" in addition to staged.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
The funny thing is: A while ago, I reported that the typo "git tags"
was not suggesting "git tag" as alternative "did you mean this?". A
patch was posted and merged:

  http://thread.gmane.org/gmane.comp.version-control.git/101278

According to the discussion, I it worked for me. But I can't reproduce
this "works for me" even going back to the version right after the fix
above.

 help.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/help.c b/help.c
index 14eefc9..fdaa90d 100644
--- a/help.c
+++ b/help.c
@@ -334,7 +334,7 @@ const char *help_unknown_cmd(const char *cmd)
 		}
 
 		main_cmds.names[i]->len =
-			levenshtein(cmd, candidate, 0, 2, 1, 4) + 1;
+			levenshtein(cmd, candidate, 0, 2, 1, 3) + 1;
 	}
 
 	qsort(main_cmds.names, main_cmds.cnt,
-- 
1.7.10.363.g9094b

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] Reduce cost of deletion in levenstein distance (4 -> 3)
  2012-04-27  8:58 [RFC/PATCH] Reduce cost of deletion in levenstein distance (4 -> 3) Matthieu Moy
@ 2012-04-27  9:43 ` Zbigniew Jędrzejewski-Szmek
  2012-05-24 20:33   ` Matthieu Moy
  0 siblings, 1 reply; 3+ messages in thread
From: Zbigniew Jędrzejewski-Szmek @ 2012-04-27  9:43 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On 04/27/2012 10:58 AM, Matthieu Moy wrote:
> Before this patch, a character deletion has the same cost as 2 swaps, or
> 4 additions, so Git prefers suggesting a completely scrambled command
> name to removing a character. For example, "git tags" suggests "stage",
> but not "tag".
> 
> By setting the deletion cost to 3, we keep it higher than swaps or
> additions, but prefer 1 deletion to 2 swaps. "git tags" now suggests
> "tag" in addition to staged.

Hi,
looks sensible, but I wonder if the algorithm shouldn't be tweaked even
further. I understand why 'tags' and 'stage' are similar,
but if I say 'tagz', git proposes (with your change), both 'stage' and
'tag'. 'tag' is one deletion away, but 'stage' requires a deletion and a
replacement, so should loose to 'tag', I think.

Zbyszek

> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
> ---
> The funny thing is: A while ago, I reported that the typo "git tags"
> was not suggesting "git tag" as alternative "did you mean this?". A
> patch was posted and merged:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/101278
> 
> According to the discussion, I it worked for me. But I can't reproduce
> this "works for me" even going back to the version right after the fix
> above.
> 
>  help.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/help.c b/help.c
> index 14eefc9..fdaa90d 100644
> --- a/help.c
> +++ b/help.c
> @@ -334,7 +334,7 @@ const char *help_unknown_cmd(const char *cmd)
>  		}
>  
>  		main_cmds.names[i]->len =
> -			levenshtein(cmd, candidate, 0, 2, 1, 4) + 1;
> +			levenshtein(cmd, candidate, 0, 2, 1, 3) + 1;
>  	}
>  
>  	qsort(main_cmds.names, main_cmds.cnt,

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC/PATCH] Reduce cost of deletion in levenstein distance (4 -> 3)
  2012-04-27  9:43 ` Zbigniew Jędrzejewski-Szmek
@ 2012-05-24 20:33   ` Matthieu Moy
  0 siblings, 0 replies; 3+ messages in thread
From: Matthieu Moy @ 2012-05-24 20:33 UTC (permalink / raw)
  To: Zbigniew Jędrzejewski-Szmek; +Cc: git

[ Sorry for the looong delay ]

Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> writes:

> On 04/27/2012 10:58 AM, Matthieu Moy wrote:
>> Before this patch, a character deletion has the same cost as 2 swaps, or
>> 4 additions, so Git prefers suggesting a completely scrambled command
>> name to removing a character. For example, "git tags" suggests "stage",
>> but not "tag".
>> 
>> By setting the deletion cost to 3, we keep it higher than swaps or
>> additions, but prefer 1 deletion to 2 swaps. "git tags" now suggests
>> "tag" in addition to staged.
>
> Hi,
> looks sensible, but I wonder if the algorithm shouldn't be tweaked even
> further. I understand why 'tags' and 'stage' are similar,
> but if I say 'tagz', git proposes (with your change), both 'stage' and
> 'tag'. 'tag' is one deletion away, but 'stage' requires a deletion and a
> replacement, so should loose to 'tag', I think.

First, my patch is also an improvement here since it allows showing tags
(previously, it showed only stage). The idea for showing stage before
tag is that the cost of deletion is greater than the cost of insertion,
which corresponds to the hypothesis that it's more common to miss one
character when typing than typing too many. That's probably subjective,
but I think it makes sense.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-05-24 20:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-27  8:58 [RFC/PATCH] Reduce cost of deletion in levenstein distance (4 -> 3) Matthieu Moy
2012-04-27  9:43 ` Zbigniew Jędrzejewski-Szmek
2012-05-24 20:33   ` Matthieu Moy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).