64-bit support.

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* 64-bit support.
@ 2013-11-14 12:58 Лежанкин Иван
  2013-11-14 13:07 ` Konstantin Khomoutov
  2013-11-14 14:40 ` Kent R. Spillner
  0 siblings, 2 replies; 6+ messages in thread
From: Лежанкин Иван @ 2013-11-14 12:58 UTC (permalink / raw)
  To: Git Mailing List

Hi!

Do you plan to implement the 64-bit support in git? - Right now I have
a problems sometimes with a huge repo and renaming detection. If I
merge more than 32768 files at once, then the renaming detection
fails, because of limitation inside git. The limitation is put by max
32-bit value.

I tweaked sources locally, to use 64-bit value as a number of merging
files - generally, it works. But I'm not so brave to try to replace it
everywhere in git.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 64-bit support.
  2013-11-14 12:58 64-bit support Лежанкин Иван
@ 2013-11-14 13:07 ` Konstantin Khomoutov
  2013-11-14 14:40 ` Kent R. Spillner
  1 sibling, 0 replies; 6+ messages in thread
From: Konstantin Khomoutov @ 2013-11-14 13:07 UTC (permalink / raw)
  To: Лежанкин Иван
  Cc: Git Mailing List

On Thu, 14 Nov 2013 16:58:31 +0400
Лежанкин Иван  <abyss.7@gmail.com> wrote:

> Do you plan to implement the 64-bit support in git? - Right now I have
> a problems sometimes with a huge repo and renaming detection. If I
> merge more than 32768 files at once, then the renaming detection
> fails, because of limitation inside git. The limitation is put by max
> 32-bit value.
> 
> I tweaked sources locally, to use 64-bit value as a number of merging
> files - generally, it works. But I'm not so brave to try to replace it
> everywhere in git.

$ apt-cache policy git
git:
  Installed: 1:1.7.10.4-1+wheezy1
  Candidate: 1:1.7.10.4-1+wheezy1
  Version table:
 *** 1:1.7.10.4-1+wheezy1 0
        500 http://cdn.debian.net/debian/ wheezy/main amd64 Packages
        100 /var/lib/dpkg/status

Notice the amd64 arch.  So what the question really is about?

If you have found a place where Git explicitly uses a 32-bit integer
where it would better be using a 64-bit one, please propose a patch to
discuss.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 64-bit support.
  2013-11-14 12:58 64-bit support Лежанкин Иван
  2013-11-14 13:07 ` Konstantin Khomoutov
@ 2013-11-14 14:40 ` Kent R. Spillner
  2013-11-14 14:55   ` Лежанкин Иван
  1 sibling, 1 reply; 6+ messages in thread
From: Kent R. Spillner @ 2013-11-14 14:40 UTC (permalink / raw)
  To: Лежанкин Иван
  Cc: Git Mailing List

Can you be more specific about the limitation you suspect you are hitting?  32,768 is not the "max 32-bit value."


> On Nov 14, 2013, at 6:58, Лежанкин Иван  <abyss.7@gmail.com> wrote:
> 
> Hi!
> 
> Do you plan to implement the 64-bit support in git? - Right now I have
> a problems sometimes with a huge repo and renaming detection. If I
> merge more than 32768 files at once, then the renaming detection
> fails, because of limitation inside git. The limitation is put by max
> 32-bit value.
> 
> I tweaked sources locally, to use 64-bit value as a number of merging
> files - generally, it works. But I'm not so brave to try to replace it
> everywhere in git.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 64-bit support.
  2013-11-14 14:40 ` Kent R. Spillner
@ 2013-11-14 14:55   ` Лежанкин Иван
  2013-11-14 15:11     ` Konstantin Khomoutov
  0 siblings, 1 reply; 6+ messages in thread
From: Лежанкин Иван @ 2013-11-14 14:55 UTC (permalink / raw)
  To: Kent R. Spillner; +Cc: Git Mailing List

I hit this limit in file 'diffcore-rename.c':

if (rename_limit <= 0 || rename_limit > 32767)
        rename_limit = 32767;

I just guess, that this limit comes from the O(N^2) complexity of the
comparison algorithm. Since the max 32-bit signed value is 2^31, then
the 2^15 = 32768 is somehow correlated with its square root, maybe,
like 2^(32/2 - 1) - to prevent overflow.
I'm trying to prepare the patch right now, that changes the `int
rename_limit` => `long rename_limit` and all intermediate variable
types. Is it a correct way to do?

On 14 November 2013 18:40, Kent R. Spillner <kspillner@acm.org> wrote:
> Can you be more specific about the limitation you suspect you are hitting?  32,768 is not the "max 32-bit value."
>
>
>> On Nov 14, 2013, at 6:58, Лежанкин Иван  <abyss.7@gmail.com> wrote:
>>
>> Hi!
>>
>> Do you plan to implement the 64-bit support in git? - Right now I have
>> a problems sometimes with a huge repo and renaming detection. If I
>> merge more than 32768 files at once, then the renaming detection
>> fails, because of limitation inside git. The limitation is put by max
>> 32-bit value.
>>
>> I tweaked sources locally, to use 64-bit value as a number of merging
>> files - generally, it works. But I'm not so brave to try to replace it
>> everywhere in git.
>> --
>> To unsubscribe from this list: send the line "unsubscribe git" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 64-bit support.
  2013-11-14 14:55   ` Лежанкин Иван
@ 2013-11-14 15:11     ` Konstantin Khomoutov
  2013-11-14 22:15       ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Khomoutov @ 2013-11-14 15:11 UTC (permalink / raw)
  To: Лежанкин Иван
  Cc: Kent R. Spillner, Git Mailing List

On Thu, 14 Nov 2013 18:55:52 +0400
Лежанкин Иван  <abyss.7@gmail.com> wrote:

> I hit this limit in file 'diffcore-rename.c':
> 
> if (rename_limit <= 0 || rename_limit > 32767)
>         rename_limit = 32767;
> 
> I just guess, that this limit comes from the O(N^2) complexity of the
> comparison algorithm. Since the max 32-bit signed value is 2^31, then
> the 2^15 = 32768 is somehow correlated with its square root, maybe,
> like 2^(32/2 - 1) - to prevent overflow.
> I'm trying to prepare the patch right now, that changes the `int
> rename_limit` => `long rename_limit` and all intermediate variable
> types. Is it a correct way to do?

I beleive rename_limit comes from reading the diff.renameLimit
configuration variable.  The gitconfig(1) manual page hints to look at
the "-l" command-line option of `git diff` which is described this way:

  -l<num>

    The -M and -C options require O(n^2) processing time where n is the
  number of potential rename/copy targets. This option prevents
  rename/copy detection from running if the number of rename/copy
  targets exceeds the specified number.

This description is not too clear, I admit.

Looks like you're on the right track but the patch appears to require a
more wide impact:
* 32767 should be a default limit, applied in the case the user did not
  specify neither diff.renameLimit nor "-l".
* If whatever value read from those sources is less than 0, an error
  should be thrown--it looks strange to just revert it to the default
  value in this case.
* If the user-supplied value is >= 0, then just use it, assume the user
  knows what they are doing.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 64-bit support.
  2013-11-14 15:11     ` Konstantin Khomoutov
@ 2013-11-14 22:15       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2013-11-14 22:15 UTC (permalink / raw)
  To: Konstantin Khomoutov
  Cc: Лежанкин Иван,
	Kent R. Spillner, Git Mailing List

On Thu, Nov 14, 2013 at 07:11:31PM +0400, Konstantin Khomoutov wrote:

> > I just guess, that this limit comes from the O(N^2) complexity of the
> > comparison algorithm. Since the max 32-bit signed value is 2^31, then
> > the 2^15 = 32768 is somehow correlated with its square root, maybe,
> > like 2^(32/2 - 1) - to prevent overflow.
> > I'm trying to prepare the patch right now, that changes the `int
> > rename_limit` => `long rename_limit` and all intermediate variable
> > types. Is it a correct way to do?

Yes, the 32768 limit is there to prevent 32-bit overflow of the
squared value. We can use a uint64_t and bump the overflow limit to 2^32
if we are actually bumping up against this.

> I beleive rename_limit comes from reading the diff.renameLimit
> configuration variable.  The gitconfig(1) manual page hints to look at
> the "-l" command-line option of `git diff` which is described this way:

Yes, though there are also builtin defaults, depending on the operation
(400 for diffs, 1000 for merges).

> Looks like you're on the right track but the patch appears to require a
> more wide impact:
> * 32767 should be a default limit, applied in the case the user did not
>   specify neither diff.renameLimit nor "-l".

That is probably too high for a default limit. The point is that we are
making an O(n^2) matrix, and consuming CPU and memory in line with that.
The defaults were determined empirically on "reasonable" times (and are
now based on old hardware, so it may be time to bump them up).

The idea was that 32767 was unreasonably high, and you would not want to
spend that much CPU time anyway. On my shiny new machine, doing a 2048^2
matrix takes 5s. A 32767^2 matrix would be 21 minutes.

And it was also expected that you would simply not have that big a
matrix anyway (you might >32767 candidates on one side, but you will not
have a square bigger than rename_limit^2, which is what this is
checking).

> * If whatever value read from those sources is less than 0, an error
>   should be thrown--it looks strange to just revert it to the default
>   value in this case.

It's not the default, though. The default is something like 400. So
saying "-1" (or 0) is a way of saying "turn off the limit" without
having to guess at an arbitrarily high number.

> * If the user-supplied value is >= 0, then just use it, assume the user
>   knows what they are doing.

At some point we have to deal with integer overflow. It was assumed that
a 32-bit max was "big enough" and that nobody would have a real world
case that hit that. I'd be curious to hear if that has actually been
hit, or if there is simply confusion over how the rename-limit config
works (the limit is the square root of the max matrix size).

Here's what the 64-bit patch could look like (I notice that we are not
using unsigned 32-bit integers, either, and we probably should be):

---
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..8356a62 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -427,11 +427,16 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
  * 1 if we need to disable inexact rename detection;
  * 2 if we would be under the limit if we were given -C instead of -C -C.
  */
-static int too_many_rename_candidates(int num_create,
+static int too_many_rename_candidates(uint64_t num_create,
 				      struct diff_options *options)
 {
-	int rename_limit = options->rename_limit;
-	int num_src = rename_src_nr;
+	/*
+	 * signed to unsigned conversion here is intentional, and
+	 * converts limits < 0 into large (effectively infinite)
+	 * limits
+	 */
+	uint64_t rename_limit = options->rename_limit;
+	uint64_t num_src = rename_src_nr;
 	int i;

 	options->needed_rename_limit = 0;
@@ -442,11 +447,10 @@ static int too_many_rename_candidates(int num_create,
 	 *
 	 *    num_create * num_src > rename_limit * rename_limit
 	 *
-	 * but handles the potential overflow case specially (and we
-	 * assume at least 32-bit integers)
+	 * but handles the potential overflow case specially.
 	 */
-	if (rename_limit <= 0 || rename_limit > 32767)
-		rename_limit = 32767;
+	if (!rename_limit || rename_limit > 4294967295)
+		rename_limit = 4294967295;
 	if ((num_create <= rename_limit || num_src <= rename_limit) &&
 	    (num_create * num_src <= rename_limit * rename_limit))
 		return 0;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-11-14 22:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-14 12:58 64-bit support Лежанкин Иван
2013-11-14 13:07 ` Konstantin Khomoutov
2013-11-14 14:40 ` Kent R. Spillner
2013-11-14 14:55   ` Лежанкин Иван
2013-11-14 15:11     ` Konstantin Khomoutov
2013-11-14 22:15       ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).