From: Sam Vilain <sam@vilain.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] git-fast-import: note 1M limit of mark number
Date: Wed, 16 Apr 2008 09:05:06 +1200	[thread overview]
Message-ID: <48051882.8000201@vilain.net> (raw)
In-Reply-To: <4804CECE.2040205@alum.mit.edu>

Michael Haggerty wrote:
>> ++
>> +Note that due to current internal limitations, you may not make marks
>> +with a higher number than 1048575 (2^20-1).
>>  
>>  * A complete 40 byte or abbreviated commit SHA-1 in hex.
>>  
> 
> Oh.  Um.  That is an awkwardly small number nowadays.
> 
> cvs2svn has been used for repositories with O(2^20) distinct file
> revisions (KDE, Mozilla, NetBSD, ...).  So this limit will likely be too
> small for some users.

Right.  But if you write an importer for a conversion of that size and
don't make it restartable, you're insane.  So creating more than 1Mi
marks in a single git-fast-import session might not be so vital.

It only tripped me up because I was using a database sequence to
generate the marks, which meant I hit the ceiling.
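
A minimal sketch of one way around that ceiling, assuming the importer
is free to pick its own mark numbers: map each external ID (for example
a database sequence value) to a small, dense mark for the current
session.  MarkAllocator and mark_for are hypothetical names, not part of
git-fast-import, and the limit is the 1048575 figure from the patch
above.

```python
MAX_MARK = 2**20 - 1  # 1048575, the limit quoted in the patch (assumed exact)


class MarkAllocator:
    """Map arbitrary external IDs to dense mark numbers starting at 1.

    Hypothetical helper for an importer; not part of git-fast-import.
    """

    def __init__(self, limit=MAX_MARK):
        self.limit = limit
        self._marks = {}  # external ID -> mark number

    def mark_for(self, external_id):
        """Return the mark for external_id, allocating one if needed."""
        mark = self._marks.get(external_id)
        if mark is None:
            mark = len(self._marks) + 1
            if mark > self.limit:
                # Fail loudly instead of sending a mark above the limit.
                raise OverflowError(
                    "mark %d exceeds limit %d; start a new session"
                    % (mark, self.limit)
                )
            self._marks[external_id] = mark
        return mark
```

With this, a sequence value such as 5000000 becomes mark 1 in a fresh
session, and a restartable importer can begin a new session before the
allocator runs out.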

> Moreover, cvs2git needs to generate marks for both file contents and for
> commits.  It generates the latter by adding 1000000000 to the small
> integer IDs that it uses internally.  If git-fast-import only allows
> 20-bit integers, this makes me wonder why this hasn't broken
> dramatically in the past.  Pure numerological good fortune, combined
> with weak range checking in git-fast-import?

Perhaps.  All I saw was that after I hit 1Mi for the mark ID, the mark
numbers in the returned file were drastically different from the ones I
put in.  I had a glance over this code and it seemed likely to be a
culprit - this docpatch is really more about raising awareness of the
problem.  Obviously finding the fault and fixing it would be preferable.
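
The symptom described above can be checked for mechanically.  This is a
rough sketch, assuming the marks come back in the one-per-line
":markid SHA-1" format that git-fast-import writes for --export-marks;
parse_marks and compare_marks are hypothetical helper names.

```python
def parse_marks(text):
    """Parse exported marks text into {mark number: object name}."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sha = line.split()
        if not name.startswith(":"):
            raise ValueError("not a mark line: %r" % line)
        marks[int(name[1:])] = sha
    return marks


def compare_marks(sent, exported):
    """Return (missing, unexpected) mark numbers, each sorted.

    missing:    marks the importer sent that did not come back
    unexpected: marks that came back but were never sent
    """
    sent = set(sent)
    got = set(exported)
    return sorted(sent - got), sorted(got - sent)
```

If every mark sent comes back unchanged, both lists are empty.  Marks
that were altered on the way through would show up as a missing entry
paired with an unexpected one.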

> While I'm at it, let me also renew my suggestion that git-fast-import
> use separate namespaces ("markspaces", so to speak) for file content
> marks and for commit marks.  There is no reason for these distinct types
> of marks to be located in a shared space of integers.

There is a reason: they're both just object IDs.  Is it really that
much of a drag?  I know what you mean, though; in my code I had to keep
track of which type each mark was.
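
A minimal sketch of that bookkeeping, assuming a single shared counter
for all marks and a side table that remembers each mark's type.
TypedMarks and its methods are hypothetical names for illustration, not
anything git-fast-import provides.

```python
class TypedMarks:
    """One shared mark counter, with the object type recorded per mark."""

    KINDS = ("blob", "commit")

    def __init__(self):
        self._next = 1
        self._kind = {}  # mark number -> "blob" or "commit"

    def new_mark(self, kind):
        """Allocate the next mark and remember what kind of object it names."""
        if kind not in self.KINDS:
            raise ValueError("unknown object kind: %r" % (kind,))
        mark = self._next
        self._next += 1
        self._kind[mark] = kind
        return mark

    def kind_of(self, mark):
        """Return the kind recorded for mark (KeyError if never allocated)."""
        return self._kind[mark]
```

Blob and commit marks interleave in the same integer space here, and
the importer asks kind_of() whenever it needs to know which is which.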

Sam.
