Re: Incremental CVS update

From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Martin Langhoff" <martin.langhoff@gmail.com>
Cc: "Keith Packard" <keithp@keithp.com>, git <git@vger.kernel.org>
Subject: Re: Incremental CVS update
Date: Thu, 22 Jun 2006 16:08:09 -0400
Message-ID: <9e4733910606221308v2f995adev8c5b721be0a009e2@mail.gmail.com>
In-Reply-To: <46a038f90606221236j2c5c9692yecef924aa769c1c9@mail.gmail.com>

On 6/22/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 6/23/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > cvsps keeps its incremental status in ~/.cvsps/*. parsecvs might want
> > to keep its status in the .git repository and use tags to locate it.
> > You could even have a utility to show when and what was imported. By
> > keeping everything in git it doesn't matter who runs the incremental
> > update commands.
>
> Jon,
>
> what cvsps keeps is a cache of what it knows about the repo history,
> to ask only for new commits. Now, cvsps will always write to STDOUT
> the full history, and git-cvsimport discards the commits it has
> already seen, based on reading the state of each git head.

The cache is 723MB for the Mozilla repo. Since the info gets cached in
my home directory, anyone else who needs to sync the repo doesn't get
to use the cache.

[jonsmirl@jonsmirl .cvsps]$ pwd
/home/jonsmirl/.cvsps
[jonsmirl@jonsmirl .cvsps]$ ls -l
total 707492
-rw-rw-r-- 1 jonsmirl jonsmirl 723758657 Jun 15 16:10 #home#mozcvs##mozilla
[jonsmirl@jonsmirl .cvsps]$

Keith is rewriting parsecvs. If you analyze all of the data
structures, the info needed for the conversion should be able to fit
into well under 100MB instead of the ~2GB the current programs are
using.

There are lots of ways to reduce memory consumption. You can turn CVS
revisions into git IDs as soon as the revision is seen. That lets you
get away from tracking file names and long CVS revision numbers. It
also works to turn the author/log fields into a hash immediately. Where
possible, switching from linked lists to arrays saves memory too.
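
A minimal sketch of those three ideas, in Python for brevity. Nothing
here is taken from parsecvs or cvsps; RevisionTable, git_blob_id and the
field layout are hypothetical names chosen for illustration. The only
external fact relied on is that git names a blob by the SHA-1 of
"blob <size>\0" followed by the content.

```python
import hashlib
from array import array


def git_blob_id(content: bytes) -> bytes:
    """Return the 20-byte SHA-1 that git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).digest()


class RevisionTable:
    """Hypothetical compact store: parallel arrays instead of linked nodes."""

    def __init__(self):
        self._changesets = {}        # digest of (author, log) -> small integer
        self.changeset = array("I")  # per revision: changeset index
        self.timestamp = array("I")  # per revision: commit time, Unix seconds
        self.blob_ids = bytearray()  # per revision: 20 raw bytes, concatenated

    def add(self, author: str, log: str, timestamp: int, content: bytes) -> int:
        # Hash author/log right away; the strings themselves are not kept.
        key = hashlib.sha1(f"{author}\0{log}".encode("utf-8")).digest()
        cs = self._changesets.setdefault(key, len(self._changesets))
        self.changeset.append(cs)
        self.timestamp.append(timestamp)
        # Replace file name + CVS revision number with the git object ID.
        self.blob_ids += git_blob_id(content)
        return len(self.changeset) - 1

    def blob_id(self, rev: int) -> str:
        """Hex object ID of revision number rev."""
        return self.blob_ids[rev * 20:(rev + 1) * 20].hex()

    def __len__(self) -> int:
        return len(self.changeset)
```

With this layout each revision costs two array slots plus 20 bytes of
object ID, and each unique author/log combination costs one dictionary
entry, regardless of how long the log message was.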

Some stats:
1M revisions
200K unique changesets (author/log combos)
200KB symbols
1,800 branches

cvsps has the lowest memory consumption; it uses 1200 bytes per
revision. It looks like it is possible to lower this to less than 100
bytes per rev.
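
As a rough check of those figures against the 1M revision count, the
arithmetic can be written out as below. The constant and function names
are made up for this example, and MB here means 10^6 bytes.

```python
REVISIONS = 1_000_000        # "1M revisions" from the stats above
CVSPS_BYTES_PER_REV = 1200   # what cvsps uses today
TARGET_BYTES_PER_REV = 100   # upper bound of the proposed target


def total_mb(revisions: int, bytes_per_rev: int) -> float:
    """Total per-revision memory in megabytes (10^6 bytes)."""
    return revisions * bytes_per_rev / 1_000_000


current_mb = total_mb(REVISIONS, CVSPS_BYTES_PER_REV)
target_mb = total_mb(REVISIONS, TARGET_BYTES_PER_REV)
print(f"cvsps today: {current_mb:.0f} MB, target: under {target_mb:.0f} MB")
```

So the per-revision data alone accounts for about 1.2GB in cvsps, and
getting under 100 bytes per rev is what brings the total under 100MB.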

>
> So cvsps + git-cvsimport don't keep any extra data around, and I am
> 100% certain that parsecvs doesn't need that either.
>
> cheers,
>
>
> martin
>

-- 
Jon Smirl
jonsmirl@gmail.com
