git.vger.kernel.org archive mirror
* Incremental CVS update
@ 2006-06-22 12:26 Jon Smirl
  2006-06-22 19:36 ` Martin Langhoff
  0 siblings, 1 reply; 3+ messages in thread
From: Jon Smirl @ 2006-06-22 12:26 UTC (permalink / raw)
  To: Keith Packard, git

cvsps keeps its incremental status in ~/.cvsps/*. parsecvs might want
to keep its status in the .git repository and use tags to locate it.
You could even have a utility to show when and what was imported. By
keeping everything in git it doesn't matter who runs the incremental
update commands.
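
One way this could look, as a minimal sketch only: the importer's state is written into the object store as a blob and a lightweight tag points at it, so anyone with the repository can find it. The tag name `parsecvs-import-state` and the helper names are hypothetical, and nothing here reflects how parsecvs actually works; only the git plumbing commands (`hash-object`, `tag`, `cat-file`) are real.

```python
import subprocess

# Hypothetical tag name; any name the importer agrees on would do.
STATE_TAG = "parsecvs-import-state"


def _git(repo, *args, data=None):
    """Run a git command inside `repo` and return its raw stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def save_state(repo, state):
    """Store `state` (bytes) as a blob and point the state tag at it."""
    blob = _git(repo, "hash-object", "-w", "--stdin", data=state)
    blob = blob.decode("ascii").strip()
    # -f moves the tag if an earlier import already created it.
    _git(repo, "tag", "-f", STATE_TAG, blob)
    return blob


def load_state(repo):
    """Return the stored state; raises CalledProcessError if none exists."""
    return _git(repo, "cat-file", "blob", STATE_TAG)
```

Because the tag lives in the repository rather than in a home directory, whoever runs the next incremental update reads the same state.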

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Incremental CVS update
  2006-06-22 12:26 Incremental CVS update Jon Smirl
@ 2006-06-22 19:36 ` Martin Langhoff
  2006-06-22 20:08   ` Jon Smirl
  0 siblings, 1 reply; 3+ messages in thread
From: Martin Langhoff @ 2006-06-22 19:36 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Keith Packard, git

On 6/23/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> cvsps keeps its incremental status in ~/.cvsps/*. parsecvs might want
> to keep its status in the .git repository and use tags to locate it.
> You could even have a utility to show when and what was imported. By
> keeping everything in git it doesn't matter who runs the incremental
> update commands.

Jon,

what cvsps keeps is a cache of what it knows about the repo history,
to ask only for new commits. Now, cvsps will always write to STDOUT
the full history, and git-cvsimport discards the commits it has
already seen, based on reading the state of each git head.

So cvsps + git-cvsimport don't keep any extra data around, and I am
100% certain that parsecvs doesn't need that either.
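
The filtering described above can be sketched roughly as follows. This is an illustration under assumptions, not the git-cvsimport code: the `PatchSet` type, the `new_patchsets` function, and the choice to compare commit dates against the date of each branch head are all made up for the example.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class PatchSet(NamedTuple):
    """One cvsps patchset, reduced to what the filter needs (hypothetical)."""
    number: int
    branch: str
    date: int  # commit time, seconds since the epoch


def new_patchsets(
    patchsets: Iterable[PatchSet],
    head_dates: Dict[str, int],
) -> Iterator[PatchSet]:
    """Yield only patchsets newer than the git head of their branch.

    `head_dates` maps a branch name to the commit date of its current
    head. A branch missing from the map has never been imported, so
    every patchset on it is kept.
    """
    for ps in patchsets:
        last_imported = head_dates.get(ps.branch)
        if last_imported is not None and ps.date <= last_imported:
            continue  # already in git, discard
        yield ps
```

The only state consulted is what the git heads already record, which is why no separate bookkeeping file is needed in this scheme.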

cheers,


martin


* Re: Incremental CVS update
  2006-06-22 19:36 ` Martin Langhoff
@ 2006-06-22 20:08   ` Jon Smirl
  0 siblings, 0 replies; 3+ messages in thread
From: Jon Smirl @ 2006-06-22 20:08 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Keith Packard, git

On 6/22/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On 6/23/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > cvsps keeps its incremental status in ~/.cvsps/*. parsecvs might want
> > to keep its status in the .git repository and use tags to locate it.
> > You could even have a utility to show when and what was imported. By
> > keeping everything in git it doesn't matter who runs the incremental
> > update commands.
>
> Jon,
>
> what cvsps keeps is a cache of what it knows about the repo history,
> to ask only for new commits. Now, cvsps will always write to STDOUT
> the full history, and git-cvsimport discards the commits it has
> already seen, based on reading the state of each git head.

The cache is 723MB for the Mozilla repo. Since the info gets cached in
my home directory anyone else who needs to sync the repo doesn't get
to use the cache.

[jonsmirl@jonsmirl .cvsps]$ pwd
/home/jonsmirl/.cvsps
[jonsmirl@jonsmirl .cvsps]$ ls -l
total 707492
-rw-rw-r-- 1 jonsmirl jonsmirl 723758657 Jun 15 16:10 #home#mozcvs##mozilla
[jonsmirl@jonsmirl .cvsps]$


Keith is rewriting parsecvs. If you analyze all of the data
structures, the info needed for the conversion should be able to fit
into well under 100MB instead of the ~2GB the current programs are
using.

There are lots of ways to reduce memory consumption. You can turn CVS
revisions into git IDs as soon as the revision is seen. That lets you
get away from tracking file names and long CVS revision numbers. It
also works to turn the author/log fields immediately into a hash. When
possible, switching from linked lists to arrays saves memory too.
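
A rough sketch of those three ideas together, in Python purely to show the layout (a real converter would more likely do this in C, where the byte counts actually apply). The `RevisionTable` class and its fields are assumptions for illustration and are not taken from parsecvs or cvsps; the blob ID computation follows git's standard "blob <size>\0<content>" SHA-1 format.

```python
import hashlib
from array import array


def git_blob_id(content: bytes) -> bytes:
    """Return the 20-byte SHA-1 git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).digest()


class RevisionTable:
    """Revisions stored as parallel arrays rather than linked nodes."""

    def __init__(self):
        self.blob_ids = bytearray()      # 20 bytes per revision, packed
        self.changeset_ids = array("I")  # one small integer per revision
        self.timestamps = array("q")     # commit time, seconds since epoch
        self._changesets = {}            # author/log digest -> integer

    def intern_changeset(self, author: str, log: str) -> int:
        """Hash author/log right away and hand back a small integer."""
        key = hashlib.sha1(f"{author}\0{log}".encode("utf-8")).digest()
        return self._changesets.setdefault(key, len(self._changesets))

    def add(self, content: bytes, author: str, log: str, timestamp: int) -> int:
        """Record one revision; file name and CVS number are not kept."""
        self.blob_ids += git_blob_id(content)
        self.changeset_ids.append(self.intern_changeset(author, log))
        self.timestamps.append(timestamp)
        return len(self.timestamps) - 1

    def blob_id(self, index: int) -> str:
        """Hex form of the git ID stored for revision `index`."""
        return bytes(self.blob_ids[20 * index:20 * index + 20]).hex()

    def __len__(self) -> int:
        return len(self.timestamps)
```

In this layout each revision costs 20 bytes for the ID plus two fixed-width array slots, and the log text is held nowhere once it has been hashed.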

Some stats:
1M revisions
200K unique changesets (author/log combos)
200KB symbols
1,800 branches

cvsps has the lowest memory consumption; it uses 1200 bytes per
revision. It looks like it is possible to lower this to less than 100
bytes per rev.
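
As a back-of-the-envelope check of what those per-revision figures mean in total, taking the 1M revisions from the stats above and counting a megabyte as 10**6 bytes (the variable names are only for this sketch):

```python
REVISIONS = 1_000_000        # "1M revisions" from the stats
CVSPS_BYTES_PER_REV = 1200   # per-revision cost quoted for cvsps
TARGET_BYTES_PER_REV = 100   # the upper bound being aimed for


def total_mb(revisions: int, bytes_per_rev: int) -> float:
    """Total footprint in decimal megabytes."""
    return revisions * bytes_per_rev / 10**6


current_mb = total_mb(REVISIONS, CVSPS_BYTES_PER_REV)
target_mb = total_mb(REVISIONS, TARGET_BYTES_PER_REV)

print(f"cvsps at 1200 bytes/rev: {current_mb:.0f} MB")
print(f"at 100 bytes/rev:        {target_mb:.0f} MB")
```

That works out to 1200 MB versus 100 MB, so anything below 100 bytes per revision lands under the 100MB mark mentioned earlier.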

>
> So cvsps + git-cvsimport don't keep any extra data around, and I am
> 100% certain that parsecvs doesn't need that either.
>
> cheers,
>
>
> martin
>


-- 
Jon Smirl
jonsmirl@gmail.com

