* Fixes to parsecvs
@ 2006-04-06 6:36 ` Keith Packard
2006-04-06 12:08 ` Jan-Benedict Glaw
2006-04-06 18:15 ` parsecvs tool now creates git repositories Jim Radford
0 siblings, 2 replies; 13+ messages in thread
From: Keith Packard @ 2006-04-06 6:36 UTC
To: Git Mailing List; +Cc: keithp
note, parsecvs remains available from:
git://git.freedesktop.org/~keithp/parsecvs
I've "fixed" the lexer to permit getc/ungetc in the data parsing
functions. This should resolve the flex -l / -X problems.
Jim Radford sent a patch to add '/' as a legal tag character.
I added my custom edit-change-log script for people dealing with
X.org-style commit messages.
And, it deals with import branch revisions that aren't supposed to
get merged back to the trunk, creating a custom branch name based on the
branch revision (which must be global across all files).
5e5f4c012aec2db012a08b1c7ed5219ed5100111
--
keith.packard@intel.com
* Re: Fixes to parsecvs
2006-04-06 6:36 ` Fixes to parsecvs Keith Packard
@ 2006-04-06 12:08 ` Jan-Benedict Glaw
2006-04-06 14:48 ` Keith Packard
2006-04-06 18:15 ` parsecvs tool now creates git repositories Jim Radford
1 sibling, 1 reply; 13+ messages in thread
From: Jan-Benedict Glaw @ 2006-04-06 12:08 UTC
To: Keith Packard; +Cc: Git Mailing List
On Wed, 2006-04-05 23:36:32 -0700, Keith Packard <keithp@keithp.com> wrote:
> note, parsecvs remains available from:
>
> git://git.freedesktop.org/~keithp/parsecvs
It now compiles out-of-the-box for me, nice work.
However, it would be nice if you'd add a short description about how
to use it. Something like this:
---------------------------------------------------------------------
There's still a lot of work to do on parsecvs, but if you want to give
it a run, first create a copy of the whole CVS tree and go to the base
directory of this copy. (You find a lot of *,v files in this directory
and all its subdirectories.)
Now feed all ,v filenames into parsecvs. Keep in mind that an
`edit-change-log' executable needs to be in your $PATH (a one-line
script that simply exits with 0 will do the job):
find . -type f -name '*,v' -print | parsecvs
This will create the .git/ directory and put all the objects, commits
and tree information into this new git repository.
---------------------------------------------------------------------
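The steps in the description above can also be sketched in Python. This is
only an illustration: the helper names are hypothetical, it assumes parsecvs
reads one ,v path per line on stdin (as in the find pipeline), and it assumes
a no-op edit-change-log that exits with 0 is acceptable.

```python
import os
import stat
import subprocess


def find_rcs_files(root):
    """Return sorted ',v' paths below root, relative to root (like find -name '*,v')."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(",v"):
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def write_noop_edit_change_log(bin_dir):
    """Create a stub 'edit-change-log' in bin_dir that only exits with status 0."""
    path = os.path.join(bin_dir, "edit-change-log")
    with open(path, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def run_parsecvs(root, bin_dir):
    """Feed all ,v names to parsecvs on stdin; parsecvs itself must be installed."""
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    names = "\n".join(find_rcs_files(root)) + "\n"
    # Run from the base directory of the CVS copy so .git/ is created there.
    return subprocess.run(["parsecvs"], input=names, text=True, cwd=root, env=env)
```

Calling run_parsecvs(".", "/some/bin/dir") from the base directory of the CVS
copy would then correspond to the find pipeline shown above.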
I just ran it against a locally rsync'ed copy of the Binutils ,v
files. Looking at the progress bar, it is basically ready:
Load: winsup/configure.in,v ....................* 27704 of 27704
But it seems it now starts to really consume memory:
jbglaw@bixie:~/bin$ ps axflwww|egrep '(VSZ|parsecvs)'|grep -v grep
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 1000 15564 22879 18 0 2805084 549996 finish T pts/10 30:51 | \_ parsecvs
How well does this work with even larger repositories?
Best regards, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 _ O _
"A free opinion in a free mind | Against censorship | Against war _ _ O
for a free state full of free citizens" | on the Internet! | in Iraq! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
* Re: Fixes to parsecvs
2006-04-06 12:08 ` Jan-Benedict Glaw
@ 2006-04-06 14:48 ` Keith Packard
2006-04-06 15:26 ` Johannes Schindelin
2006-04-09 23:17 ` Francois Romieu
0 siblings, 2 replies; 13+ messages in thread
From: Keith Packard @ 2006-04-06 14:48 UTC
To: Jan-Benedict Glaw; +Cc: keithp, Git Mailing List
On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
> On Wed, 2006-04-05 23:36:32 -0700, Keith Packard <keithp@keithp.com> wrote:
> > note, parsecvs remains available from:
> >
> > git://git.freedesktop.org/~keithp/parsecvs
>
> It now compiles out-of-the-box for me, nice work.
cool
>
> However, it would be nice if you'd add a short description about how
> to use it. Something like this:
I'd rather just fix the usage to be more sane; that shouldn't take but a
few minutes...
> I just ran it against a locally rsync'ed copy of the Binutils ,v
> files. Looking at the progress bar, it is basically ready:
>
>
> Load: winsup/configure.in,v ....................* 27704 of 27704
Now all of the ,v files have been parsed and each revision placed in
the .git repository as a blob.
> But it seems it now starts to really consume memory:
Yeah, it's doing the change set computation, which is not very space
efficient; it computes the entire set of files at each commit which can
take 'a bit' of space with a large number of files over a long period of
time. Obviously computing revision deltas and saving those would make it
use a lot less memory.
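
To illustrate the space difference, here is a toy Python sketch. It is not
parsecvs code; the data layout (a dict of path to revision per commit, with
None marking a removed file) and all function names are assumptions made for
the example. It compares keeping the entire file set at each commit with
keeping only the per-commit deltas.

```python
def snapshots_from_changes(changes):
    """Materialise the complete {path: revision} map at every commit."""
    current = {}
    snapshots = []
    for change in changes:
        for path, rev in change.items():
            if rev is None:
                current.pop(path, None)  # file removed in this commit
            else:
                current[path] = rev
        snapshots.append(dict(current))  # full copy per commit
    return snapshots


def deltas_from_snapshots(snapshots):
    """Keep only what changed between consecutive snapshots."""
    deltas = []
    previous = {}
    for snap in snapshots:
        delta = {p: r for p, r in snap.items() if previous.get(p) != r}
        delta.update({p: None for p in previous if p not in snap})
        deltas.append(delta)
        previous = snap
    return deltas


def stored_entries(items):
    """Count the (path, revision) entries that have to be kept in memory."""
    return sum(len(item) for item in items)
```

With N files and M commits the snapshot form holds on the order of N * M
entries, while the delta form holds one entry per file change.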
> jbglaw@bixie:~/bin$ ps axflwww|egrep '(VSZ|parsecvs)'|grep -v grep
> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 0 1000 15564 22879 18 0 2805084 549996 finish T pts/10 30:51 | \_ parsecvs
I'd run a large repository on a large machine; I managed to get
postgresql to run on my laptop (615M CVS with 6000 files), but anything
larger I'd probably want to get it onto a big enough machine. The
question is whether it needs to be more efficient so that people can
constantly convert repositories or whether moving the repository to a
sufficiently large machine for the one-time conversion is 'good enough'.
> How well does this work with even larger repositories?
postgresql is the largest I've run; starting with a 615M CVS repository,
it built a 1.7G .git tree, which packed down to 125M.
--
keith.packard@intel.com
* Re: Fixes to parsecvs
2006-04-06 14:48 ` Keith Packard
@ 2006-04-06 15:26 ` Johannes Schindelin
2006-04-06 16:09 ` Jan-Benedict Glaw
2006-04-06 17:36 ` Keith Packard
2006-04-09 23:17 ` Francois Romieu
1 sibling, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-04-06 15:26 UTC
To: Keith Packard; +Cc: Git Mailing List
Hi,
On Thu, 6 Apr 2006, Keith Packard wrote:
> On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
>
> > But it seems it now starts to really consume memory:
>
> The question is whether it needs to be more efficient so that people can
> constantly convert repositories or whether moving the repository to a
> sufficiently large machine for the one-time conversion is 'good enough'.
Keep in mind that there are many more valid uses for tracking a CVS
repository than to import it once.
Ciao,
Dscho
* Re: Fixes to parsecvs
2006-04-06 15:26 ` Johannes Schindelin
@ 2006-04-06 16:09 ` Jan-Benedict Glaw
2006-04-06 17:36 ` Keith Packard
1 sibling, 0 replies; 13+ messages in thread
From: Jan-Benedict Glaw @ 2006-04-06 16:09 UTC
To: Johannes Schindelin; +Cc: Keith Packard, Git Mailing List
On Thu, 2006-04-06 17:26:14 +0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Thu, 6 Apr 2006, Keith Packard wrote:
> > On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
> > > But it seems it now starts to really consume memory:
> > The question is whether it needs to be more efficient so that people can
> > constantly convert repositories or whether moving the repository to a
> > sufficiently large machine for the one-time conversion is 'good enough'.
>
> Keep in mind that there are many more valid uses for tracking a CVS
> repository than to import it once.
Even the simplest use case shows this. (It's also what I'm
about to do with the converted GCC repository.)
Get the repo, locally track the changes (so the imported branches are
all like "vendor branches") and do my own work in local branches.
I'll do this e.g. to be able to easily re-diff patches, which I want to
put into GIT, just because it's so much more convenient than SVN.
However, this is only possible because I'm able to keep track of
upstream SVN changes. They probably won't change their SCM again, just
after they've introduced SVN.
Best regards, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 _ O _
"A free opinion in a free mind | Against censorship | Against war _ _ O
for a free state full of free citizens" | on the Internet! | in Iraq! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
* Re: Fixes to parsecvs
2006-04-06 15:26 ` Johannes Schindelin
2006-04-06 16:09 ` Jan-Benedict Glaw
@ 2006-04-06 17:36 ` Keith Packard
1 sibling, 0 replies; 13+ messages in thread
From: Keith Packard @ 2006-04-06 17:36 UTC
To: Johannes Schindelin; +Cc: keithp, Git Mailing List
On Thu, 2006-04-06 at 17:26 +0200, Johannes Schindelin wrote:
> Keep in mind that there are many more valid uses for tracking a CVS
> repository than to import it once.
Sure, but we should fix parsecvs to handle incremental CVS tracking if
that's one of the goals for this utility. git-cvsimport does this by
skipping commits earlier than a fixed time; if we did that, we'd
eliminate the huge memory usage except for initial imports. I haven't
considered how this might be done in detail yet; I have no personal need
for this functionality.
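
As a rough illustration of the "skip commits earlier than a fixed time" idea,
here is a Python sketch. It is not how git-cvsimport or parsecvs is
implemented; the commit representation (a dict with a "date" key) and the
function name are assumptions made for the example.

```python
from datetime import datetime, timezone


def commits_to_import(commits, last_imported):
    """Return the commits strictly newer than last_imported, oldest first.

    Everything at or before the cutoff is assumed to be in the git
    repository already, so it never needs to be held in memory again.
    """
    newer = [c for c in commits if c["date"] > last_imported]
    return sorted(newer, key=lambda c: c["date"])
```

An incremental run would record the date of the newest imported commit and
pass it as last_imported on the next run.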
--
keith.packard@intel.com
* Re: parsecvs tool now creates git repositories
2006-04-06 6:36 ` Fixes to parsecvs Keith Packard
2006-04-06 12:08 ` Jan-Benedict Glaw
@ 2006-04-06 18:15 ` Jim Radford
2006-04-06 20:12 ` Keith Packard
1 sibling, 1 reply; 13+ messages in thread
From: Jim Radford @ 2006-04-06 18:15 UTC
To: Keith Packard; +Cc: Git Mailing List
Hi Keith,
Here's one more build patch. For some reason the Fedora lex doesn't
want a space after the -o.
Almost all of the errors I was seeing in the last version were fixed
with your "branches that don't get merged back to the trunk" fix.
Thanks,
-Jim
diff --git a/Makefile b/Makefile
index 4ca6ffd..137ed34 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ GCC_WARNINGS3=-Wnested-externs -fno-stri
 GCC_WARNINGS=$(GCC_WARNINGS1) $(GCC_WARNINGS2) $(GCC_WARNINGS3)
 CFLAGS=-O0 -g $(GCC_WARNINGS)
 YFLAGS=-d -l
-LFLAGS=-l -o lex.c
+LFLAGS=-l -olex.c
 SRCS=gram.y lex.l cvs.h parsecvs.c cvsutil.c \
 revlist.c atom.c revcvs.c git.c gitutil.c
* Re: parsecvs tool now creates git repositories
2006-04-06 18:15 ` parsecvs tool now creates git repositories Jim Radford
@ 2006-04-06 20:12 ` Keith Packard
2006-04-06 21:51 ` Martin Langhoff
0 siblings, 1 reply; 13+ messages in thread
From: Keith Packard @ 2006-04-06 20:12 UTC
To: Jim Radford; +Cc: keithp, Git Mailing List
On Thu, 2006-04-06 at 11:15 -0700, Jim Radford wrote:
> Hi Keith,
>
> Here's one more build patch. For some reason the Fedora lex doesn't
> want a space after the -o.
I probably shouldn't even use the -o flag; all it does is change the
#line directives in the output file to point at lex.c instead of
<stdout>. I'm sure it'll break something.
> Almost all of the errors I was seeing in the last version were fixed
> with your "branches that don't get merged back to the trunk" fix.
That's good news at least.
--
keith.packard@intel.com
* Re: parsecvs tool now creates git repositories
2006-04-06 20:12 ` Keith Packard
@ 2006-04-06 21:51 ` Martin Langhoff
2006-04-06 22:19 ` Keith Packard
0 siblings, 1 reply; 13+ messages in thread
From: Martin Langhoff @ 2006-04-06 21:51 UTC
To: Keith Packard; +Cc: Jim Radford, Git Mailing List
On 4/7/06, Keith Packard <keithp@keithp.com> wrote:
> > Almost all of the errors I was seeing in the last version were fixed
> > with your "branches that don't get merged back to the trunk" fix.
>
> That's good news at least.
I'm re-running my import of Moodle's cvs (20K commits) with the newer
parsecvs. The previous attempt looked very good except that
- file additions were recorded with one-commit-per-file. I am not
sure how rcs is recording these, but the user does enter a common
message at "commit" time. Perhaps the file addition action could be
ignored then?
- some tags made on a branch show up in HEAD. This may be due to
partial-tree branches, but I am not sure.
cheers
m
* Re: parsecvs tool now creates git repositories
2006-04-06 21:51 ` Martin Langhoff
@ 2006-04-06 22:19 ` Keith Packard
2006-04-06 23:22 ` Martin Langhoff
0 siblings, 1 reply; 13+ messages in thread
From: Keith Packard @ 2006-04-06 22:19 UTC
To: Martin Langhoff; +Cc: keithp, Jim Radford, Git Mailing List
On Fri, 2006-04-07 at 09:51 +1200, Martin Langhoff wrote:
> - file additions were recorded with one-commit-per-file. I am not
> sure how rcs is recording these, but the user does enter a common
> message at "commit" time. Perhaps the file addition action could be
> ignored then?
If the log message is identical, and the dates are in-range, parsecvs
"should" put the adds in the same commit.
> - some tags made on a branch show up in HEAD. This may be due to
> partial-tree branches, but I am not sure.
Finding branch points is not perfect; it's complicated by bizarre
behaviour when adding files and casual CVS changes which make precise
branch points hard to detect. Can I get at this repository to play with?
I'd like to see if we can't get the branch point detection more
accurate.
--
keith.packard@intel.com
* Re: parsecvs tool now creates git repositories
2006-04-06 22:19 ` Keith Packard
@ 2006-04-06 23:22 ` Martin Langhoff
2006-04-07 7:24 ` Keith Packard
0 siblings, 1 reply; 13+ messages in thread
From: Martin Langhoff @ 2006-04-06 23:22 UTC
To: Keith Packard; +Cc: Jim Radford, Git Mailing List
On 4/7/06, Keith Packard <keithp@keithp.com> wrote:
> On Fri, 2006-04-07 at 09:51 +1200, Martin Langhoff wrote:
>
> > - file additions were recorded with one-commit-per-file. I am not
> > sure how rcs is recording these, but the user does enter a common
> > message at "commit" time. Perhaps the file addition action could be
> > ignored then?
>
> If the log message is identical, and the dates are in-range, parsecvs
> "should" put the adds in the same commit.
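
For reference, the grouping rule quoted above (identical log message, dates
in range) can be sketched in Python as follows. This is not parsecvs's actual
code: the FileRevision type, the function name, and the 300-second window are
assumptions made for the example.

```python
from collections import namedtuple

# One per-file revision as read from a ,v file (fields are illustrative).
FileRevision = namedtuple("FileRevision", "path log timestamp")


def group_into_commits(revisions, window=300):
    """Group revisions that share a log message and are close in time.

    A revision joins the most recent group with the same log message if it
    is at most `window` seconds after that group's latest revision.
    Otherwise it starts a new group (that is, a new commit).
    """
    commits = []
    latest_group = {}  # log message -> most recent group with that message
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        group = latest_group.get(rev.log)
        if group is not None and rev.timestamp - group[-1].timestamp <= window:
            group.append(rev)
        else:
            group = [rev]
            commits.append(group)
            latest_group[rev.log] = group
    return commits
```

Under this rule, file additions only merge into one commit when they carry
the same log message, so a per-file message defeats the grouping.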
parsecvs is committing them with the "added file foo.x" message, not
the actual commit message.
> > - some tags made on a branch show up in HEAD. This may be due to
> > partial-tree branches, but I am not sure.
>
> Finding branch points is not perfect; it's complicated by bizarre
> behaviour when adding files and casual CVS changes which make precise
> branch points hard to detect. Can I get at this repository to play with?
I fetch it with something along the lines of...
    while true; do
        wget -qc http://cvs.sourceforge.net/cvstarballs/moodle-cvsroot.tar.bz2 &&
            break
        sleep 5
    done
and then import the "moodle" module.
cheers,
m
* Re: parsecvs tool now creates git repositories
2006-04-06 23:22 ` Martin Langhoff
@ 2006-04-07 7:24 ` Keith Packard
0 siblings, 0 replies; 13+ messages in thread
From: Keith Packard @ 2006-04-07 7:24 UTC
To: Martin Langhoff; +Cc: keithp, Jim Radford, Git Mailing List
On Fri, 2006-04-07 at 11:22 +1200, Martin Langhoff wrote:
> parsecvs is committing them with the "added file foo.x" message, not
> the actual commit message.
heh. my cvs repositories are all so kludged that no files have ever been
added, it appears. I'll fix this when I've got a copy of the moodle
repository. sf.net is as useful as always.
I suspect the change is as simple as checking the format of the log
message and the timestamps of the commits and then just dropping the
1.1 revision from the tree entirely.
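
A minimal Python sketch of that idea, under stated assumptions: the revision
records are plain dicts, the function name is hypothetical, and the log
pattern is an assumption about the placeholder message CVS writes when a file
is first added on a branch. It should be checked against what the ,v files
really contain. The timestamp check mentioned above is left out.

```python
import re

# Assumed format of the placeholder log message; adjust to the real data.
PLACEHOLDER_LOG = re.compile(r"^file .+ was initially added on branch .+\.$")


def drop_placeholder_revisions(revisions, pattern=PLACEHOLDER_LOG):
    """Drop 1.1 revisions whose log is only the 'file added' placeholder."""
    kept = []
    for rev in revisions:
        is_placeholder = rev["rev"] == "1.1" and pattern.match(rev["log"].strip())
        if not is_placeholder:
            kept.append(rev)
    return kept
```

Revisions with a real commit message are kept, including ordinary 1.1
revisions, so only the synthetic additions disappear from the tree.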
--
keith.packard@intel.com
* Re: Fixes to parsecvs
2006-04-06 14:48 ` Keith Packard
2006-04-06 15:26 ` Johannes Schindelin
@ 2006-04-09 23:17 ` Francois Romieu
1 sibling, 0 replies; 13+ messages in thread
From: Francois Romieu @ 2006-04-09 23:17 UTC
To: Keith Packard; +Cc: Jan-Benedict Glaw, Git Mailing List
Keith Packard <keithp@keithp.com> :
[...]
> > How well does this work with even larger repositories?
>
> postgresql is the largest I've run; starting with a 615M CVS repository,
> it built a 1.7G .git tree, which packed down to 125M.
As a data point, I gave parsecvs a try on a local CVS repository.
The repository weighs 3.28 GB. It contains 53k files (45k non-attic).
.git/objects grew from ~100k files at the end of the first pass to
199k files (~11k commits). It took 18h on a 3GHz PIV with 2 GB RAM.
After 6 hours, 400 MB were pushed to swap and parsecvs took 1.95 GB
of RAM for itself. No significant swap activity. Swap grew to 900 MB
at end of run. A tarball (5 MB) containing vmstat + size of objects
is available at http://www.cogenit.fr/linux/misc/cvsparse-debug.tar.bz2
I have interrupted 'git repack -a -d' after 6 hours.
--
Ueimor
Thread overview: 13+ messages
[not found] <20060405174247.GA29758@blackbean.org>
[not found] ` <1144262498.2303.231.camel@neko.keithp.com>
2006-04-06 6:36 ` Fixes to parsecvs Keith Packard
2006-04-06 12:08 ` Jan-Benedict Glaw
2006-04-06 14:48 ` Keith Packard
2006-04-06 15:26 ` Johannes Schindelin
2006-04-06 16:09 ` Jan-Benedict Glaw
2006-04-06 17:36 ` Keith Packard
2006-04-09 23:17 ` Francois Romieu
2006-04-06 18:15 ` parsecvs tool now creates git repositories Jim Radford
2006-04-06 20:12 ` Keith Packard
2006-04-06 21:51 ` Martin Langhoff
2006-04-06 22:19 ` Keith Packard
2006-04-06 23:22 ` Martin Langhoff
2006-04-07 7:24 ` Keith Packard