All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] git-cvsimport-script: parse multidigit revisions
@ 2005-07-12 21:35 Sven Verdoolaege
  2005-07-13  1:18 ` Matthias Urlichs
  0 siblings, 1 reply; 11+ messages in thread
From: Sven Verdoolaege @ 2005-07-12 21:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matthias Urlichs, Git Mailing List

git-cvsimport-script: parse multidigit revisions.

Previously, git-cvsimport-script would fail
on revisions with more than one digit.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>

---
commit 7b5f7bcc470528beb4a0b6ef1c93ce634aaa0158
tree db66d0759f97016bd123e2351aa0e77585e3177b
parent e30e814dbfef7a6e89418863e5d7291a2d53b18f
author Sven Verdoolaege <skimo@kotnet.org> Tue, 12 Jul 2005 22:36:57 +0200
committer Sven Verdoolaege <skimo@kotnet.org> Tue, 12 Jul 2005 22:36:57 +0200

 git-cvsimport-script |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-cvsimport-script b/git-cvsimport-script
--- a/git-cvsimport-script
+++ b/git-cvsimport-script
@@ -675,7 +675,7 @@ while(<CVS>) {
 		$state = 9;
 	} elsif($state == 8) {
 		$logmsg .= "$_\n";
-	} elsif($state == 9 and /^\s+(\S+):(INITIAL|\d(?:\.\d+)+)->(\d(?:\.\d+)+)\s*$/) {
+	} elsif($state == 9 and /^\s+(\S+):(INITIAL|\d+(?:\.\d+)+)->(\d+(?:\.\d+)+)\s*$/) {
 #	VERSION:1.96->1.96.2.1
 		my $init = ($2 eq "INITIAL");
 		my $fn = $1;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-12 21:35 [PATCH] git-cvsimport-script: parse multidigit revisions Sven Verdoolaege
@ 2005-07-13  1:18 ` Matthias Urlichs
  2005-07-25 23:00   ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Matthias Urlichs @ 2005-07-13  1:18 UTC (permalink / raw)
  To: Sven Verdoolaege; +Cc: Linus Torvalds, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

Hi,

Sven Verdoolaege:
> Previously, git-cvsimport-script would fail
> on revisions with more than one digit.
> 
Ouch. Thanks.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
When a guy says...
Good idea.
	He really means....
It'll never work.  And I'll spend the rest of the day gloating.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-13  1:18 ` Matthias Urlichs
@ 2005-07-25 23:00   ` Linus Torvalds
  2005-07-25 23:42     ` Matthias Urlichs
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-07-25 23:00 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Sven Verdoolaege, Git Mailing List



On Wed, 13 Jul 2005, Matthias Urlichs wrote:
> 
> Sven Verdoolaege:
> > Previously, git-cvsimport-script would fail
> > on revisions with more than one digit.
> > 
> Ouch. Thanks.

Hmm.. I finally tried to import the bkcvs kernel tree into git, and while 
I'm cursing the slowness of CVS (I'm _hoping_ that part of the problem is 
just that CVS is especially slow with old versions of files, and that the 
import will eventually start speeding up), there's definitely something 
wrong going on with new files in an archive...

In particular, they always end up being imported as zero-sized empty
files, and will be filled in only later if that file is ever touched 
again. In other words, the resulting git tree ends up being bogus.

The command line I used was

	git cvsimport -d /home/torvalds/bkcvs -p --bkcvs linux-2.5

and I wonder if it's the "--bkcvs" thing that confuses cvsimport, but I 
also wonder if anybody has actually tried this before. With or without 
the --bkcvs line, I get

	Argument "28213 has collisions" isn't numeric in addition (+) at /home/torvalds/bin/git-cvsimport-script line 600, <CVS> line 1.
	* UNKNOWN LINE * PatchSet 28209 has collisions
	* UNKNOWN LINE * PatchSet 28194 has collisions
	* UNKNOWN LINE * PatchSet 28181 has collisions
	* UNKNOWN LINE * PatchSet 28180 has collisions
	...

Sadly, since I don't know perl, I can't tell whether this is a problem 
with cvsps on the kernel, or the cvsimport perl script. I'm going to try 
my old hacky C + shell alternative next just to see, but I thought I'd ask 
whether people have tried this before and already know what's wrong.

Btw, looking at what the perl script _seems_ to do, it does seem to do
insane things for the local CVS archive case. As far as I can tell from
the spaghetti that is perl, it uses a CVS server to handle even the local 
file case, which just _can't_ be right. I realize you'd want to do that to 
avoid connecting millions of times, but maybe it's better to use something 
like cvsnup to download the whole thing, and then always use a local CVS 
archive?

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-25 23:00   ` Linus Torvalds
@ 2005-07-25 23:42     ` Matthias Urlichs
  2005-07-26  3:07       ` Linus Torvalds
  2005-07-26  4:22       ` Ryan Anderson
  0 siblings, 2 replies; 11+ messages in thread
From: Matthias Urlichs @ 2005-07-25 23:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sven Verdoolaege, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

Hi,

Linus Torvalds:
> In particular, they always end up being imported as zero-sized empty
> files, and will be filled in only later if that file is ever touched 
> again. In other words, the resulting git tree ends up being bogus.
> 
That's a problem with the bkcvs tree. Remember tht Bitkeeper does
exactly the same thing -- the 1.0 version of *any* file is empty, and
content appears only in version 1.1.

Well, the bkcvs export preserved that ... "feature".

(Side question - why aren't you doing a direct bk2git import?)

> 	Argument "28213 has collisions" isn't numeric in addition (+) at /home/torvalds/bin/git-cvsimport-script line 600, <CVS> line 1.

That's an output from cvsps that is not handled yet.
If you really need it I'll have to investigate.

> Btw, looking at what the perl script _seems_ to do, it does seem to do
> insane things for the local CVS archive case. As far as I can tell from
> the spaghetti that is perl, it uses a CVS server to handle even the local 
> file case, which just _can't_ be right.

Sure it is, because ...

>                                         I realize you'd want to do that to 
> avoid connecting millions of times, but maybe it's better to use something 
> like cvsnup to download the whole thing, and then always use a local CVS 
> archive?

... I don't have a sensible RCS library for perl (the code that I could
find is just a command line front-end). Fork+exec of some cvs checkout
command per file is slower than just running a persistent CVS server.

I've tried other ideas, but they run into problems because some
idiots^Wpeople occasionally tag only parts of a CVS tree, or they do it
at different times, and cvsps has to rearrange stuff in a way the CVS
utilities don't understand, so any higher-level access than "grab a
bunch of files by their revision number and stick them into a commit"
don't work in real life.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Please try to limit the amount of "this room doesn't have any bazingas"
until you are told that those rooms are "punched out."  Once punched out,
we have a right to complain about atrocities, missing bazingas, and such.
		-- N. Meyrowitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-25 23:42     ` Matthias Urlichs
@ 2005-07-26  3:07       ` Linus Torvalds
  2005-07-26  3:43         ` Linus Torvalds
  2005-07-26  4:22       ` Ryan Anderson
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-07-26  3:07 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Sven Verdoolaege, Git Mailing List, David Mansfield


On Tue, 26 Jul 2005, Matthias Urlichs wrote:
>
> That's a problem with the bkcvs tree. Remember tht Bitkeeper does
> exactly the same thing -- the 1.0 version of *any* file is empty, and
> content appears only in version 1.1.

Not really. That may be how the SCCS _deltas_ end up being done
internally, but it's definitely not how BK changesets work. 

For example, in BK SCCS files (if I understood it correctly, I've not
actually looked very closely) a "rename" ends up being two deltas - the
"diff" delta and the "rename" delta. That does _not_ really mean that BK
considers it two different things, it's purely a SCCS file layout thing, 
and it just shows through to bkcvs.

Similarly, apparently in SCCS the "create" event is one revision, and the
"initial data" is another one, and again, that shows up in bkcvs even
though that's not really how BK works conceptually at all.

So then the cvs archives end up showing some of these things as multiple
separate patches, but that just means that cvsps doesn't understand that
they all get collapsed into one changeset in the bk model.  The fact that
some changes may end up showing as multiple deltas is just a result of BK
mostly re-using an old fileformat that just doesn't know anything at all
about changesets what-so-ever.

cvsps should really have collapsed those things into _one_ changeset. I 
had thought that it would do so automatically based on the date (the date 
should always be exactly the same), but it turns out that since the log 
messages might be different, cvsps will split _one_ ChangeSet into 
multiple patches, which is _wrong_.

In fact, I guess the log messages _will_ be different, because the bkcvs 
thing will always put in the BK key in the "real" message.

I _thought_ that this was exactly what the "--bkcvs" flag was going to
notice, but it seems to not be the case. Or maybe "cvsimport" just doesn't 
pass that flag through.. In fact, I just checked, and "cvsps --bkcvs" 
_does_ do the right thing, and collapses all of these things to one 
"PatchSet".

But I think I see why "git cvsimport" does the wrong thing: since that one 
commit has _all_ the deltas associated with the commit, it looks like 
this:

	PatchSet 2
	Date: 2002/02/05 17:40:40
	Author: torvalds
	Branch: HEAD
	Tag: v2_4_0
	Log:
	Import changeset
	
	BKrev: 3c601918i-Rse1XOIZxu4fPHUrTmmA
	
	Members:
	        COPYING:1.1->1.2
	        COPYING:INITIAL->1.1
	        CREDITS:1.1->1.2
	        CREDITS:INITIAL->1.1
	        ChangeSet:1.2->1.3
	        MAINTAINERS:1.1->1.2
		....

and notice how "COPYING" (and all other new files) has two deltas
associated with it, the "INITIAL->1.1" and the "1.1->1.2" one.

And they are in the wrong order, so "cvsimport" ends up committing the 
last one, which is the _empty_ one.

Notice? We'll end up committing "COPYING 1.1" (the empty initial create)
even though we _should_ have committed "COPYING 1.2" (the actual thing
that BK committed).

> Well, the bkcvs export preserved that ... "feature".

No, the bkcvs thing exports an atomic BK commit as several deltas (with
the same date) not because it's a "feature" of BK, but partly because you
can't express what BK does in CVS, and partly because of what appears to
be purely internal BK implementation details (ie a feature of the SCCS
file).

BK did it right, and we just imported it wrong.

David - is there some way where cvsps could always order these things by 
revision? I now realize that this is probably also what causes cvsps to 
complain about things like:

	..
	PatchSet 2 has collisions
	..

which means that cvsps actually _saw_ this, but just output the result in 
the wrong order as far as "git cvsimport" was concerned.

The preferred solution would be to always just suppress the older revision
when you see multiple ones - it is by definition not interesting (you
cannot actually ever access it even in the original BK tree).

> (Side question - why aren't you doing a direct bk2git import?)

Because I don't have any BK trees left, and because I'm not going to touch
Andrews code. We'd have had a portable BK export format (in fact, I wrote
one as a proof-of-concept thing when we were tryign to convince Tridge
that he really doesn't want to muck inside BK internals), but then shit 
happens..

So I'll use the CVS thing. 

> >                                         I realize you'd want to do that to 
> > avoid connecting millions of times, but maybe it's better to use something 
> > like cvsnup to download the whole thing, and then always use a local CVS 
> > archive?
> 
> ... I don't have a sensible RCS library for perl (the code that I could
> find is just a command line front-end). Fork+exec of some cvs checkout
> command per file is slower than just running a persistent CVS server.

Fair enough, I suspect that without a builtin RCS library it really ends 
up being faster than the whole exec thing. 

I thought there was a RCS library (I know I got some hits for "rcslib"  
for some scripting language and was cursing the fact that it wasn't for
C), but I never really looked closer. I'm not surprised if the "library"
ends up just doing a "system()" thing.

> I've tried other ideas, but they run into problems because some
> idiots^Wpeople occasionally tag only parts of a CVS tree

Yeah, CVS really allows some very very annoying things that only work in a 
file-at-a-time model.

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-26  3:07       ` Linus Torvalds
@ 2005-07-26  3:43         ` Linus Torvalds
  2005-07-26 16:50           ` Linus Torvalds
  2005-07-26 21:46           ` David Mansfield
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2005-07-26  3:43 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Sven Verdoolaege, Git Mailing List, David Mansfield



On Mon, 25 Jul 2005, Linus Torvalds wrote:
> 
> And they are in the wrong order, so "cvsimport" ends up committing the 
> last one, which is the _empty_ one.
> 
> Notice? We'll end up committing "COPYING 1.1" (the empty initial create)
> even though we _should_ have committed "COPYING 1.2" (the actual thing
> that BK committed).

David, how about a patch like this to cvsps? My very very limited testing
seems to say that it does the right thing..

It's very simple: if we are adding the same file twice to the same 
PatchSet, we just look at the ordering of the revisions. If the revision 
we're adding is older than the revision we already have, we just drop that 
revision entirely. If it's the same, something is really wrong, and we add 
it to the "collisions" list. And if it's newer, then we remove the old 
revision for that file, and add the new one instead.

As far as I can tell, the old code really was broken, since it would
happen to list different revisions in a random order when you had multiple
changes to the same file in the same patchset. This one always selects the
last one, which would seem to be the sane behaviour.

And this all seem to make "git cvsimport -p --bkcvs" do the right thing. 

		Linus

---
diff --git a/cvsps.c b/cvsps.c
--- a/cvsps.c
+++ b/cvsps.c
@@ -2384,8 +2384,31 @@ void patch_set_add_member(PatchSet * ps,
     for (next = ps->members.next; next != &ps->members; next = next->next) 
     {
 	PatchSetMember * m = list_entry(next, PatchSetMember, link);
-	if (m->file == psm->file && ps->collision_link.next == NULL) 
-		list_add(&ps->collision_link, &collisions);
+	if (m->file == psm->file) {
+		int order = compare_rev_strings(psm->post_rev->rev, m->post_rev->rev);
+
+		/*
+		 * Same revision too? Add it to the collision list
+		 * if it isn't already.
+		 */
+		if (!order) {
+			if (ps->collision_link.next == NULL)
+				list_add(&ps->collision_link, &collisions);
+			return;
+		}
+			
+		/*
+		 * If this is an older revision than the one we already have
+		 * in this patchset, just ignore it
+		 */
+		if (order < 0)
+			return;
+
+		/*
+		 * This is a newer one, remove the old one
+		 */
+		list_del(&m->link);
+	}
     }
 
     psm->ps = ps;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-25 23:42     ` Matthias Urlichs
  2005-07-26  3:07       ` Linus Torvalds
@ 2005-07-26  4:22       ` Ryan Anderson
  1 sibling, 0 replies; 11+ messages in thread
From: Ryan Anderson @ 2005-07-26  4:22 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Linus Torvalds, Sven Verdoolaege, Git Mailing List

On Tue, Jul 26, 2005 at 01:42:57AM +0200, Matthias Urlichs wrote:
> (Side question - why aren't you doing a direct bk2git import?)

The last time I went looking for a tool to do this, I failed to find it
- where can I get this?


-- 

Ryan Anderson
  sometimes Pug Majere

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-26  3:43         ` Linus Torvalds
@ 2005-07-26 16:50           ` Linus Torvalds
  2005-07-26 17:41             ` Rene Scharfe
  2005-07-26 21:46           ` David Mansfield
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2005-07-26 16:50 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: Sven Verdoolaege, Git Mailing List



On Mon, 25 Jul 2005, Linus Torvalds wrote:
> 
> David, how about a patch like this to cvsps? My very very limited testing
> seems to say that it does the right thing..

Hmm.. David Mansfields address is bouncing, and it's apparently not just 
that "cvsps" thing, since it says that the MX machine can't be looked up. 
Does anybody have an alternate address for him? All the ones I've seen so 
far with google are at the same failing "dm.cobite.com" address.

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-26 16:50           ` Linus Torvalds
@ 2005-07-26 17:41             ` Rene Scharfe
  0 siblings, 0 replies; 11+ messages in thread
From: Rene Scharfe @ 2005-07-26 17:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matthias Urlichs, Sven Verdoolaege, Git Mailing List

Linus Torvalds wrote:
> Hmm.. David Mansfields address is bouncing, and it's apparently not just 
> that "cvsps" thing, since it says that the MX machine can't be looked up. 
> Does anybody have an alternate address for him? All the ones I've seen so 
> far with google are at the same failing "dm.cobite.com" address.

Last message from him to this list came from david at "cobite.com" two
months ago.  Have you tried that one?

Rene

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-26  3:43         ` Linus Torvalds
  2005-07-26 16:50           ` Linus Torvalds
@ 2005-07-26 21:46           ` David Mansfield
  2005-07-26 22:01             ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: David Mansfield @ 2005-07-26 21:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias Urlichs, Sven Verdoolaege, Git Mailing List,
	David Mansfield

Linus Torvalds wrote:
> 
> On Mon, 25 Jul 2005, Linus Torvalds wrote:
> 
>>And they are in the wrong order, so "cvsimport" ends up committing the 
>>last one, which is the _empty_ one.
>>
>>Notice? We'll end up committing "COPYING 1.1" (the empty initial create)
>>even though we _should_ have committed "COPYING 1.2" (the actual thing
>>that BK committed).
> 
> 
> David, how about a patch like this to cvsps? My very very limited testing
> seems to say that it does the right thing..
> 
> It's very simple: if we are adding the same file twice to the same 
> PatchSet, we just look at the ordering of the revisions. If the revision 
> we're adding is older than the revision we already have, we just drop that 
> revision entirely. If it's the same, something is really wrong, and we add 
> it to the "collisions" list. And if it's newer, then we remove the old 
> revision for that file, and add the new one instead.
> 
> As far as I can tell, the old code really was broken, since it would
> happen to list different revisions in a random order when you had multiple
> changes to the same file in the same patchset. This one always selects the
> last one, which would seem to be the sane behaviour.
> 
> And this all seem to make "git cvsimport -p --bkcvs" do the right thing. 
> 

I've been 'off the web' for a few weeks on vacation.  I'll look at the 
context of the thread.  It 'smells' wierd to have to revisions in the 
same patchset at all, but I suppose you've all been through that before. 
  So let me catch up with this thread and get back to you...

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] git-cvsimport-script: parse multidigit revisions
  2005-07-26 21:46           ` David Mansfield
@ 2005-07-26 22:01             ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2005-07-26 22:01 UTC (permalink / raw)
  To: David Mansfield
  Cc: Matthias Urlichs, Sven Verdoolaege, Git Mailing List,
	David Mansfield


Ahh, the cobite.com address worked ;)

David, as you may or may not be aware, the dm.cobite.com address was 
bouncing at least as of yesterday.

On Tue, 26 Jul 2005, David Mansfield wrote:
> 
> It 'smells' wierd to have to revisions in the same patchset at all, but
> I suppose you've all been through that before.

It seems to be just a result of how BK ends up having this internal notion
of a "delta", and one commit can contain multiple deltas to the same file.
That's really just some BK internal implementation issue showing through -
the deltas really aren't even individually accessible, it's just that BK
has this two-stage commit thing where you first commit the individual file
changes (the "delta") and then you do the _real_ commit which gathers them
all up.

Normally you don't even see this at all, since the tools basically hide 
this, but especially if you script things you'll see the difference.

In fact, in many ways the usage model when scripting ends up being a bit
like the git two-phase "git-update-cache" + "git commit" approach,
although for totally different reasons. But unlike git, you can actually
tell how somebody did several updates on the same file, and it seems to
show through in the bkcvs archives.

I bet that it wasn't even intentional, and that it's really just a result
of the bkcvs thing really just being pretty much a raw SCCS->RCS
translation (with the addition of the "changeset" notion at a higher
level).

So normally you'd not like to see this unless you use the --bkcvs flag,
but I suspect that with a big fuzz-factor and repeated commit messages you
can see it even with a perfectly normal CVS tree - if only because cvsps
might collapse two separate commits that shouldn't really be collapsed.

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-07-26 22:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-12 21:35 [PATCH] git-cvsimport-script: parse multidigit revisions Sven Verdoolaege
2005-07-13  1:18 ` Matthias Urlichs
2005-07-25 23:00   ` Linus Torvalds
2005-07-25 23:42     ` Matthias Urlichs
2005-07-26  3:07       ` Linus Torvalds
2005-07-26  3:43         ` Linus Torvalds
2005-07-26 16:50           ` Linus Torvalds
2005-07-26 17:41             ` Rene Scharfe
2005-07-26 21:46           ` David Mansfield
2005-07-26 22:01             ` Linus Torvalds
2005-07-26  4:22       ` Ryan Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.