public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [ANNOUNCE] BK->CVS (real time mirror)
@ 2003-03-12 17:42 Larry McVoy
  2003-03-12 18:01 ` Roman Zippel
                   ` (2 more replies)
  0 siblings, 3 replies; 109+ messages in thread
From: Larry McVoy @ 2003-03-12 17:42 UTC (permalink / raw)
  To: linux-kernel

[BK is locking up our data]
[BitMover has to give us our data in an open format]
[The BK pill is oh-so-bitter]
[My tummy hurts and it's Larry's fault]

Boo hoo, cry me a river.

Those of you complaining ought to at least look before you complain.
You just assumed that we were screwing you and you couldn't be bothered
to verify it before you complained.  We didn't screw you at all, all
the data is there.  And BK itself has always had the ability to export
any data in any format, if you read the man pages you might notice that,
but that would be too much work, it's easier to complain.

If you had actually gone and looked at the CVS repository you would have
seen that there is nothing of value missing, in almost 100% of the files,
the full revision history is preserved:

	CVS: 110,076 deltas over all files
	BK:  121,891 deltas over all files

You guys don't have that much parallelism in your files and the exporter
is capturing all that it can which is virtually everything.  It's worth
noting that many deltas in BK are just event recorders, they are just
empty merge delta noise and in fact many people have asked us to get rid
of them.  Once again, it's easier to complain than think.  I'm detecting
a trend.

The graph traversal managed to capture an amazing amount of information,
it's bloody awesome, which you might have noticed if you had looked.
But, nooooo, let's just piss and moan.  What a bunch of friggin' whiners.

The next time you open your mouth, the words that come out of it should be
"thank you".  Nothing else, just that.  If you can't say something nice,
now is a good time to say nothing at all because we are sick and tired of
dealing with people who complain far more than they code.  I'm serious,
we've done way more than anyone could reasonably expect and you react
with no basis in fact, assume bad things that aren't true, don't bother
to look to see if there is a real problem, and don't bother to say thanks.
Aren't you the slightest bit ashamed of your behaviour?  
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 109+ messages in thread
* Re: [ANNOUNCE] BK->CVS (real time mirror)
@ 2003-03-17 23:08 David Mansfield
  2003-03-17 23:25 ` Andrea Arcangeli
  0 siblings, 1 reply; 109+ messages in thread
From: David Mansfield @ 2003-03-17 23:08 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel


Andrea,

FWIW, I have already written a program called cvsps (www.cobite.com/cvsps) 
which extracts 'patchset' information from cvs log output.  

Currently, this program doesn't work with the bk-cvs because the log 
messages that are committed with each file in a changeset can be 
different, and cvsps assumes the log message will  be the same.  

However, about a 5 line hack to my program (in progress) will allow it to 
recreate the ChangeSet information, since Larry has promised that the 
timestamps of all files touched by a changeset will be unique.

This might help you out.  I'll let you know when the '--bk-cvs' option has 
been implemented ;-)

David

-- 
/==============================\
| David Mansfield              |
| lkml@dm.cobite.com           |
\==============================/


^ permalink raw reply	[flat|nested] 109+ messages in thread
* Re: [ANNOUNCE] BK->CVS (real time mirror)
@ 2003-03-13 15:38 David Mansfield
  2003-03-13 15:42 ` Larry McVoy
  0 siblings, 1 reply; 109+ messages in thread
From: David Mansfield @ 2003-03-13 15:38 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel


Hi Larry,

I've been reading this thread, and I think the CVS repository you set up 
is a great service.  I have a request to improve the quality of the data.

If you want to skip straight to the suggestion, goto SUGGESTION.

I am maintainer of a handy GPLed utility called 'cvsps' (plug:  
http://www.cobite.com/cvsps) which extracts 'patchset' information from a
cvs repository by parsing the 'cvs log' output. It attempts to recreate a
commit as a single atomic action, and all the branch and tag gook that
goes with it.

It's a read-only tool that I find useful to see what is going on in a cvs 
repository.

Back to you: I've looked at the CVS log output from your repository, and
had my program parse it back into 'patchsets' but it's not doing a great
job because the log messages from separate parts of the commit are
different.

This is fine, because I can easily write a 'hack' to look explicitly for 
the '(Logical change x.yyyyy)' text to group individual file commits back 
into patchsets.

But this text is missing from the 'main' file commit (to the ChangeSet
file) that has the BKrev: tag in it.

SUGGESTION:
Put the '(Logical change x.yyyy)' text into EVERY log message that is a 
port of the logical change, including the 'main' commit to the ChangeSet, 
that commit has the BKrev: in it (it's missing from this one file's log 
message).

Then I can make a '--bk' hack to my program to use this 'key' to recreate 
the commits.

Let me know what you think,
David

-- 
/==============================\
| David Mansfield              |
| lkml@dm.cobite.com           |
\==============================/


^ permalink raw reply	[flat|nested] 109+ messages in thread
* [ANNOUNCE] BK->CVS (real time mirror)
@ 2003-03-12  3:43 Larry McVoy
  2003-03-12  4:16 ` Ben Collins
                   ` (3 more replies)
  0 siblings, 4 replies; 109+ messages in thread
From: Larry McVoy @ 2003-03-12  3:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: ockman, dev

We've been working on a gateway between BitKeeper and CVS to provide
the revision history in a form which makes the !BK people happy (or
happier).

We have the first pass of this completed and have a linux 2.5 tree on
kernel.bkbits.net and you can check out the tree as follows (please don't
do this unless you are a programmer and will be using this.  Penguin
Computing provided the hardware and the bandwidth for that machine and
if you all melt down the network they could get annoyed.  By all means
go for it if you actually write code, though, that's why it is there.)

    mkdir ws
    cd ws
    cvs -d:pserver:anonymous@kernel.bkbits.net:/home/cvs co linux-2.5

Each of the releases are tagged, they are of the form v2_5_64 etc.

Linus had said in the past that someone other than us should do this but
as it turns out, to do a reasonable job you need BK source.  So we did it.
What do we mean by a reasonable job?  BitKeeper has an automatic branch
feature which captures all parallel development.  It's cool but a bit
pedantic and it makes exporting to a different system almost impossible
if you try and match what BK does exactly.  So we didn't.  What we
(actually Wayne Scott) did was to write a graph traversal alg which
finds the longest path through the revision history which includes
all tags.  For the 2.5 tree, that is currently 8298 distinct points.
Each of those points has been captured in CVS as a commit.  If we did
our job correctly, each of these commits has the same timestamp across
all files.  So you should be able to get any changeset out of the CVS
tree with the appropriate CVS command based on dates.

We also created a ChangeSet file in the CVS tree.  It has no contents, it
serves as a place to capture the BK changeset comments.  Each file which
is part of a changeset has an extra comment which is of the form

	(Logical change 1.%d)

where the "1.%d" matches the changeset rev.  So you can look for all files
that have (Logical change 1.300) in their comments to reconstruct the 
changeset.  NOTE!  That information is actually redundant, the timestamps
are supposed to do the same thing, let us know if that is not working, we'll
redo it.  I expect we'll find bugs, please be patient, it takes 4 hours of
CPU time on a 2.1Ghz Athlon to do the conversion, that's a big part of 
why this has taken so long.  That's after a week's worth of optimizations.

Each ChangeSet delta has a BK rev associated with it in the comments.
We'll be giving you a small shell script which you can use to send Linus
patches that include the rev and we'll modify BK so that it can take
those patches with no patch rejects if you used that script.

We have a first pass of a real time gateway between BK and this CVS tree 
done.  Right now it is done by hand (by me) but as soon as it is debugged
you will see this tree being updated about 1-3 minutes after Linus pushes
to bkbits.  

Once you guys look this over and decide you like it, we'll do the same
thing for the 2.4 tree.

We're also talking to an unnamed (in case it doesn't work out) Linux
company who may host bkbits.net for us.  If they do that, we'll turn
the GNU patch exporter feature in BKD.  That means that you'll be able
to wget any changeset as a GNU patch, complete with checkin comments.
I'm working with Alan on the format, I think we're close though I have
to run the latest version past him.

If all of this sounds nice, it is.  It was a lot of work for us to do
this and you might be wondering why we bothered.  Well, for a couple of
reasons.  First of all, it was only recently that I realized that because
BK is not free software some people won't run BK to get data out of BK.
It may be dense on my part, but I simply did not anticipate that people
would be that extreme, it never occurred to me.  We did a ton of work to
make sure anyone could get their data out of BK but you do have to run
BK to get the data.  I never thought of people not being willing to run
BK to get at the data.  Second, we have maintained SCCS compatible file
formats so that there would be another way to get the data out of BK.
This has held us back in terms of functionality and performance.  I had
thought there was some value in the SCCS format but recent discussions
on this list have convinced me that without the changeset information
the file format doesn't have much value.

Our goal is to provide the data in a way that you can get at it without
being dependent on us or BK in any way.  As soon as we have this
debugged, I'd like to move the CVS repositories to kernel.org (if I can
get HPA to agree) and then you'll have the revision history and can live
without the fear of the "don't piss Larry off license".  Quite frankly,
we don't like the current situation any better than many of you, so if
this addresses your concerns that will take some pressure off of us.

Another goal is to have the freedom to evolve our file formats to be
better, better performance and more features.  SCCS is holding us back.
So you should look hard at what we are providing and figure out if it
is enough.  If you come back with "well, it's not BitKeeper so it's
not enough" we'll just ignore that.  CVS isn't BitKeeper.  On the
other hand, we believe we have gone as far as is possible to provide
all of the information, checkin comments, data, timestamps, user names,
everything.  The graph traversal alg captures information at an extremely
fine granularity, absolutely as fine is possible.  We have 8298 distinct
points over the 2.5.0 .. 2.5.64 set of changes, so it is 130 times finer
than the official releases.  If you think something is missing, tell us,
we'll try and fix it.

The payoff for you is that you have the data in a format that is not
locked into some tool which could be taken away.  The payoff for us is
that we can evolve our tool as we see fit.  We have that right today,
we can do whatever we want, but it would be anywhere from annoying
to unethical to do so if that meant that you couldn't get at the data
except through BitKeeper.  So the "deal" here is that you get the data
in CVS (and/or patches + comments) and we get to hack the heck out of
the file format.  Our changes are going to move far faster than CSSC or
anyone else could keep up without a lot of effort.  On the other hand,
our changes are going to make cold cache performance be much closer to
hot cache performance, use a lot less disk space, a lot less memory,
and a lot less CPU.

So take a look and tell me what you think.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2003-03-22  0:41 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-03-12 17:42 [ANNOUNCE] BK->CVS (real time mirror) Larry McVoy
2003-03-12 18:01 ` Roman Zippel
2003-03-12 18:34 ` Ben Collins
2003-03-12 19:03   ` Sam Ravnborg
2003-03-12 19:38     ` Roman Zippel
2003-03-12 19:32   ` Nicolas Pitre
2003-03-12 19:53     ` Ben Collins
2003-03-12 20:09       ` Ben Collins
2003-03-12 20:20       ` Jeff Garzik
2003-03-12 23:58         ` Roman Zippel
2003-03-12 20:37       ` Nicolas Pitre
2003-03-13  2:57   ` Aaron Lehmann
2003-03-16  3:48   ` Andrea Arcangeli
2003-03-16 17:45     ` Roman Zippel
2003-03-16 18:54       ` Nicolas Pitre
2003-03-16 19:30         ` Shawn
2003-03-16 19:33         ` Roman Zippel
2003-03-16 21:52           ` Andrea Arcangeli
2003-03-17  1:18             ` Roman Zippel
2003-03-17  1:35               ` Larry McVoy
2003-03-17  1:56                 ` Roman Zippel
2003-03-17  9:01                 ` Henning P. Schmiedehausen
2003-03-17 17:46                 ` Daniel Phillips
2003-03-17 18:04                   ` Jeff Garzik
2003-03-17 19:32                     ` Jamie Lokier
2003-03-17 19:40                       ` David Lang
2003-03-17 20:00                         ` Jamie Lokier
2003-03-17 20:43                       ` Andrea Arcangeli
2003-03-17 20:12                     ` Roman Zippel
2003-03-17 21:56             ` Pavel Machek
2003-03-17 22:08               ` Andrea Arcangeli
2003-03-21 14:16                 ` Larry McVoy
2003-03-21 17:42                   ` Andrea Arcangeli
2003-03-21 19:40                   ` H. Peter Anvin
2003-03-22  0:15                     ` Larry McVoy
2003-03-22  0:51                       ` H. Peter Anvin
2003-03-17 17:41           ` Horst von Brand
2003-03-17 18:04             ` Petr Baudis
2003-03-12 19:21 ` Nicolas Pitre
2003-03-12 19:51   ` Larry McVoy
2003-03-12 20:08     ` Ben Collins
2003-03-12 20:14     ` Sam Ravnborg
2003-03-12 20:18       ` Larry McVoy
2003-03-12 20:46       ` Nicolas Pitre
2003-03-12 20:58         ` Larry McVoy
2003-03-12 21:08           ` Nicolas Pitre
2003-03-13  0:41             ` Larry McVoy
2003-03-12 21:18           ` Eli Carter
2003-03-13 20:45           ` Horst von Brand
2003-03-13  1:58       ` Larry McVoy
2003-03-13 23:40       ` Larry McVoy
2003-03-12 21:05     ` Daniel Jacobowitz
2003-03-12 21:18       ` Larry McVoy
2003-03-12 21:31         ` Daniel Jacobowitz
2003-03-12 21:33           ` Larry McVoy
2003-03-12 21:45         ` Kai Germaschewski
2003-03-12 22:01           ` Larry McVoy
2003-03-12 22:21             ` David Lang
2003-03-12 22:30               ` Larry McVoy
2003-03-12 23:18                 ` Andreas Dilger
2003-03-15 16:52                   ` Larry McVoy
2003-03-13 21:00             ` Horst von Brand
2003-03-13  9:43     ` Geert Uytterhoeven
2003-03-13 23:26       ` Larry McVoy
2003-03-14  8:53         ` Geert Uytterhoeven
  -- strict thread matches above, loose matches on Subject: below --
2003-03-17 23:08 David Mansfield
2003-03-17 23:25 ` Andrea Arcangeli
2003-03-17 23:33   ` Larry McVoy
2003-03-17 23:57     ` Andrea Arcangeli
2003-03-18  1:48     ` David Mansfield
2003-03-18  2:43       ` Andrea Arcangeli
2003-03-13 15:38 David Mansfield
2003-03-13 15:42 ` Larry McVoy
2003-03-12  3:43 Larry McVoy
2003-03-12  4:16 ` Ben Collins
2003-03-12  8:55   ` Jens Axboe
2003-03-12 10:26     ` Andreas Dilger
2003-03-12 10:31       ` Jens Axboe
2003-03-12 10:56         ` Andreas Dilger
2003-03-12 11:15           ` Jens Axboe
2003-03-12 11:20       ` Jamie Lokier
2003-03-12 16:13       ` H. Peter Anvin
2003-03-12 16:30         ` Dana Lacoste
2003-03-12 16:47           ` John Bradford
2003-03-12 17:08           ` Roman Zippel
2003-03-12 21:50             ` Alan Cox
2003-03-13 23:30               ` Roman Zippel
2003-03-12 17:29           ` H. Peter Anvin
2003-03-12 17:57             ` John Bradford
2003-03-12 18:03               ` Larry McVoy
2003-03-12 20:49                 ` H. Peter Anvin
2003-03-13  7:59                 ` Theodore Ts'o
2003-03-13  9:58                   ` Roman Zippel
2003-03-12 16:18       ` Ben Collins
2003-03-12 16:47         ` Lars Marowsky-Bree
2003-03-12 17:34           ` Ryan Anderson
2003-03-12 18:38   ` Arador
2003-03-12 18:47     ` Ben Collins
2003-03-12 19:12       ` Andreas Dilger
2003-03-13  0:29       ` Martin J. Bligh
2003-03-13  0:56         ` Larry McVoy
2003-03-16  3:44       ` Andrea Arcangeli
2003-03-12  4:39 ` H. Peter Anvin
2003-03-12  4:56   ` Larry McVoy
2003-03-16  3:10   ` Andrea Arcangeli
2003-03-12 19:34 ` Brandon Low
2003-03-16 13:45 ` Pavel Machek
2003-03-17 14:18   ` Wayne Scott
2003-03-17 14:45     ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox