* import determinism
@ 2010-11-07 20:25 Enrico Weigelt
2010-11-07 20:46 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-07 20:25 UTC (permalink / raw)
To: git
Hi folks,
I'm curious about how deterministic the imports (git-cvsimport and
git-svn) are. Suppose I clone the same CVS repo twice (assuming
no write access in between): are the resulting object SHA-1s
the same?
cu
--
----------------------------------------------------------------------
Enrico Weigelt, metux IT service -- http://www.metux.de/
phone: +49 36207 519931 email: weigelt@metux.de
mobile: +49 151 27565287 icq: 210169427 skype: nekrad666
----------------------------------------------------------------------
Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------
* Re: import determinism
2010-11-07 20:25 import determinism Enrico Weigelt
@ 2010-11-07 20:46 ` Ævar Arnfjörð Bjarmason
2010-11-07 21:01 ` Andreas Schwab
0 siblings, 1 reply; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-11-07 20:46 UTC (permalink / raw)
To: weigelt, git
On Sun, Nov 7, 2010 at 21:25, Enrico Weigelt <weigelt@metux.de> wrote:
> I'm curious about how deterministic the imports (git-cvsimport and
> git-svn) are. Suppose I clone the same CVS repo twice (assuming
> no write access in between): are the resulting object SHA-1s
> the same?
No, because the committer dates will be different.
* Re: import determinism
2010-11-07 20:46 ` Ævar Arnfjörð Bjarmason
@ 2010-11-07 21:01 ` Andreas Schwab
2010-11-07 21:56 ` Enrico Weigelt
2010-11-07 22:20 ` Martin Langhoff
0 siblings, 2 replies; 13+ messages in thread
From: Andreas Schwab @ 2010-11-07 21:01 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason; +Cc: weigelt, git
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> On Sun, Nov 7, 2010 at 21:25, Enrico Weigelt <weigelt@metux.de> wrote:
>> I'm curious about how deterministic the imports (git-cvsimport and
>> git-svn) are. Suppose I clone the same CVS repo twice (assuming
>> no write access in between): are the resulting object SHA-1s
>> the same?
>
> No, because the committer dates will be different.
The committer info in every commit will be the same as the author info,
so the repository conversion is completely deterministic.
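To make the two answers above concrete: a commit id is the SHA-1 of the commit object, which contains both the author line and the committer line. The sketch below recomputes that id by hand for a parentless commit. The identity string, timestamps, and message are made-up illustration values, and `commit_id` is a hypothetical helper, not part of any importer.

```python
import hashlib


def commit_id(tree, author, committer, message, parent=None):
    """Return the SHA-1 git assigns to a commit object with these fields."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Object header: type, a space, the body length in bytes, then a NUL byte.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The well-known id of git's empty tree, used here only as a placeholder.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "A U Thor <author@example.com> 1289161500 +0000"  # invented identity

# Committer copied from the author: two runs give the same id.
same_1 = commit_id(EMPTY_TREE, who, who, "import\n")
same_2 = commit_id(EMPTY_TREE, who, who, "import\n")

# Committer stamped with a later time: the id changes.
later = commit_id(
    EMPTY_TREE, who, "A U Thor <author@example.com> 1289165100 +0000", "import\n"
)
```

So the ids only repeat if every field, including the committer date, repeats; an importer that copies the committer from the author data removes that one source of variation.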
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: import determinism
2010-11-07 21:01 ` Andreas Schwab
@ 2010-11-07 21:56 ` Enrico Weigelt
2010-11-07 22:20 ` Martin Langhoff
1 sibling, 0 replies; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-07 21:56 UTC (permalink / raw)
To: git
* Andreas Schwab <schwab@linux-m68k.org> wrote:
> The committer info in every commit will be the same as the author info,
> so the repository conversion is completely deterministic.
Thanks, that helps a lot.
My point is that I'm running a bunch of automatic mirrors and would like
to get them clustered. In theory, it should be enough (for CVS) to run
git-cvsimport regularly on all nodes, right?
cu
* Re: import determinism
2010-11-07 21:01 ` Andreas Schwab
2010-11-07 21:56 ` Enrico Weigelt
@ 2010-11-07 22:20 ` Martin Langhoff
2010-11-07 22:45 ` Andreas Schwab
1 sibling, 1 reply; 13+ messages in thread
From: Martin Langhoff @ 2010-11-07 22:20 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Ævar Arnfjörð, weigelt, git
On Sun, Nov 7, 2010 at 4:01 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> The committer info in every commit will be the same as the author info,
> so the repository conversion is completely deterministic.
For cvsimport, it is not deterministic. Given cvs'
ambiguous/buggy/inconsistent internal semantics around some
operations, cvsps makes educated guesses about what happened.
Later commits can affect those educated guesses.
IIRC, this affects the exact tree at the birth of a branch, and files
added on a branch (instead of on HEAD).
If you use cvsimport in 'incremental' runs you won't have this
problem. Whatever it decided to do, sticks.
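One way to see whether two independent imports actually diverged is to compare the branch and tag tips of the two resulting repositories. The following is only a sketch: `read_refs`, `parse_refs`, and `differing_refs` are hypothetical helper names, and it assumes a `git` binary on the PATH for the `read_refs` step.

```python
import subprocess


def parse_refs(text):
    """Parse lines of '<sha> <refname>' into a {refname: sha} dict."""
    refs = {}
    for line in text.splitlines():
        if line.strip():
            sha, name = line.split(" ", 1)
            refs[name] = sha
    return refs


def read_refs(repo):
    """List every ref of the repository at path `repo` with its object id."""
    out = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "--format=%(objectname) %(refname)"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_refs(out)


def differing_refs(a, b):
    """Return {refname: (sha_in_a, sha_in_b)} for refs that do not match."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `differing_refs(read_refs("mirror-a"), read_refs("mirror-b"))` on two imports (the paths are placeholders) returns an empty dict only if every branch and tag points at the same commit in both.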
I haven't worked in detail with Keith Packard's parsecvs but I assume
it has similar issues.
hth,
m
--
martin.langhoff@gmail.com
martin@laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
* Re: import determinism
2010-11-07 22:20 ` Martin Langhoff
@ 2010-11-07 22:45 ` Andreas Schwab
2010-11-09 13:43 ` Enrico Weigelt
0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2010-11-07 22:45 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Ævar Arnfjörð, weigelt, git
Martin Langhoff <martin.langhoff@gmail.com> writes:
> On Sun, Nov 7, 2010 at 4:01 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> The committer info in every commit will be the same as the author info,
>> so the repository conversion is completely deterministic.
>
> For cvsimport, it is not deterministic. Given cvs'
> ambiguous/buggy/inconsistent internal semantics around some
> operations, cvsps makes educated guesses about what happened.
>
> Later commits can affect those educated guesses.
The OP was assuming an unchanging repository.
Andreas.
* Re: import determinism
2010-11-07 22:45 ` Andreas Schwab
@ 2010-11-09 13:43 ` Enrico Weigelt
2010-11-10 4:40 ` Martin Langhoff
2010-11-11 4:28 ` Michael Haggerty
0 siblings, 2 replies; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-09 13:43 UTC (permalink / raw)
To: git
* Andreas Schwab <schwab@linux-m68k.org> wrote:
> > For cvsimport, it is not deterministic. Given cvs'
> > ambiguous/buggy/inconsistent internal semantics around some
> > operations, cvsps makes educated guesses about what happened.
> >
> > Later commits can affect those educated guesses.
>
> The OP was assuming an unchanging repository.
My assumptions are:
* the original CVS repo will have later additions
(so I'm importing incrementally)
* no commits (besides cvsimport) in the git mirror, but others
fork off from there
* the mirror could get lost in a disaster (no separate backup)
and should be recreated afresh in that case.
The point behind this is: I'm running a growing number of cvs2git
mirrors and don't want to do full backups of them.
cu
* Re: import determinism
2010-11-09 13:43 ` Enrico Weigelt
@ 2010-11-10 4:40 ` Martin Langhoff
2010-11-10 16:18 ` Enrico Weigelt
2010-11-11 4:28 ` Michael Haggerty
1 sibling, 1 reply; 13+ messages in thread
From: Martin Langhoff @ 2010-11-10 4:40 UTC (permalink / raw)
To: weigelt@metux.de, git@vger.kernel.org
Every full clone is a good backup. If nobody is doing a full clone...
then do back them up. That's my recommendation.
M
On Tuesday, November 9, 2010, Enrico Weigelt <weigelt@metux.de> wrote:
> * Andreas Schwab <schwab@linux-m68k.org> wrote:
>
>> > For cvsimport, it is not deterministic. Given cvs'
>> > ambiguous/buggy/inconsistent internal semantics around some
>> > operations, cvsps makes educated guesses about what happened.
>> >
>> > Later commits can affect those educated guesses.
>>
>> The OP was assuming an unchanging repository.
>
> My assumptions are:
>
> * the original CVS repo will have later additions
> (so I'm importing incrementally)
> * no commits (besides cvsimport) in the git mirror, but others
> fork off from there
> * the mirror could get lost in a disaster (no separate backup)
> and should be recreated afresh in that case.
>
>
> The point behind this is: I'm running a growing number of cvs2git
> mirrors and don't want to do full backups of them.
* Re: import determinism
2010-11-10 4:40 ` Martin Langhoff
@ 2010-11-10 16:18 ` Enrico Weigelt
2010-11-10 21:25 ` Martin Langhoff
0 siblings, 1 reply; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-10 16:18 UTC (permalink / raw)
To: git
* Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Every full clone is a good backup. If nobody is doing a full clone...
> then do back them up. That's my recommendation.
Does the clone have all information necessary for the incremental
cvsimport runs ?
cu
* Re: import determinism
2010-11-10 16:18 ` Enrico Weigelt
@ 2010-11-10 21:25 ` Martin Langhoff
2010-11-10 22:04 ` Enrico Weigelt
0 siblings, 1 reply; 13+ messages in thread
From: Martin Langhoff @ 2010-11-10 21:25 UTC (permalink / raw)
To: weigelt, git
On Wed, Nov 10, 2010 at 11:18 AM, Enrico Weigelt <weigelt@metux.de> wrote:
> * Martin Langhoff <martin.langhoff@gmail.com> wrote:
>> Every full clone is a good backup. If nobody is doing a full clone...
>> then do back them up. That's my recommendation.
>
> Does the clone have all information necessary for the incremental
> cvsimport runs ?
Yes -- all cvsimport tracks is the branches. No extra import-tracking
data is maintained. (Other importers may or may not keep additional
data, YMMV, etc)
m
* Re: import determinism
2010-11-10 21:25 ` Martin Langhoff
@ 2010-11-10 22:04 ` Enrico Weigelt
0 siblings, 0 replies; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-10 22:04 UTC (permalink / raw)
To: git
* Martin Langhoff <martin.langhoff@gmail.com> wrote:
> > Does the clone have all information necessary for the incremental
> > cvsimport runs ?
>
> Yes -- all cvsimport tracks is the branches. No extra import-tracking
> data is maintained. (Other importers may or may not keep additional
> data, YMMV, etc)
That's good news. Now I'll have to sit down and code some little
cluster management tool (the mirrors keep multiplying), or
has anyone already done that?
cu
* Re: import determinism
2010-11-09 13:43 ` Enrico Weigelt
2010-11-10 4:40 ` Martin Langhoff
@ 2010-11-11 4:28 ` Michael Haggerty
2010-11-11 13:09 ` Enrico Weigelt
1 sibling, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2010-11-11 4:28 UTC (permalink / raw)
To: git
On 11/07/2010 09:25 PM, Enrico Weigelt wrote:
> I'm curious about how deterministic the imports (git-cvsimport and
> git-svn) are. Suppose I clone the same CVS repo twice (assuming
> no write access in between): are the resulting object SHA-1s
> the same?
On 11/09/2010 02:43 PM, Enrico Weigelt wrote:
> The point behind this is: I'm running a growing number of cvs2git
> mirrors and don't want to do full backups of them.
If you are using cvs2git, why are you asking about git-cvsimport and
git-svn?
No tool that imports from CVS or Subversion can make a blanket guarantee
about consistency across conversions because both CVS and SVN allow
retroactive changes to the project history. For example:
* Both CVS and SVN allow commit messages and other metadata of old
commits to be changed
* CVS allows files to be added retroactively to tags and branches with
no timestamp indicating that the file was not part of the original tag.
* CVS allows old revisions to be "obsoleted" (i.e., expunged from history).
* In CVS it is common practice for people to muck about directly in the
repository, for example renaming *,v files.
So (in the general case) there is no way to guarantee that two
independent conversions will have consistent results for the overlapping
parts of their history. And even incremental conversions will
necessarily have to decide between converting the current state of the
repository accurately and converting in a way that is consistent with
earlier conversions.
In practice, especially if you are willing to constrain what the CVS
users are allowed to do, the overlapping parts of two conversions should
usually be identical or at least very similar (with older history more
likely to be identical). Perhaps an rsync-style backup would be smart
enough to copy only the changed part of the history without excluding
the possibility that there are retroactive changes between subsequent
conversions.
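To measure how much of two conversions overlaps, one could compare their commit ids oldest-first and count how far they agree. This is a rough sketch under the assumption of a linear branch, with each list obtained from something like `git rev-list --reverse <branch>`; `shared_history_length` is a hypothetical helper name.

```python
def shared_history_length(first, second):
    """Count the leading commits (oldest first) that both conversions share."""
    count = 0
    for a, b in zip(first, second):
        if a != b:
            break
        count += 1
    return count
```

Because each commit id covers the ids of its parents, the first mismatch marks the point from which all later commits necessarily differ as well, even when their trees and messages are the same.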
If you run two cvs2git conversions on *exactly* the same CVS repository,
then the results *should* be identical. I have tried always to process
data in a defined order rather than, say, in filesystem or
hashmap-determined order. But AFAIK this property has not been tested
and could easily be buggy if I overlooked some source of indeterminism
somewhere in the cvs2git code.
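As an illustration of what "a defined order" means here (this is not cvs2git's actual code, only a minimal sketch of the idea): `os.walk` yields entries in whatever order the filesystem returns them, so a repeatable traversal has to sort both the directories and the files itself. `walk_sorted` is a hypothetical helper name.

```python
import os


def walk_sorted(root):
    """Yield file paths relative to `root` in an order that is the same on every run."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk descend into subdirectories in sorted order.
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

The same concern applies to anything iterated from an unordered container: sort the keys before processing them so the output does not depend on insertion or hash order.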
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
* Re: import determinism
2010-11-11 4:28 ` Michael Haggerty
@ 2010-11-11 13:09 ` Enrico Weigelt
0 siblings, 0 replies; 13+ messages in thread
From: Enrico Weigelt @ 2010-11-11 13:09 UTC (permalink / raw)
To: git
* Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 11/07/2010 09:25 PM, Enrico Weigelt wrote:
> > I'm curious about how deterministic the imports (git-cvsimport and
> > git-svn) are. Suppose I clone the same CVS repo twice (assuming
> > no write access in between): are the resulting object SHA-1s
> > the same?
>
> On 11/09/2010 02:43 PM, Enrico Weigelt wrote:
> > The point behind this is: I'm running a growing number of cvs2git
> > mirrors and don't want to do full backups of them.
>
> If you are using cvs2git, why are you asking about git-cvsimport and
> git-svn?
No, I'm not using the "cvs2git" tool (wasn't aware of that at all) ;-)
I'm using git-cvsimport.
cu